tooluniverse-multi-omics-integration

安装量: 115
排名: #7455

安装

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-multi-omics-integration

Multi-Omics Integration Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation. When to Use This Skill User has multiple omics datasets (RNA-seq + proteomics, methylation + expression, etc.) Cross-omics correlation queries (e.g., "How does methylation affect expression?") Multi-omics biomarker discovery or patient subtyping Systems biology questions requiring multiple molecular layers Precision medicine applications with multi-omics patient data Workflow Overview Phase 1: Data Loading & QC Load each omics type, format-specific QC, normalize Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics Phase 2: Sample Matching Harmonize sample IDs, find common samples, handle missing omics Phase 3: Feature Mapping Map features to common gene-level identifiers CpG->gene (promoter), CNV->gene, metabolite->enzyme Phase 4: Cross-Omics Correlation RNA vs Protein (translation efficiency) Methylation vs Expression (epigenetic regulation) CNV vs Expression (dosage effect) eQTL variants vs Expression (genetic regulation) Phase 5: Multi-Omics Clustering MOFA+, NMF, SNF for patient subtyping Phase 6: Pathway-Level Integration Aggregate omics evidence at pathway level Score pathway dysregulation with combined evidence Phase 7: Biomarker Discovery Feature selection across omics, multi-omics classification Phase 8: Integrated Report Summary, correlations, clusters, pathways, biomarkers See: phase_details.md for complete code and implementation details. Supported Data Types Omics Formats QC Focus Transcriptomics CSV/TSV, HDF5, h5ad Low-count filter, normalize (TPM/DESeq2), log-transform Proteomics MaxQuant, Spectronaut, DIA-NN Missing value imputation, median/quantile normalization Methylation IDAT, beta matrices Failed probes, batch correction, cross-reactive filter Genomics VCF, SEG (CNV) Variant QC, CNV segmentation Metabolomics Peak tables Missing values, normalization Core Operations Sample Matching def match_samples_across_omics ( omics_data_dict ) : """Match samples across multiple omics datasets.""" sample_ids = { k : set ( df . columns ) for k , df in omics_data_dict . items ( ) } common_samples = set . intersection ( * sample_ids . values ( ) ) matched_data = { k : df [ sorted ( common_samples ) ] for k , df in omics_data_dict . items ( ) } return sorted ( common_samples ) , matched_data Cross-Omics Correlation from scipy . stats import spearmanr , pearsonr

RNA vs Protein: expect positive r ~ 0.4-0.6

Methylation vs Expression: expect negative r (promoter repression)

CNV vs Expression: expect positive r (dosage effect)

for gene in common_genes : r , p = spearmanr ( rna [ gene ] , protein [ gene ] ) Pathway Integration

Score pathway dysregulation using combined evidence from all omics

Aggregate per-gene evidence, then per-pathway

pathway_score

mean
(
abs
(
rna_fc
)
+
abs
(
protein_fc
)
+
abs
(
meth_diff
)
+
abs
(
cnv
)
)
See: phase_details.md for full implementations of each operation.
Multi-Omics Clustering Methods
Method
Description
Best For
MOFA+
Latent factors explaining cross-omics variation
Identifying shared/omics-specific drivers
Joint NMF
Shared decomposition across omics
Patient subtype discovery
SNF
Similarity network fusion
Integrating heterogeneous data types
ToolUniverse Skills Coordination
Skill
Used For
Phase
tooluniverse-rnaseq-deseq2
RNA-seq analysis
1, 4
tooluniverse-epigenomics
Methylation, ChIP-seq
1, 4
tooluniverse-variant-analysis
CNV/SNV processing
1, 3, 4
tooluniverse-protein-interactions
Protein network context
6
tooluniverse-gene-enrichment
Pathway enrichment
6
tooluniverse-expression-data-retrieval
Public data retrieval
1
tooluniverse-target-research
Gene/protein annotation
3, 8
Use Cases
Cancer Multi-Omics
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
eQTL + Expression + Methylation
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
Drug Response Multi-Omics
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.
See: phase_details.md "Use Cases" for detailed step-by-step workflows.
Quantified Minimums
Component
Requirement
Omics types
At least 2 datasets
Common samples
At least 10 across omics
Cross-correlation
Pearson/Spearman computed
Clustering
At least one method (MOFA+, NMF, or SNF)
Pathway integration
Enrichment with multi-omics evidence scores
Report
Summary, correlations, clusters, pathways, biomarkers
Limitations
Sample size
n >= 20 recommended for integration
Missing data
Pairwise integration if not all samples have all omics
Batch effects
Different platforms require careful normalization
Computational
Large datasets may require significant memory
Interpretation
Results require domain expertise for validation References MOFA+: https://doi.org/10.1186/s13059-020-02015-1 Similarity Network Fusion: https://doi.org/10.1038/nmeth.2810 Multi-omics review: https://doi.org/10.1038/s41576-019-0093-7 See individual ToolUniverse skill documentation for omics-specific methods Detailed Reference phase_details.md - Complete code for all phases, correlation functions, clustering, pathway integration, biomarker discovery, report template, and detailed use cases
返回排行榜