- GWAS Trait-to-Gene Discovery
- Discover genes associated with diseases and traits using genome-wide association studies (GWAS)
- Overview
- This skill enables systematic discovery of genes linked to diseases/traits by analyzing GWAS data from two major resources:
- GWAS Catalog
- (EBI/NHGRI): Curated catalog of published GWAS with >500,000 associations
- Open Targets Genetics
-
- Fine-mapped GWAS signals with locus-to-gene (L2G) predictions
- Use Cases
- Clinical Research
- "What genes are associated with type 2 diabetes?"
- "Find genetic risk factors for coronary artery disease"
- "Which genes contribute to Alzheimer's disease susceptibility?"
- Drug Target Discovery
- Identify genes with strong genetic evidence for disease causation
- Prioritize targets based on L2G scores and replication across studies
- Find genes with genome-wide significant associations (p < 5e-8)
- Functional Genomics
- Map disease-associated variants to candidate genes
- Analyze genetic architecture of complex traits
- Understand polygenic disease mechanisms
- Workflow
- 1. Trait Search → Search GWAS Catalog by disease/trait name
- ↓
- 2. SNP Aggregation → Collect genome-wide significant SNPs (p < 5e-8)
- ↓
- 3. Gene Mapping → Extract mapped genes from associations
- ↓
- 4. Evidence Ranking → Score by p-value, replication, fine-mapping
- ↓
- 5. Annotation (Optional) → Add L2G predictions from Open Targets
- Key Concepts
- Genome-wide Significance
- Standard threshold: p < 5×10⁻⁸
- Accounts for multiple testing burden across ~1M common variants
- Higher confidence: p < 5×10⁻¹⁰ or replicated across studies
- Gene Mapping Methods
- Positional
-
- Nearest gene to lead SNP
- Fine-mapping
-
- Statistical refinement to credible variants
- Locus-to-Gene (L2G)
-
- Integrative score combining multiple evidence types
- Evidence Confidence Levels
- High
-
- L2G score > 0.5 OR multiple studies with p < 5e-10
- Medium
-
- 2+ studies with p < 5e-8
- Low
- Single study or marginal significance Required ToolUniverse Tools GWAS Catalog (11 tools) gwas_get_associations_for_trait - Get all associations for a trait (sorted by p-value) gwas_search_snps - Search SNPs by gene mapping gwas_get_snp_by_id - Get SNP details (MAF, consequence, location) gwas_get_study_by_id - Get study metadata gwas_search_associations - Search associations with filters gwas_search_studies - Search studies by trait/cohort gwas_get_associations_for_snp - Get all associations for a SNP gwas_get_variants_for_trait - Get variants for a trait gwas_get_studies_for_trait - Get studies for a trait gwas_get_snps_for_gene - Get SNPs mapped to a gene gwas_get_associations_for_study - Get associations from a study Open Targets Genetics (6 tools) OpenTargets_search_gwas_studies_by_disease - Search studies by disease ontology OpenTargets_get_study_credible_sets - Get fine-mapped loci for a study OpenTargets_get_variant_credible_sets - Get credible sets for a variant OpenTargets_get_variant_info - Get variant annotation (frequencies, consequences) OpenTargets_get_gwas_study - Get study metadata OpenTargets_get_credible_set_detail - Get detailed credible set information Parameters Required trait - Disease/trait name (e.g., "type 2 diabetes", "coronary artery disease") Optional p_value_threshold - Significance threshold (default: 5e-8) min_evidence_count - Minimum number of studies (default: 1) max_results - Maximum genes to return (default: 100) use_fine_mapping - Include L2G predictions (default: true) disease_ontology_id - Disease ontology ID for Open Targets (e.g., "MONDO_0005148") Output Schema { "genes" : [ { "symbol" : str ,
Gene symbol (e.g., "TCF7L2")
"min_p_value" : float ,
Most significant p-value
"evidence_count" : int ,
Number of independent studies
"snps" : [ str ] ,
Associated SNP rs IDs
"studies" : [ str ] ,
GWAS study accessions
"l2g_score" : float | null ,
Locus-to-gene score (0-1)
"credible_sets" : int ,
Number of credible sets
"confidence_level" : str
"High", "Medium", or "Low"
} ] , "summary" : { "trait" : str , "total_associations" : int , "significant_genes" : int , "data_sources" : [ "GWAS Catalog" , "Open Targets" ] } } Example Results Type 2 Diabetes TCF7L2: p=1.2e-98, 15 studies, L2G=0.82 → High confidence KCNJ11: p=3.4e-67, 12 studies, L2G=0.76 → High confidence PPARG: p=2.1e-45, 8 studies, L2G=0.71 → High confidence FTO: p=5.6e-42, 10 studies, L2G=0.68 → High confidence IRS1: p=8.9e-38, 6 studies, L2G=0.54 → High confidence Alzheimer's Disease APOE: p=1.0e-450, 25 studies, L2G=0.95 → High confidence BIN1: p=2.3e-89, 18 studies, L2G=0.88 → High confidence CLU: p=4.5e-67, 16 studies, L2G=0.82 → High confidence ABCA7: p=6.7e-54, 14 studies, L2G=0.79 → High confidence CR1: p=8.9e-52, 13 studies, L2G=0.75 → High confidence Best Practices 1. Use Disease Ontology IDs for Precision
Instead of:
discover_gwas_genes("diabetes") # Ambiguous
Use:
discover_gwas_genes( "type 2 diabetes", disease_ontology_id="MONDO_0005148" # Specific ) 2. Filter by Evidence Strength
For drug targets, require strong evidence:
- discover_gwas_genes(
- "coronary artery disease",
- p_value_threshold=5e-10, # Stricter than GWAS threshold
- min_evidence_count=3, # Multiple independent studies
- use_fine_mapping=True # Include L2G predictions
- )
- 3. Interpret Results Carefully
- Association ≠ Causation
-
- GWAS identifies correlated variants, not necessarily causal genes
- Linkage Disequilibrium
-
- Lead SNP may tag the true causal variant in a nearby gene
- Fine-mapping
-
- L2G scores provide better causal gene evidence than positional mapping
- Functional Evidence
- Validate with orthogonal data (eQTLs, knockout models, etc.) Limitations Gene Mapping Uncertainty Positional mapping assigns SNPs to nearest gene (may be incorrect) Fine-mapping available for only a subset of studies Intergenic variants difficult to map Population Bias Most GWAS in European populations Effect sizes may differ across ancestries Rare variants often under-represented Sample Size Dependence Larger studies detect more associations Older small studies may have false negatives p-values alone don't indicate effect size Validation Bug Some ToolUniverse tools have oneOf validation issues Use validate=False parameter if needed This is automatically handled in the Python implementation