- GWAS Fine-Mapping & Causal Variant Prioritization
- Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
- Overview
- Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant.
- Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.
- This skill provides tools to:
- Prioritize causal variants using fine-mapping posterior probabilities
- Link variants to genes using locus-to-gene (L2G) predictions
- Annotate variants with functional consequences
- Suggest validation strategies based on fine-mapping results
- Key Concepts
- Credible Sets
- A credible set is a minimal set of variants that contains the causal variant with high confidence (typically 95% or 99%). Each variant in the set has a posterior probability of being causal, computed using methods like:
- SuSiE (Sum of Single Effects)
- FINEMAP (Bayesian fine-mapping)
- PAINTOR (Probabilistic Annotation INtegraTOR)
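The credible-set definition above can be sketched in a few lines. This is a minimal illustration, assuming a single causal variant at the locus and posterior probabilities that are already computed; the function name `credible_set` and the variant IDs are hypothetical, and methods such as SuSiE build one set per independent signal rather than one per locus.

```python
from typing import Dict, List


def credible_set(pips: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose posterior probabilities
    sum to at least `coverage` (greedy selection, highest PIP first)."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for variant_id, pip in ranked:
        selected.append(variant_id)
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for five variants at one locus.
example_pips = {"rs_a": 0.62, "rs_b": 0.25, "rs_c": 0.09, "rs_d": 0.03, "rs_e": 0.01}
print(credible_set(example_pips, coverage=0.95))
```

A smaller credible set at the same coverage means the signal is better resolved; a set with dozens of variants usually reflects strong LD around the lead variant.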
- Posterior Probability
- The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
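As a rough illustration of how posterior probabilities can be derived from summary statistics, the sketch below uses Wakefield's approximate Bayes factor under a single-causal-variant assumption with equal priors across variants. It is not how SuSiE or FINEMAP work internally (both model multiple causal variants and use LD explicitly); the function names and the prior standard deviation of 0.2 are assumptions made for this example.

```python
import math
from typing import List


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor in favour of association (Wakefield).

    `prior_sd` is the assumed standard deviation of the effect-size prior.
    """
    v = se ** 2           # sampling variance of the effect estimate
    w = prior_sd ** 2     # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def posterior_probabilities(betas: List[float], ses: List[float]) -> List[float]:
    """Posterior probability that each variant is the single causal variant,
    assuming exactly one causal variant and equal prior weight per variant."""
    log_bfs = [log_abf(b, s) for b, s in zip(betas, ses)]
    largest = max(log_bfs)                      # subtract max for numerical stability
    weights = [math.exp(x - largest) for x in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical effect sizes and standard errors for three variants.
pips = posterior_probabilities([0.10, 0.08, 0.02], [0.02, 0.02, 0.02])
print([round(p, 4) for p in pips])
```

The probabilities sum to 1 by construction, so they are only meaningful if the causal variant is actually among the variants supplied.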
- Locus-to-Gene (L2G) Predictions
- L2G scores integrate multiple data types to predict which gene is affected by a variant:
- Distance to gene (closer = higher score)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)
- L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
- Use Cases
- 1. Prioritize Variants at a Known Locus
- Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

    from python_implementation import prioritize_causal_variants

    # Prioritize variants in TCF7L2 for diabetes
    result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
    print(result.get_summary())

Output shows:
- Credible sets containing TCF7L2 variants
- Posterior probabilities (via fine-mapping methods)
- Top L2G genes (which genes are likely affected)
- Associated traits
- 2. Fine-Map a Specific Variant
- Question: "What do we know about rs429358 (APOE4) from fine-mapping?"

    from python_implementation import prioritize_causal_variants

    # Fine-map a specific variant
    result = prioritize_causal_variants("rs429358")

    # Check which credible sets contain this variant
    for cs in result.credible_sets:
        print(f"Trait: {cs.trait}")
        print(f"Fine-mapping method: {cs.finemapping_method}")
        print(f"Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'N/A'}")
        print(f"Confidence: {cs.confidence}")

- 3. Explore All Loci from a GWAS Study
- Question: "What are all the causal loci from the recent T2D meta-analysis?"

    from python_implementation import get_credible_sets_for_study

    # Get all fine-mapped loci from a study
    credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
    print(f"Found {len(credible_sets)} independent loci")

    # Examine each locus
    for cs in credible_sets:
        print(f"\nRegion: {cs.region}")
        # Guard against a missing lead variant or an empty rsID list
        has_rsid = cs.lead_variant and cs.lead_variant.rs_ids
        print(f"Lead variant: {cs.lead_variant.rs_ids[0] if has_rsid else 'N/A'}")
        if cs.l2g_genes:
            top_gene = cs.l2g_genes[0]
            print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")

- 4. Find GWAS Studies for a Disease
- Question: "What GWAS studies exist for Alzheimer's disease?"

    from python_implementation import search_gwas_studies_for_disease

    # Search by disease name
    studies = search_gwas_studies_for_disease("Alzheimer's disease")
    for study in studies[:5]:
        print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
        print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
        print(f"  Has summary stats: {study.get('hasSumstats', False)}")

    # Or use precise disease ontology IDs
    studies = search_gwas_studies_for_disease(
        "Alzheimer's disease",
        disease_id="EFO_0000249",  # EFO ID for Alzheimer's
    )

- 5. Get Validation Suggestions
- Question: "How should we validate the top causal variant?"

    from python_implementation import prioritize_causal_variants

    result = prioritize_causal_variants("APOE", "alzheimer")

    # Get experimental validation suggestions
    suggestions = result.get_validation_suggestions()
    for suggestion in suggestions:
        print(suggestion)

Output includes:
- CRISPR knock-in experiments
- Reporter assays
- eQTL analysis
- Colocalization studies
- Workflow Example: Complete Fine-Mapping Analysis

    from python_implementation import (
        prioritize_causal_variants,
        search_gwas_studies_for_disease,
        get_credible_sets_for_study,
    )

    # Step 1: Find relevant GWAS studies
    print("Step 1: Finding T2D GWAS studies...")
    studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
    largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
    print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

    # Step 2: Get all fine-mapped loci from the study
    print("\nStep 2: Getting fine-mapped loci...")
    credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
    print(f"Found {len(credible_sets)} credible sets")

    # Step 3: Find loci near genes of interest
    print("\nStep 3: Finding TCF7L2 loci...")
    tcf7l2_loci = [
        cs for cs in credible_sets
        if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
    ]
    print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

    # Step 4: Prioritize variants at TCF7L2
    print("\nStep 4: Prioritizing TCF7L2 variants...")
    result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

    # Step 5: Print summary and validation plan
    print("\n" + "=" * 60)
    print("FINE-MAPPING SUMMARY")
    print("=" * 60)
    print(result.get_summary())
    print("\n" + "=" * 60)
    print("VALIDATION STRATEGY")
    print("=" * 60)
    suggestions = result.get_validation_suggestions()
    for suggestion in suggestions:
        print(suggestion)

- Data Classes
- FineMappingResult
- Main result object containing:
- query_variant: Variant annotation
- query_gene: Gene symbol (if queried by gene)
- credible_sets: List of fine-mapped loci
- associated_traits: All associated traits
- top_causal_genes: L2G genes ranked by score
- Methods:
- get_summary(): Human-readable summary
- get_validation_suggestions(): Experimental validation strategies
- CredibleSet
- Represents a fine-mapped locus:
- study_locus_id: Unique identifier
- region: Genomic region (e.g., "10:112861809-113404438")
- lead_variant: Top variant by posterior probability
- finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
- l2g_genes: Locus-to-gene predictions
- confidence: Credible set confidence (95%, 99%)
- L2GGene
- Locus-to-gene prediction:
- gene_symbol: Gene name (e.g., "TCF7L2")
- gene_id: Ensembl gene ID
- l2g_score: Probability score (0-1)
- VariantAnnotation
- Functional annotation for a variant:
- variant_id: Open Targets format (chr_pos_ref_alt)
- rs_ids: dbSNP identifiers
- chromosome, position: Genomic coordinates
- most_severe_consequence: Functional impact
- allele_frequencies: Population-specific allele frequencies
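The fields listed above map naturally onto Python dataclasses. The sketch below is an illustration only: the field names come from this document, but the types, defaults, and the `from_variant_id` helper are assumptions rather than the skill's actual definitions, and the example identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariantAnnotation:
    variant_id: str                                   # chr_pos_ref_alt
    rs_ids: List[str] = field(default_factory=list)
    chromosome: Optional[str] = None
    position: Optional[int] = None
    most_severe_consequence: Optional[str] = None
    allele_frequencies: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def from_variant_id(cls, variant_id: str) -> "VariantAnnotation":
        """Hypothetical helper: fill coordinates from a chr_pos_ref_alt ID."""
        chrom, pos, _ref, _alt = variant_id.split("_")
        return cls(variant_id=variant_id, chromosome=chrom, position=int(pos))


@dataclass
class L2GGene:
    gene_symbol: str
    gene_id: str
    l2g_score: float


@dataclass
class CredibleSet:
    study_locus_id: str
    region: Optional[str] = None
    lead_variant: Optional[VariantAnnotation] = None
    finemapping_method: Optional[str] = None
    l2g_genes: List[L2GGene] = field(default_factory=list)
    confidence: Optional[str] = None


# Made-up identifiers, for illustration only.
lead = VariantAnnotation.from_variant_id("1_12345_A_G")
locus = CredibleSet(
    study_locus_id="example_locus",
    lead_variant=lead,
    finemapping_method="SuSiE",
    l2g_genes=[L2GGene("GENE_A", "ENSG_EXAMPLE_1", 0.81)],
    confidence="95%",
)
print(locus.lead_variant.chromosome, locus.lead_variant.position)
```

Using `default_factory=list` keeps `l2g_genes` and `rs_ids` as empty lists rather than `None`, which is why checks such as `if cs.l2g_genes:` in the use cases are sufficient.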
- Tools Used
- Open Targets Genetics (GraphQL)
- OpenTargets_get_variant_info: Variant details and allele frequencies
- OpenTargets_get_variant_credible_sets: Credible sets containing a variant
- OpenTargets_get_credible_set_detail: Detailed credible set information
- OpenTargets_get_study_credible_sets: All loci from a GWAS study
- OpenTargets_search_gwas_studies_by_disease: Find studies by disease
- GWAS Catalog (REST API)
- gwas_search_snps: Find SNPs by gene or rsID
- gwas_get_snp_by_id: Detailed SNP information
- gwas_get_associations_for_snp: All trait associations for a variant
- gwas_search_studies: Find studies by disease/trait
- Understanding Fine-Mapping Output
- Interpreting Posterior Probabilities
- > 0.5: Very likely causal (strong candidate)
- 0.1 - 0.5: Plausible causal variant
- 0.01 - 0.1: Possible but uncertain
- < 0.01: Unlikely to be causal
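These tiers can be applied programmatically when triaging many variants. The helper below is a small sketch; the function name is hypothetical, and the handling of exact boundary values (for example, 0.5 falling into the "plausible" tier) is an assumption, since the ranges above do not say which side a boundary belongs to.

```python
def interpret_posterior_probability(pip: float) -> str:
    """Map a posterior probability to the interpretation tiers listed above."""
    if not 0.0 <= pip <= 1.0:
        raise ValueError("posterior probability must be between 0 and 1")
    if pip > 0.5:
        return "Very likely causal (strong candidate)"
    if pip >= 0.1:
        return "Plausible causal variant"
    if pip >= 0.01:
        return "Possible but uncertain"
    return "Unlikely to be causal"


for value in (0.83, 0.25, 0.04, 0.002):
    print(f"{value}: {interpret_posterior_probability(value)}")
```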
- Interpreting L2G Scores
- > 0.7: High confidence gene-variant link
- 0.5 - 0.7: Moderate confidence
- 0.3 - 0.5: Weak but possible link
- < 0.3: Low confidence
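As a sketch of how the L2G tiers might be used at a single locus, the snippet below picks the highest-scoring gene and attaches its confidence tier. The function names and gene labels are hypothetical, predictions are represented as plain `(gene_symbol, score)` tuples for simplicity, and boundary values are assigned to the lower tier by assumption.

```python
from typing import List, Optional, Tuple


def l2g_confidence(score: float) -> str:
    """Map an L2G score (0 to 1) to the confidence tiers listed above."""
    if score > 0.7:
        return "High confidence gene-variant link"
    if score >= 0.5:
        return "Moderate confidence"
    if score >= 0.3:
        return "Weak but possible link"
    return "Low confidence"


def top_l2g_gene(
    predictions: List[Tuple[str, float]],
) -> Optional[Tuple[str, float, str]]:
    """Return (gene, score, tier) for the best-scoring gene, or None if empty."""
    if not predictions:
        return None
    gene, score = max(predictions, key=lambda p: p[1])
    return gene, score, l2g_confidence(score)


# Hypothetical L2G predictions for one locus.
example = [("GENE_A", 0.82), ("GENE_B", 0.41), ("GENE_C", 0.12)]
print(top_l2g_gene(example))
```

When the top two genes have similar scores, reporting both is usually more honest than committing to the single highest value.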
- Fine-Mapping Methods Compared
| Method | Approach | Strengths | Use Case |
|--------|----------|-----------|----------|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations | Integrates epigenomics | Regulatory variants |
| eCAVIAR | Colocalization (extends the CAVIAR fine-mapping method) | Finds shared causal variants | eQTL overlap |
- Common Questions
- Q: Why don't all variants have credible sets?
- A: Fine-mapping requires:
- GWAS summary statistics (not just top hits)
- LD reference panel
- Sufficient signal strength (p < 5e-8)
- Computational resources
- Q: Can a variant be in multiple credible sets?
- A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
- Q: What if the top L2G gene is far from the variant?
- A: This suggests regulatory effects (enhancers, promoters). Check:
- eQTL evidence in relevant tissues
- Chromatin interaction data (Hi-C)
- Regulatory element annotations (Roadmap, ENCODE)
- Q: How do I choose between variants in a credible set?
- A: Prioritize by:
- Posterior probability (higher = better)
- Functional consequence (coding > regulatory > intergenic)
- eQTL evidence
- Evolutionary conservation
- Experimental feasibility
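The first two criteria above lend themselves to a simple sort. The sketch below ranks candidates by posterior probability (rounded to two decimals so that near-ties are treated as ties) and then by a consequence tier. The `Candidate` type, the tier mapping, and the choice to place unknown consequences in the middle tier are all assumptions for illustration; eQTL evidence, conservation, and feasibility would need additional inputs.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    rs_id: str
    pip: float
    consequence: str


# Assumed tiers: coding (0) before regulatory (1) before intergenic (2).
CONSEQUENCE_TIER = {
    "missense_variant": 0,
    "stop_gained": 0,
    "regulatory_region_variant": 1,
    "intergenic_variant": 2,
}


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by posterior probability (descending), then by consequence tier."""
    def sort_key(c: Candidate):
        tier = CONSEQUENCE_TIER.get(c.consequence, 1)  # unknown: middle tier
        return (-round(c.pip, 2), tier)
    return sorted(candidates, key=sort_key)


# Hypothetical credible-set members.
members = [
    Candidate("rs_x", 0.40, "intergenic_variant"),
    Candidate("rs_y", 0.40, "missense_variant"),
    Candidate("rs_z", 0.15, "missense_variant"),
]
for c in rank_candidates(members):
    print(c.rs_id, c.pip, c.consequence)
```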
- Limitations
- LD-dependent: Fine-mapping accuracy depends on the LD reference matching the study population
- Requires summary stats: Not all studies provide full summary statistics
- Computationally intensive: Fine-mapping large studies takes significant resources
- Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
- Missing data: Not all GWAS loci have been fine-mapped in Open Targets
- Best Practices
- Start with study-level queries when exploring a new disease
- Check multiple studies for replication of signals
- Combine with functional data (eQTLs, chromatin, CRISPR screens)
- Consider ancestry: LD differs across populations
- Validate experimentally: fine-mapping provides candidates, not proof
- References
- Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
- Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
- Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
- Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet