tooluniverse-polygenic-risk-score

Install

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-polygenic-risk-score
Polygenic Risk Score (PRS) Builder
Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.
Overview
Use Cases:
"Calculate my genetic risk for type 2 diabetes"
"Build a polygenic risk score for coronary artery disease"
"What's my genetic predisposition to Alzheimer's disease?"
"Interpret my PRS percentile for breast cancer risk"
What This Skill Does:
Extracts genome-wide significant variants (p < 5e-8) from GWAS Catalog
Builds weighted PRS models using effect sizes (beta coefficients)
Calculates individual risk scores from genotype data
Interprets PRS as population percentiles and risk categories
What This Skill Does NOT Do:
Diagnose disease (PRS is probabilistic, not deterministic)
Replace clinical assessment or genetic counseling
Account for non-genetic factors (lifestyle, environment)
Provide treatment recommendations
Methodology
PRS Calculation Formula
A polygenic risk score is calculated as a weighted sum across genetic variants:
PRS = Σ (dosage_i × effect_size_i)
Where:
dosage_i
Number of effect alleles at SNP i (0, 1, or 2)
effect_size_i
Beta coefficient or log(odds ratio) from GWAS
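The weighted sum can be illustrated with a few lines of Python. This is a minimal sketch, not the skill's implementation: the function name `raw_prs`, the SNP IDs, and the weights are all hypothetical, and it assumes SNPs missing from either input are simply skipped.

```python
def raw_prs(dosages, effect_sizes):
    """Sum dosage * effect size over SNPs present in both mappings.

    SNPs missing from either mapping are skipped (an assumption of this sketch).
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(dosages[snp] * effect_sizes[snp] for snp in shared)


# Hypothetical SNP IDs and weights, for illustration only.
effect_sizes = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0, "rs0004": 1}

score = raw_prs(dosages, effect_sizes)
print(f"Raw PRS: {score:.2f}")
```

Here rs0004 has a genotype but no weight, so it contributes nothing to the score.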
Standardization
Raw PRS is standardized to z-scores for interpretation:
z-score = (PRS - population_mean) / population_std
This allows comparison to population distribution and percentile calculation.
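As a rough illustration of the standardization step, the sketch below converts a raw score to a z-score and then to a percentile under a normal distribution. The function names and the example population mean and standard deviation are assumptions made for illustration; in practice these parameters would come from a reference cohort.

```python
import math


def z_score(prs, population_mean, population_std):
    """Standardize a raw PRS against reference population parameters."""
    if population_std <= 0:
        raise ValueError("population_std must be positive")
    return (prs - population_mean) / population_std


def percentile_from_z(z):
    """Percentile (0-100) of a z-score under a standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical reference values, for illustration only.
z = z_score(prs=0.19, population_mean=0.10, population_std=0.09)
pct = percentile_from_z(z)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

A z-score of 1 corresponds to roughly the 84th percentile, which is why the normality assumption matters when percentiles are reported.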
Significance Thresholds
Genome-wide significance
p < 5×10⁻⁸ (default threshold)
This corrects for ~1 million independent tests across the genome
Relaxed thresholds (e.g., p < 1×10⁻⁵) can include more SNPs but may add noise
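The arithmetic behind the default threshold, and the effect of relaxing it, can be sketched as follows. The record layout (`rsid`, `p_value`) and the helper names are hypothetical and do not reflect the actual GWAS Catalog response format.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def select_significant(associations, p_threshold):
    """Keep associations whose p-value is below the threshold."""
    return [a for a in associations if a["p_value"] < p_threshold]


# Hypothetical association records, for illustration only.
associations = [
    {"rsid": "rs0001", "p_value": 3e-12},
    {"rsid": "rs0002", "p_value": 2e-7},
    {"rsid": "rs0003", "p_value": 4e-9},
]

threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)  # about 5e-8
strict = select_significant(associations, threshold)
relaxed = select_significant(associations, 1e-5)
print(len(strict), len(relaxed))
```

With these example records, the relaxed threshold admits one extra SNP (rs0002) that the genome-wide threshold excludes.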
Effect Size Handling
Continuous traits
(e.g., height, BMI): Beta coefficient (units of trait per allele)
Binary traits
(e.g., disease): Odds ratio converted to log-odds (beta = ln(OR))
Missing effect sizes or non-significant SNPs are excluded
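A minimal sketch of this handling, assuming each association is a dictionary with optional `beta` and `odds_ratio` fields (hypothetical field names, not the GWAS Catalog schema): betas are used directly, odds ratios are converted with the natural logarithm, and records with neither are dropped.

```python
import math


def effect_size(assoc):
    """Return a per-allele weight, or None if no usable effect size exists."""
    beta = assoc.get("beta")
    if beta is not None:
        return float(beta)
    odds_ratio = assoc.get("odds_ratio")
    if odds_ratio is not None and odds_ratio > 0:
        return math.log(odds_ratio)  # beta = ln(OR)
    return None


# Hypothetical records, for illustration only.
records = [
    {"rsid": "rs0001", "odds_ratio": 1.5},
    {"rsid": "rs0002", "beta": -0.05},
    {"rsid": "rs0003"},  # no effect size reported: excluded
    {"rsid": "rs0004", "odds_ratio": 0.8},  # protective allele: negative weight
]

weights = {}
for record in records:
    weight = effect_size(record)
    if weight is not None:
        weights[record["rsid"]] = weight

print(weights)
```

Note that an odds ratio below 1 yields a negative weight, so protective alleles lower the score.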
Data Sources
This skill uses ToolUniverse GWAS tools to query:
GWAS Catalog
(EMBL-EBI)
Curated GWAS associations
5000+ studies, millions of variants
Tools: gwas_get_associations_for_trait, gwas_get_snp_by_id
Open Targets Genetics
Integrated genetics platform
Fine-mapped credible sets
Tools: OpenTargets_search_gwas_studies_by_disease, OpenTargets_get_variant_info
Key Concepts
Polygenic Risk Scores (PRS)
Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.
Key Properties:
Continuous distribution
PRS forms a bell curve in populations
Relative risk
Compares individual to population average
Probabilistic
High PRS doesn't guarantee disease, low PRS doesn't guarantee protection
Ancestry-specific
PRS accuracy depends on matching GWAS and target ancestry
GWAS (Genome-Wide Association Studies)
GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.
Study Design:
Discovery cohort
Initial identification of associations
Replication cohort
Validation in independent samples
Sample size
Larger studies detect smaller effects (for a true effect, the expected test statistic grows roughly as √N)
Multiple testing correction
Bonferroni-type correction for ~1M tests
Effect Sizes and Odds Ratios
Beta (β)
Change in trait per copy of effect allele
Example: β = 0.5 kg/m² means each allele increases BMI by 0.5 units
Odds Ratio (OR)
Multiplicative change in disease odds
OR = 1.5 means 50% increased odds per allele
Convert to beta: β = ln(OR)
Linkage Disequilibrium (LD) and Clumping
Nearby variants are often inherited together (LD). To avoid double-counting:
LD clumping
Select independent variants (r² < 0.1 within 1 Mb windows)
Fine-mapping
Statistical methods to identify causal variants
This skill uses raw associations; production PRS should include LD pruning
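Since this skill does not clump, the following is only a sketch of what a greedy clumping pass could look like, using the r² < 0.1 and 1 Mb parameters mentioned above. It assumes pairwise r² values are already available in a lookup table (here a hypothetical dictionary keyed by SNP pairs, with missing pairs treated as r² = 0); real pipelines typically compute LD from a reference panel with dedicated tools.

```python
def ld_clump(snps, r2, r2_threshold=0.1, window_bp=1_000_000):
    """Greedy clumping: walk SNPs from most to least significant and keep a SNP
    only if it is not in LD (r2 >= threshold, same chromosome, within the
    window) with any SNP already kept."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["p_value"]):
        in_ld_with_kept = any(
            snp["chrom"] == k["chrom"]
            and abs(snp["pos"] - k["pos"]) <= window_bp
            and r2.get(frozenset((snp["rsid"], k["rsid"])), 0.0) >= r2_threshold
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


# Hypothetical SNPs and LD values, for illustration only.
snps = [
    {"rsid": "rs0001", "chrom": "1", "pos": 1_000_000, "p_value": 1e-12},
    {"rsid": "rs0002", "chrom": "1", "pos": 1_200_000, "p_value": 1e-9},
    {"rsid": "rs0003", "chrom": "1", "pos": 5_000_000, "p_value": 1e-10},
    {"rsid": "rs0004", "chrom": "2", "pos": 1_000_000, "p_value": 1e-8},
]
r2 = {frozenset(("rs0001", "rs0002")): 0.8}  # missing pairs default to 0.0

independent = ld_clump(snps, r2)
print([s["rsid"] for s in independent])
```

In this example rs0002 is dropped because it is tightly linked to the more significant rs0001, while the distant and cross-chromosome SNPs are retained.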
Population Stratification
GWAS and PRS are most accurate when ancestries match:
Population structure
Different ancestries have different allele frequencies
Transferability
European-trained PRS perform worse in non-European populations
Solution
Train PRS on diverse cohorts or use ancestry-matched references
Applications
Clinical Risk Assessment
PRS can stratify individuals for:
Screening programs
Target high-risk individuals (e.g., mammography, colonoscopy)
Prevention strategies
Lifestyle interventions for high genetic risk
Drug response
Pharmacogenomics based on metabolism genes
Example
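Khera et al. (2018) showed that a genome-wide PRS identifies about 8% of the population at greater than 3-fold increased risk of coronary artery disease, roughly 20 times the carrier frequency of rare monogenic mutations conferring comparable risk.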
Research Applications
Gene discovery
PRS-based phenome-wide association studies (PheWAS)
Genetic correlation
Compare PRS across traits
Causal inference
Mendelian randomization using PRS as instruments
Simulation studies
Model polygenic architecture
Personal Genomics
Consumer genetic testing (23andMe, Ancestry DNA) provides raw genotypes. Users can:
Calculate PRS for traits not reported
Compare to published PRS models
Understand genetic contribution vs. lifestyle factors
Caution
Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.
Limitations and Considerations
Scientific Limitations
Heritability Gap
PRS explains a fraction of genetic heritability
Type 2 diabetes: ~50% heritable, PRS explains ~10-20%
Rare variants, epistasis, and gene-environment interactions not captured
Ancestry Bias
Most GWAS are European ancestry
PRS accuracy drops in non-European populations
Need for diverse cohort recruitment
Winner's Curse
Discovery effect sizes often overestimated
Replication studies show smaller effects
Meta-analyses provide better estimates
Missing Heritability
Unexplained genetic contribution from:
Rare variants not captured by SNP arrays
Structural variants (CNVs, inversions)
Epigenetic factors
Clinical Limitations
Not Diagnostic
PRS is probabilistic, not deterministic
High PRS doesn't mean you will get disease
Low PRS doesn't mean you won't get disease
Environmental Factors
Many complex diseases are 50%+ environmental
Smoking, diet, exercise, stress, pollution
PRS doesn't account for these
Pleiotropy
Same variants affect multiple traits
Genetic correlation between diseases
Risk for one may protect against another
Actionability
Not all high-risk predictions have interventions
Alzheimer's PRS has limited actionability currently
Ethical considerations for testing
Ethical Considerations
Privacy
Genetic data is identifiable and permanent
Can't be changed like passwords
Familial implications (relatives share genetics)
Discrimination
Potential for genetic discrimination
GINA prohibits genetic discrimination in health insurance and employment (US)
Life, disability, and long-term care insurance are not covered by GINA
Psychological Impact
Knowledge of high risk can cause anxiety
Need for genetic counseling
Risk communication training
Equity
Ancestry bias means unequal benefits
Europeans benefit most from current PRS
Exacerbates health disparities
References
Key Publications
Lambert et al. (2021)
"The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation"
Nature Genetics, 53:420–425
PGS Catalog:
https://www.pgscatalog.org/
Repository of published PRS models
Khera et al. (2018)
"Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations"
Nature Genetics, 50:1219–1224
Demonstrated clinical utility of PRS
Torkamani et al. (2018)
"The personal and clinical utility of polygenic risk scores"
Nature Reviews Genetics, 19:581–590
Comprehensive review of PRS applications
Martin et al. (2019)
"Clinical use of current polygenic risk scores may exacerbate health disparities"
Nature Genetics, 51:584–591
Addresses ancestry bias and equity concerns
Choi et al. (2020)
"Tutorial: a guide to performing polygenic risk score analyses"
Nature Protocols, 15:2759–2772
Practical guide to PRS calculation and evaluation
Resources
PGS Catalog: https://www.pgscatalog.org/ - Published PRS models
LD Hub: http://ldsc.broadinstitute.org/ - Genetic correlations
PRSice: https://www.prsice.info/ - PRS calculation software
GWAS Catalog: https://www.ebi.ac.uk/gwas/ - Association database
Workflow
1. Trait Selection
Identify the disease or trait of interest:
Use standard terminology (e.g., "type 2 diabetes" not "T2D")
Check GWAS Catalog for availability
Verify sufficient GWAS studies exist (n > 10,000 samples ideal)
2. Association Collection
Query GWAS databases for genome-wide significant associations:
    prs = build_polygenic_risk_score(
        trait="coronary artery disease",
        p_threshold=5e-8,  # Genome-wide significance
        max_snps=1000,
    )
Considerations:
P-value threshold: 5e-8 is conservative, 1e-5 includes more variants
LD clumping: Production systems should prune correlated SNPs
Study quality: Prefer large meta-analyses over small studies
3. Effect Size Extraction
Extract beta coefficients or odds ratios:
Beta for continuous traits (direct use)
OR for binary traits (convert to log-odds)
Handle missing values (exclude or impute from meta-analysis)
4. SNP Filtering
Quality control filters:
MAF filter
Exclude rare variants (MAF < 0.01) for robustness
Genotype QC
Remove SNPs with high missingness (> 10%)
Hardy-Weinberg
Exclude SNPs violating HWE (p < 1e-6)
Ambiguous SNPs
Remove A/T and G/C SNPs (strand ambiguity)
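The four filters can be combined into a single predicate, sketched below. The thresholds come from the list above; the field names (`maf`, `missing_rate`, `hwe_p`, `effect_allele`, `other_allele`) and the function name are hypothetical, and the per-SNP statistics are assumed to have been computed elsewhere.

```python
# Allele pairs that read the same on both strands.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def passes_qc(snp, min_maf=0.01, max_missing=0.10, min_hwe_p=1e-6):
    """Apply the MAF, missingness, HWE, and strand-ambiguity filters."""
    alleles = frozenset((snp["effect_allele"], snp["other_allele"]))
    return (
        snp["maf"] >= min_maf
        and snp["missing_rate"] <= max_missing
        and snp["hwe_p"] >= min_hwe_p
        and alleles not in AMBIGUOUS_PAIRS
    )


# Hypothetical per-SNP statistics, for illustration only.
snps = [
    {"rsid": "rs0001", "effect_allele": "A", "other_allele": "G",
     "maf": 0.25, "missing_rate": 0.01, "hwe_p": 0.40},
    {"rsid": "rs0002", "effect_allele": "A", "other_allele": "T",  # ambiguous
     "maf": 0.30, "missing_rate": 0.00, "hwe_p": 0.55},
    {"rsid": "rs0003", "effect_allele": "C", "other_allele": "T",  # too rare
     "maf": 0.004, "missing_rate": 0.01, "hwe_p": 0.70},
    {"rsid": "rs0004", "effect_allele": "G", "other_allele": "T",  # fails HWE
     "maf": 0.15, "missing_rate": 0.02, "hwe_p": 1e-9},
]

kept = [s["rsid"] for s in snps if passes_qc(s)]
print(kept)
```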
5. Score Calculation
Calculate weighted sum of genotype dosages:
    result = calculate_personal_prs(
        prs_weights=prs,
        genotypes=my_genotypes,
        population_mean=0.0,
        population_std=1.0,
    )
Genotype Sources:
23andMe raw data export
Ancestry DNA raw data
Whole genome sequencing (VCF files)
SNP array data (Illumina, Affymetrix)
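How raw genotype calls become dosages can be sketched as follows. This assumes a tab-separated layout of rsid, chromosome, position, and genotype with `#` comment lines and `--` for no-calls, which is how consumer raw-data exports are commonly laid out; treat that layout, and both function names, as assumptions. The sketch also assumes the file and the GWAS effect allele are reported on the same strand.

```python
def parse_genotype_lines(lines):
    """Parse tab-separated 'rsid, chromosome, position, genotype' lines.

    Comment lines starting with '#' and no-calls containing '-' are skipped.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        rsid, genotype = fields[0], fields[3]
        if "-" in genotype:
            continue
        genotypes[rsid] = genotype
    return genotypes


def dosage(genotype, effect_allele):
    """Count copies of the effect allele in a genotype call such as 'AG'."""
    return genotype.count(effect_allele)


# Hypothetical file contents, for illustration only.
raw_lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0001\t1\t1000000\tAG",
    "rs0002\t1\t1200000\tGG",
    "rs0003\t2\t500000\t--",
]

my_genotypes = parse_genotype_lines(raw_lines)
dosages = {rsid: dosage(call, "G") for rsid, call in my_genotypes.items()}
print(dosages)
```

VCF input would need a different parser, since genotypes there are encoded as allele indices rather than letters.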
6. Risk Interpretation
Convert to percentiles and risk categories:
    result = interpret_prs_percentile(result)
    print(f"Percentile: {result.percentile:.1f}%")
    print(f"Risk: {result.risk_category}")
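The internals of `interpret_prs_percentile` are not shown here. As a rough sketch of the category mapping only, the function below applies the cut-offs from the Risk Categories list in this step (20th, 80th, and 95th percentiles). The function name is hypothetical, and the handling of exact boundary values (80 counted as elevated, 95 counted as elevated) is an assumption, since the listed ranges overlap at their edges.

```python
def risk_category(percentile):
    """Map a population percentile (0-100) to a risk category label."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 20:
        return "Low risk"
    if percentile < 80:
        return "Average risk"
    if percentile <= 95:
        return "Elevated risk"
    return "High risk"


for pct in (5.0, 50.0, 84.1, 97.5):
    print(f"{pct:5.1f} -> {risk_category(pct)}")
```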
Risk Categories:
Low risk
< 20th percentile (genetic protection)
Average risk
20-80th percentile (typical genetic predisposition)
Elevated risk
80-95th percentile (moderately increased risk)
High risk
> 95th percentile (substantially increased risk)
Clinical Interpretation:
Percentiles assume the PRS is normally distributed in the reference population
Relative risk vs. average (not absolute risk)
Combine with family history, clinical risk factors
PRS is NOT diagnostic - many high-risk individuals never develop disease
Best Practices
PRS Construction
Use validated PRS from PGS Catalog when available
Published models have been externally validated
Include LD clumping and ancestry-specific weights
Match ancestries between GWAS and target population
European GWAS for European individuals
Use multi-ancestry GWAS when available
Include as many SNPs as practical
More SNPs = better prediction (up to a point)
Balance between coverage and genotyping cost
Consider trait architecture
Highly polygenic traits (height, education): benefit from relaxed thresholds
Oligogenic traits (IBD, T1D): few large-effect variants, strict thresholds
Clinical Use
Combine with clinical risk scores
Add PRS to Framingham Risk Score, QRISK, etc.
Integrated models improve prediction
Stratify screening and prevention
Intensify surveillance for high PRS (e.g., earlier mammography)
Lifestyle interventions for modifiable risk
Provide genetic counseling
Explain probabilistic nature of PRS
Discuss limitations and uncertainty
Address psychological impact
Consider actionability
Is there an intervention for high risk?
Benefits vs. harms of knowing genetic risk
Research Use
Report methods transparently
Document SNP selection criteria
Report LD clumping parameters
Specify ancestry of GWAS and target
Validate in held-out cohorts
Split data: training vs. testing
Report out-of-sample prediction accuracy (R², AUC)
Compare to existing PRS
Benchmark against PGS Catalog models
Report incremental improvement
Test across ancestries
Evaluate transferability to non-European populations
Report performance stratified by ancestry
Disclaimer
This skill is for educational and research purposes only.
Not for clinical diagnosis or treatment decisions
Not validated for clinical use - use PGS Catalog models for clinical-grade PRS
Requires genetic counseling - interpretation requires expertise
Does not account for family history, environment, or lifestyle factors
Ancestry-specific - accuracy depends on matching GWAS ancestry
For clinical genetic testing, consult:
Genetic counselors (certified by ABGC/ABMGG)
Medical geneticists
Healthcare providers with genomics training
PRS is a rapidly evolving field. Guidelines and best practices will continue to change as research progresses.
Regulatory Status:
FDA does not currently regulate PRS (as of 2024)
Some countries restrict direct-to-consumer genetic risk reporting
Check local regulations before clinical implementation
