tooluniverse-protein-therapeutic-design

Installs: 141
Rank: #6079

Install

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-protein-therapeutic-design

# Therapeutic Protein Designer

AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.

**KEY PRINCIPLES**:

1. **Structure-first design** - Generate backbone geometry before sequence
2. **Target-guided** - Design binders with target structure in mind
3. **Iterative validation** - Predict structure to validate designs
4. **Developability-aware** - Consider aggregation, immunogenicity, expression
5. **Evidence-graded** - Grade designs by confidence metrics
6. **Actionable output** - Provide sequences ready for experimental testing
7. **English-first queries** - Always use English terms in tool calls (protein names, target names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

## When to Use

Apply when user asks:

- "Design a protein binder for [target]"
- "Create a therapeutic protein against [protein/epitope]"
- "Design a protein scaffold with [property]"
- "Optimize this protein sequence for [function]"
- "Design a de novo enzyme for [reaction]"
- "Generate protein variants for [target binding]"

## Critical Workflow Requirements

### 1. Report-First Approach (MANDATORY)

1. **Create the report file FIRST**:
   - File name: `[TARGET]_protein_design_report.md`
   - Initialize with section headers
   - Add placeholder: `[Designing...]`
   - Progressively update as designs are generated
2. **Output separate files**:
   - `[TARGET]_designed_sequences.fasta` - All designed sequences
   - `[TARGET]_top_candidates.csv` - Ranked candidates with metrics

### 2. Design Documentation (MANDATORY)

Every design MUST include:

```markdown
### Design: Binder_001

**Sequence**: MVLSPADKTN...
**Length**: 85 amino acids
**Target**: PD-L1 (UniProt: Q9NZQ7)
**Method**: RFdiffusion → ProteinMPNN → ESMFold validation

**Quality Metrics**:

| Metric | Value | Interpretation |
|--------|-------|----------------|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`, `NvidiaNIM_proteinmpnn`, `NvidiaNIM_esmfold`*
```

## Phase 0: Tool Verification

### NVIDIA NIM Tools Required

| Tool | Purpose | API Key Required |
|------|---------|------------------|
| `NvidiaNIM_rfdiffusion` | Backbone generation | Yes |
| `NvidiaNIM_proteinmpnn` | Sequence design | Yes |
| `NvidiaNIM_esmfold` | Fast structure validation | Yes |
| `NvidiaNIM_alphafold2` | High-accuracy validation | Yes |
| `NvidiaNIM_esm2_650m` | Sequence embeddings | Yes |

### Parameter Verification

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `NvidiaNIM_rfdiffusion` | `num_steps` | `diffusion_steps` |
| `NvidiaNIM_proteinmpnn` | `pdb` | `pdb_string` |
| `NvidiaNIM_esmfold` | `seq` | `sequence` |

## Workflow Overview

```
Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
    ↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
    ↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
    ↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
    ↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
    ↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report
```

## Phase 1: Target Characterization

### 1.1 Get Target Structure

```python
def get_target_structure(tu, target_id):
    """Get target structure from PDB, EMDB, or predict."""

    # Try PDB first (X-ray/NMR)
    pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
    if pdb_results:
        # Get highest resolution structure (lowest value in Å).
        # Entries without a resolution (e.g. NMR) sort last.
        best_pdb = sorted(pdb_results, key=lambda x: x.get('resolution') or 99)[0]
        structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
        return {
            'source': 'PDB',
            'pdb_id': best_pdb['pdb_id'],
            'resolution': best_pdb.get('resolution'),
            'structure': structure
        }

    # Try EMDB for cryo-EM structures (valuable for membrane proteins)
    protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
    emdb_results = tu.tools.emdb_search(
        query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
    )
    if emdb_results and len(emdb_results) > 0:
        # Get highest resolution cryo-EM entry
        best_emdb = sorted(emdb_results, key=lambda x: x.get('resolution') or 99)[0]
        # Get associated PDB model if available
        emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
        if emdb_details.get('pdb_ids'):
            structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
            return {
                'source': 'EMDB cryo-EM',
                'emdb_id': best_emdb['emdb_id'],
                'pdb_id': emdb_details['pdb_ids'][0],
                'resolution': best_emdb.get('resolution'),
                'structure': structure
            }

    # Fallback to AlphaFold prediction
    sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=sequence['sequence'],
        algorithm="mmseqs2"
    )
    return {'source': 'AlphaFold2 (predicted)', 'structure': structure}
```
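The lookup order above (PDB, then EMDB, then an AlphaFold2 prediction) is an instance of a fallback chain. The following is a minimal sketch of that pattern in isolation, not part of the skill itself. The helper name `first_available` and the provider callables are hypothetical stand-ins for the real tool calls, and treating any exception as "no result" is an assumption about how failures should be handled.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A provider is a (source name, zero-argument fetch function) pair.
# Both the alias and the helper below are illustrative, not ToolUniverse APIs.
Provider = Tuple[str, Callable[[], Optional[Any]]]


def first_available(providers: Sequence[Provider]) -> Optional[dict]:
    """Return the first non-empty result, tagged with its source name."""
    for name, fetch in providers:
        try:
            result = fetch()
        except Exception:
            # Assumption: a failing provider counts as "no result".
            continue
        if result:
            return {"source": name, "structure": result}
    return None


# Usage with stand-in providers: the first two return nothing,
# so the predicted structure is selected.
hit = first_available([
    ("PDB", lambda: None),
    ("EMDB cryo-EM", lambda: None),
    ("AlphaFold2 (predicted)", lambda: "ATOM ..."),
])
```

Keeping the source name next to the result makes it straightforward to record provenance in the report, as the `'source'` key does in `get_target_structure`.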
### 1.1b EMDB for Membrane Proteins (NEW)

**When to prioritize EMDB**: Membrane proteins, large complexes, and targets where conformational states matter.

```python
def get_cryoem_structures(tu, target_name):
    """Get cryo-EM structures for membrane proteins/complexes."""

    # Search EMDB
    emdb_results = tu.tools.emdb_search(
        query=f"{target_name} membrane OR receptor"
    )

    structures = []
    for entry in emdb_results[:5]:
        details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
        structures.append({
            'emdb_id': entry['emdb_id'],
            'resolution': entry.get('resolution', 'N/A'),
            'title': entry.get('title', 'N/A'),
            'conformational_state': details.get('state', 'Unknown'),
            'pdb_models': details.get('pdb_ids', [])
        })

    return structures
```

**Output for Report**:

```markdown
### 1.1b Cryo-EM Structures (EMDB)

| EMDB ID | Resolution | PDB Model | Conformation |
|---------|------------|-----------|--------------|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |

**Note**: Cryo-EM structures can capture physiologically relevant conformations for membrane protein targets.

*Source: EMDB*
```

### 1.2 Identify Binding Epitope

```python
def identify_epitope(tu, target_structure, epitope_residues=None):
    """Identify or validate binding epitope."""

    if epitope_residues:
        # User-specified epitope
        return {'residues': epitope_residues, 'source': 'user-defined'}

    # Find surface-exposed regions.
    # Use structural analysis to identify potential epitopes.
    # analyze_surface is not defined in this document; supply your own
    # surface-analysis routine here.
    return analyze_surface(target_structure)
```

### 1.3 Output for Report

```markdown
## 1. Target Characterization

### 1.1 Target Information

| Property | Value |
|----------|-------|
| **Target** | PD-L1 (Programmed death-ligand 1) |
| **UniProt** | Q9NZQ7 |
| **Structure source** | PDB: 4ZQK (2.0 Å resolution) |
| **Binding epitope** | IgV domain, residues 19-127 |
| **Known binders** | Atezolizumab, durvalumab, avelumab |

### 1.2 Epitope Analysis

| Residue Range | Type | Surface Area | Druggability |
|---------------|------|--------------|--------------|
| 54-68 | Loop | 850 Ų | High |
| 115-125 | Beta strand | 420 Ų | Medium |
| 19-30 | N-terminus | 380 Ų | Medium |

**Selected Epitope**: Residues 54-68 (PD-1 binding interface)

*Source: PDB 4ZQK, surface analysis*
```

## Phase 2: Backbone Generation

### 2.1 RFdiffusion Design

```python
def generate_backbones(tu, design_params):
    """Generate de novo backbones using RFdiffusion."""

    backbones = tu.tools.NvidiaNIM_rfdiffusion(
        diffusion_steps=design_params.get('steps', 50),
        # Additional parameters depending on design type
    )

    return backbones
```

### 2.2 Design Modes

| Mode | Use Case | Key Parameters |
|------|----------|----------------|
| Unconditional | De novo scaffold | `diffusion_steps` only |
| Binder design | Target-guided binder | `target_structure`, `hotspot_residues` |
| Motif scaffolding | Functional motif embedding | `motif_sequence`, `motif_structure` |

### 2.3 Output for Report

```markdown
## 2. Backbone Generation

### 2.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | RFdiffusion via NVIDIA NIM |
| **Design mode** | Unconditional scaffold generation |
| **Diffusion steps** | 50 |
| **Number generated** | 10 backbones |

### 2.2 Generated Backbones

| Backbone | Length | Topology | Quality |
|----------|--------|----------|---------|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |

**Selected for sequence design**: BB_001, BB_002, BB_003, BB_005 (top 4)

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`*
```

## Phase 3: Sequence Design

### 3.1 ProteinMPNN Design

```python
def design_sequences(tu, backbone_pdb, num_sequences=8):
    """Design sequences for backbone using ProteinMPNN."""

    sequences = tu.tools.NvidiaNIM_proteinmpnn(
        pdb_string=backbone_pdb,
        num_sequences=num_sequences,
        temperature=0.1  # Lower = more conservative
    )

    return sequences
```

### 3.2 Sampling Parameters

| Parameter | Conservative | Moderate | Diverse |
|-----------|--------------|----------|---------|
| Temperature | 0.1 | 0.2 | 0.5 |
| Sequences per backbone | 4 | 8 | 16 |
| Use case | Validated scaffold | Exploration | Diversity |

### 3.3 Output for Report

```markdown
## 3. Sequence Design

### 3.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | ProteinMPNN via NVIDIA NIM |
| **Temperature** | 0.1 (conservative) |
| **Sequences per backbone** | 8 |
| **Total sequences** | 32 |

### 3.2 Designed Sequences (Top 10 by Score)

| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|------|----------|-------------|--------|------------|--------------|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |

### 3.3 Top Sequence: Seq_001_A

    >Seq_001_A (85 aa, MPNN score: -1.89)
    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
    GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL

*Source: NVIDIA NIM via `NvidiaNIM_proteinmpnn`*
```

## Phase 4: Structure Validation

### 4.1 ESMFold Validation

```python
import numpy as np


def validate_structure(tu, sequence):
    """Validate designed sequence by structure prediction."""

    # Fast validation with ESMFold
    predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)

    # Extract quality metrics.
    # extract_plddt and extract_ptm are not defined in this document;
    # implement them against the actual ESMFold response format.
    plddt = extract_plddt(predicted)
    ptm = extract_ptm(predicted)
    mean_plddt = float(np.mean(plddt))

    return {
        'structure': predicted,
        'mean_plddt': mean_plddt,
        'ptm': ptm,
        'passes': mean_plddt > 70 and ptm > 0.7
    }
```

### 4.2 Validation Criteria

| Metric | Threshold | Interpretation |
|--------|-----------|----------------|
| Mean pLDDT | >70 | Confident fold |
| pTM | >0.7 | Good global topology |
| RMSD to backbone | <2 Å | Design recapitulated |

### 4.3 Output for Report

```markdown
## 4. Structure Validation

### 4.1 Validation Results

| Sequence | pLDDT | pTM | RMSD to Design | Status |
|----------|-------|-----|----------------|--------|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |

### 4.2 Top Validated Design: Seq_001_A

| Region | Residues | pLDDT | Interpretation |
|--------|----------|-------|----------------|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |

**Overall**: Well-folded 3-helix bundle with high confidence core

*Source: NVIDIA NIM via `NvidiaNIM_esmfold`*
```

## Phase 5: Developability Assessment

### 5.1 Aggregation Propensity

```python
import re


def assess_aggregation(sequence):
    """Assess aggregation propensity."""

    # Calculate hydrophobic patches.
    # Placeholder heuristic (an assumption, not a validated predictor):
    # a patch is a run of 5 or more consecutive hydrophobic residues.
    patches = [m.span() for m in re.finditer(r'[AVILMFWY]{5,}', sequence)]

    # Placeholder score (assumption): fraction of hydrophobic residues.
    score = sum(aa in 'AVILMFWY' for aa in sequence) / max(len(sequence), 1)

    # Isoelectric point calculation and aggregation-prone motif detection
    # are not implemented here; add them with a dedicated tool.

    return {
        'aggregation_score': score,
        'hydrophobic_patches': patches,
        'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
    }
```

### 5.2 Developability Metrics

| Metric | Favorable | Marginal | Unfavorable |
|--------|-----------|----------|-------------|
| Aggregation score | <0.5 | 0.5-0.7 | >0.7 |
| Isoelectric point | 5-9 | 4-5 or 9-10 | <4 or >10 |
| Hydrophobic patches | <3 | 3-5 | >5 |
| Cysteine count | 0 or even | Odd | Multiple unpaired |

### 5.3 Output for Report

```markdown
## 5. Developability Assessment

### 5.1 Developability Scores

| Design | Aggregation | pI | Cysteines | Expression | Overall |
|--------|-------------|----|-----------|------------|---------|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |

### 5.2 Recommendations

**Best candidate for expression**: Seq_001_A

- Low aggregation propensity
- Near-neutral pI (easy purification)
- No cysteines (no disulfide mispairing risk)
- Predicted high E. coli expression

*Source: Sequence analysis*
```

## Report Template

```markdown
# Therapeutic Protein Design Report: [TARGET]

**Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress

## Executive Summary
[Designing...]

## 1. Target Characterization
### 1.1 Target Information
[Designing...]
### 1.2 Binding Epitope
[Designing...]

## 2. Backbone Generation
### 2.1 Design Parameters
[Designing...]
### 2.2 Generated Backbones
[Designing...]

## 3. Sequence Design
### 3.1 ProteinMPNN Results
[Designing...]
### 3.2 Top Sequences
[Designing...]

## 4. Structure Validation
### 4.1 ESMFold Validation
[Designing...]
### 4.2 Quality Metrics
[Designing...]

## 5. Developability Assessment
### 5.1 Scores
[Designing...]
### 5.2 Recommendations
[Designing...]

## 6. Final Candidates
### 6.1 Ranked List
[Designing...]
### 6.2 Sequences for Testing
[Designing...]

## 7. Experimental Recommendations
[Designing...]

## 8. Data Sources
[Will be populated...]
```

## Evidence Grading

| Tier | Symbol | Criteria |
|------|--------|----------|
| T1 | ★★★ | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | ★★☆ | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | ★☆☆ | pLDDT >70, pTM >0.65, developability concerns |
| T4 | ☆☆☆ | Failed validation or major developability issues |

## Completeness Checklist

### Phase 1: Target

- [ ] Target structure obtained (PDB or predicted)
- [ ] Binding epitope identified
- [ ] Existing binders noted

### Phase 2: Backbones

- [ ] ≥5 backbones generated
- [ ] Top 3-5 selected for sequence design
- [ ] Selection criteria documented

### Phase 3: Sequences

- [ ] ≥8 sequences per backbone designed
- [ ] MPNN scores reported
- [ ] Top 10 sequences listed

### Phase 4: Validation

- [ ] All sequences validated by ESMFold
- [ ] pLDDT and pTM reported
- [ ] Pass/fail criteria applied
- [ ] ≥3 passing designs

### Phase 5: Developability

- [ ] Aggregation assessed
- [ ] pI calculated
- [ ] Expression prediction
- [ ] Final ranking

### Phase 6: Deliverables

- [ ] Ranked candidate list
- [ ] FASTA file with sequences
- [ ] Experimental recommendations

## Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `NvidiaNIM_rfdiffusion` | Manual backbone design | Scaffold from PDB |
| `NvidiaNIM_proteinmpnn` | Rosetta ProteinMPNN | Manual sequence design |
| `NvidiaNIM_esmfold` | `NvidiaNIM_alphafold2` | AlphaFold DB |
| PDB structure | `NvidiaNIM_alphafold2` | AlphaFold DB |

## Tool Reference

See `TOOLS_REFERENCE.md` for complete tool documentation.
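The Evidence Grading tiers can be applied mechanically once the validation and developability numbers are available. The sketch below is one possible reading of that table, not the skill's own implementation. The `DesignMetrics` class and `grade_design` function are hypothetical names. Mapping "low aggregation" to a score below 0.5, "neutral pI" to the 5-9 range, and "major developability issues" to any value in the Unfavorable column of the Developability Metrics table are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    plddt: float        # mean pLDDT, 0-100
    ptm: float          # pTM, 0-1
    aggregation: float  # aggregation score, 0-1
    pi: float           # isoelectric point


def grade_design(m: DesignMetrics) -> str:
    """Map metrics to an evidence tier (T1-T4) under the stated assumptions."""
    # Assumption: a value in the "Unfavorable" column is a major issue.
    major_issue = m.aggregation > 0.7 or m.pi < 4 or m.pi > 10
    if major_issue:
        return "T4"
    # Assumption: "low aggregation, neutral pI" means the "Favorable" column.
    favorable = m.aggregation < 0.5 and 5 <= m.pi <= 9
    if m.plddt > 85 and m.ptm > 0.8 and favorable:
        return "T1"
    if m.plddt > 75 and m.ptm > 0.7:
        return "T2"
    if m.plddt > 70 and m.ptm > 0.65:
        return "T3"
    # Anything below the T3 thresholds counts as failed validation.
    return "T4"


# Example values taken from the sample report tables in this document.
tier_seq_001_a = grade_design(DesignMetrics(88.5, 0.85, 0.32, 6.2))
tier_seq_003_a = grade_design(DesignMetrics(79.8, 0.76, 0.58, 6.5))
```

With the sample values, Seq_001_A lands in T1 and Seq_003_A in T2, which agrees with the star ratings shown in the sample developability table.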