# Therapeutic Protein Designer

AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.

**KEY PRINCIPLES**:

1. **Structure-first design** - Generate backbone geometry before sequence
2. **Target-guided** - Design binders with target structure in mind
3. **Iterative validation** - Predict structure to validate designs
4. **Developability-aware** - Consider aggregation, immunogenicity, expression
5. **Evidence-graded** - Grade designs by confidence metrics
6. **Actionable output** - Provide sequences ready for experimental testing
7. **English-first queries** - Always use English terms in tool calls (protein names, target names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language.

## When to Use

Apply when the user asks:

- "Design a protein binder for [target]"
- "Create a therapeutic protein against [protein/epitope]"
- "Design a protein scaffold with [property]"
- "Optimize this protein sequence for [function]"
- "Design a de novo enzyme for [reaction]"
- "Generate protein variants for [target binding]"

## Critical Workflow Requirements

### 1. Report-First Approach (MANDATORY)

1. **Create the report file FIRST**:
   - File name: `[TARGET]_protein_design_report.md`
   - Initialize with section headers
   - Add placeholder: `[Designing...]`
2. **Progressively update** as designs are generated
3. **Output separate files**:
   - `[TARGET]_designed_sequences.fasta` - All designed sequences
   - `[TARGET]_top_candidates.csv` - Ranked candidates with metrics

### 2. Design Documentation (MANDATORY)

Every design MUST include:

```markdown
### Design: Binder_001

**Sequence**: MVLSPADKTN...
**Length**: 85 amino acids
**Target**: PD-L1 (UniProt: Q9NZQ7)
**Method**: RFdiffusion → ProteinMPNN → ESMFold validation

**Quality Metrics**:
| Metric | Value | Interpretation |
|--------|-------|----------------|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`, `NvidiaNIM_proteinmpnn`, `NvidiaNIM_esmfold`*
```

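
The report-first requirement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the tool suite: the function names (`init_report`, `write_fasta`, `write_candidates_csv`) and the CSV column names are hypothetical. Only the three file-name patterns, the section titles, and the placeholders come from this document.

```python
import csv
from pathlib import Path

# (section title, placeholder) pairs, following this document's report template.
REPORT_SECTIONS = [
    ("Executive Summary", "[Designing...]"),
    ("Target Characterization", "[Designing...]"),
    ("Backbone Generation", "[Designing...]"),
    ("Sequence Design", "[Designing...]"),
    ("Structure Validation", "[Designing...]"),
    ("Developability Assessment", "[Designing...]"),
    ("Final Candidates", "[Designing...]"),
    ("Experimental Recommendations", "[Designing...]"),
    ("Data Sources", "[Will be populated...]"),
]


def init_report(target, out_dir="."):
    """Create the report file first, with a placeholder under every header."""
    path = Path(out_dir) / f"{target}_protein_design_report.md"
    lines = [f"# Therapeutic Protein Design Report: {target}", ""]
    for title, placeholder in REPORT_SECTIONS:
        lines += [f"## {title}", placeholder, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def write_fasta(target, sequences, out_dir="."):
    """Write all designed sequences; `sequences` maps sequence ID -> sequence."""
    path = Path(out_dir) / f"{target}_designed_sequences.fasta"
    with path.open("w", encoding="utf-8") as fh:
        for seq_id, seq in sequences.items():
            fh.write(f">{seq_id}\n{seq}\n")
    return path


def write_candidates_csv(target, candidates, out_dir="."):
    """Write ranked candidates; each candidate is a dict keyed by `fields`."""
    path = Path(out_dir) / f"{target}_top_candidates.csv"
    # Column names are an assumption made for this sketch.
    fields = ["rank", "sequence_id", "plddt", "ptm", "mpnn_score"]
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(candidates)
    return path
```

Calling `init_report("PD-L1")` before any design tool runs gives later phases a file to update section by section, which is the point of the report-first rule.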
## Phase 0: Tool Verification

### NVIDIA NIM Tools Required

| Tool | Purpose | API Key Required |
|------|---------|------------------|
| `NvidiaNIM_rfdiffusion` | Backbone generation | Yes |
| `NvidiaNIM_proteinmpnn` | Sequence design | Yes |
| `NvidiaNIM_esmfold` | Fast structure validation | Yes |
| `NvidiaNIM_alphafold2` | High-accuracy validation | Yes |
| `NvidiaNIM_esm2_650m` | Sequence embeddings | Yes |

### Parameter Verification

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `NvidiaNIM_rfdiffusion` | `num_steps` | `diffusion_steps` |
| `NvidiaNIM_proteinmpnn` | `pdb` | `pdb_string` |
| `NvidiaNIM_esmfold` | `seq` | `sequence` |
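
One way to guard against the wrong parameter names listed above is to normalize keyword arguments before every tool call. The sketch below is an assumption about how such a guard could look; `PARAMETER_FIXES` and `normalize_params` are hypothetical names, and only the wrong/correct pairs are taken from the Parameter Verification table.

```python
# Wrong -> correct parameter names, copied from the Parameter Verification table.
PARAMETER_FIXES = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "NvidiaNIM_esmfold": {"seq": "sequence"},
}


def normalize_params(tool_name, params):
    """Return a copy of `params` with known-wrong names replaced."""
    fixes = PARAMETER_FIXES.get(tool_name, {})
    normalized = {}
    for name, value in params.items():
        correct = fixes.get(name, name)
        if correct in normalized:
            # Both the wrong and the correct name were supplied.
            raise ValueError(f"{tool_name}: '{name}' duplicates '{correct}'")
        normalized[correct] = value
    return normalized


# Example: the wrong name `seq` is rewritten to `sequence`.
fixed = normalize_params("NvidiaNIM_esmfold", {"seq": "MVLSPADKTN"})
```

Tools without an entry in the table pass through unchanged, so the helper is safe to apply to every call.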
## Workflow Overview

```
Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
    ↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
    ↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
    ↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
    ↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
    ↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report
```

## Phase 1: Target Characterization

### 1.1 Get Target Structure

```python
def get_target_structure(tu, target_id):
    """Get target structure from PDB, EMDB, or predict."""

    # Try PDB first (X-ray/NMR)
    pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
    if pdb_results:
        # Get highest resolution structure
        best_pdb = sorted(pdb_results, key=lambda x: x['resolution'])[0]
        structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
        return {
            'source': 'PDB',
            'pdb_id': best_pdb['pdb_id'],
            'resolution': best_pdb['resolution'],
            'structure': structure
        }

    # Try EMDB for cryo-EM structures (valuable for membrane proteins)
    protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
    emdb_results = tu.tools.emdb_search(
        query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
    )
    if emdb_results and len(emdb_results) > 0:
        # Get highest resolution cryo-EM entry
        best_emdb = sorted(emdb_results, key=lambda x: x.get('resolution', 99))[0]
        # Get associated PDB model if available
        emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
        if emdb_details.get('pdb_ids'):
            structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
            return {
                'source': 'EMDB cryo-EM',
                'emdb_id': best_emdb['emdb_id'],
                'pdb_id': emdb_details['pdb_ids'][0],
                'resolution': best_emdb.get('resolution'),
                'structure': structure
            }

    # Fallback to AlphaFold prediction
    sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=sequence['sequence'],
        algorithm="mmseqs2"
    )
    return {'source': 'AlphaFold2 (predicted)', 'structure': structure}
```

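
The "highest resolution" step in the function above sorts on a `resolution` field. NMR entries typically report no resolution, so that field may be missing or `None`. The helper below is a hedged sketch of a selection that tolerates this. `best_by_resolution` is a hypothetical name, and the assumption that search results are a list of dicts with an optional numeric `resolution` key is inferred from the code above, not from tool documentation.

```python
import math


def best_by_resolution(entries):
    """Return the entry with the lowest (best) resolution in Å.

    Entries whose resolution is missing or non-numeric sort last.
    Returns None when `entries` is empty.
    """
    def key(entry):
        res = entry.get("resolution")
        return res if isinstance(res, (int, float)) else math.inf

    return min(entries, key=key, default=None)


# Example with hypothetical search results.
hits = [
    {"pdb_id": "AAAA", "resolution": 2.5},
    {"pdb_id": "BBBB", "resolution": None},  # e.g. an NMR entry
    {"pdb_id": "CCCC", "resolution": 1.8},
]
best = best_by_resolution(hits)
```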
### 1.1b EMDB for Membrane Proteins (NEW)

**When to prioritize EMDB**: Membrane proteins, large complexes, and targets where conformational states matter.

```python
def get_cryoem_structures(tu, target_name):
    """Get cryo-EM structures for membrane proteins/complexes."""

    # Search EMDB
    emdb_results = tu.tools.emdb_search(
        query=f"{target_name} membrane OR receptor"
    )

    structures = []
    for entry in emdb_results[:5]:
        details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
        structures.append({
            'emdb_id': entry['emdb_id'],
            'resolution': entry.get('resolution', 'N/A'),
            'title': entry.get('title', 'N/A'),
            'conformational_state': details.get('state', 'Unknown'),
            'pdb_models': details.get('pdb_ids', [])
        })

    return structures
```

**Output for Report**:

```markdown
### 1.1b Cryo-EM Structures (EMDB)

| EMDB ID | Resolution | PDB Model | Conformation |
|---------|------------|-----------|--------------|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |

**Note**: Cryo-EM structures capture physiologically relevant conformations for membrane protein targets.

*Source: EMDB*
```

### 1.2 Identify Binding Epitope

```python
def identify_epitope(tu, target_structure, epitope_residues=None):
    """Identify or validate binding epitope."""

    if epitope_residues:
        # User-specified epitope
        return {'residues': epitope_residues, 'source': 'user-defined'}

    # Find surface-exposed regions
    # Use structural analysis to identify potential epitopes
    return analyze_surface(target_structure)
```

### 1.3 Output for Report

```markdown
## 1. Target Characterization

### 1.1 Target Information

| Property | Value |
|----------|-------|
| **Target** | PD-L1 (Programmed death-ligand 1) |
| **UniProt** | Q9NZQ7 |
| **Structure source** | PDB: 4ZQK (2.0 Å resolution) |
| **Binding epitope** | IgV domain, residues 19-127 |
| **Known binders** | Atezolizumab, durvalumab, avelumab |

### 1.2 Epitope Analysis

| Residue Range | Type | Surface Area | Druggability |
|---------------|------|--------------|--------------|
| 54-68 | Loop | 850 Ų | High |
| 115-125 | Beta strand | 420 Ų | Medium |
| 19-30 | N-terminus | 380 Ų | Medium |

**Selected Epitope**: Residues 54-68 (PD-1 binding interface)

*Source: PDB 4ZQK, surface analysis*
```

## Phase 2: Backbone Generation

### 2.1 RFdiffusion Design

```python
def generate_backbones(tu, design_params):
    """Generate de novo backbones using RFdiffusion."""

    backbones = tu.tools.NvidiaNIM_rfdiffusion(
        diffusion_steps=design_params.get('steps', 50),
        # Additional parameters depending on design type
    )

    return backbones
```

### 2.2 Design Modes

| Mode | Use Case | Key Parameters |
|------|----------|----------------|
| **Unconditional** | De novo scaffold | `diffusion_steps` only |
| **Binder design** | Target-guided binder | `target_structure`, `hotspot_residues` |
| **Motif scaffolding** | Functional motif embedding | `motif_sequence`, `motif_structure` |

### 2.3 Output for Report

```markdown
## 2. Backbone Generation

### 2.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | RFdiffusion via NVIDIA NIM |
| **Design mode** | Unconditional scaffold generation |
| **Diffusion steps** | 50 |
| **Number generated** | 10 backbones |

### 2.2 Generated Backbones

| Backbone | Length | Topology | Quality |
|----------|--------|----------|---------|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |

**Selected for sequence design**: BB_001, BB_002, BB_003, BB_005 (top 4)

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`*
```

## Phase 3: Sequence Design

### 3.1 ProteinMPNN Design

```python
def design_sequences(tu, backbone_pdb, num_sequences=8):
    """Design sequences for backbone using ProteinMPNN."""

    sequences = tu.tools.NvidiaNIM_proteinmpnn(
        pdb_string=backbone_pdb,
        num_sequences=num_sequences,
        temperature=0.1  # Lower = more conservative
    )

    return sequences
```

### 3.2 Sampling Parameters

| Parameter | Conservative | Moderate | Diverse |
|-----------|--------------|----------|---------|
| Temperature | 0.1 | 0.2 | 0.5 |
| Sequences per backbone | 4 | 8 | 16 |
| Use case | Validated scaffold | Exploration | Diversity |

### 3.3 Output for Report

````markdown
## 3. Sequence Design

### 3.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | ProteinMPNN via NVIDIA NIM |
| **Temperature** | 0.1 (conservative) |
| **Sequences per backbone** | 8 |
| **Total sequences** | 32 |

### 3.2 Designed Sequences (Top 10 by Score)

| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|------|----------|-------------|--------|------------|--------------|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |

### 3.3 Top Sequence: Seq_001_A

```
>Seq_001_A (85 aa, MPNN score: -1.89)
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
```

*Source: NVIDIA NIM via `NvidiaNIM_proteinmpnn`*
````

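
The Conservative / Moderate / Diverse columns of the Sampling Parameters table in section 3.2 can be captured as a small preset lookup. This is only a sketch. `SAMPLING_PRESETS`, `mpnn_kwargs`, and `total_sequences` are hypothetical names, while the keyword names `pdb_string`, `num_sequences`, and `temperature` are the ones used in the `design_sequences` call in this document.

```python
# Presets copied from the Sampling Parameters table (section 3.2).
SAMPLING_PRESETS = {
    "conservative": {"temperature": 0.1, "num_sequences": 4},
    "moderate": {"temperature": 0.2, "num_sequences": 8},
    "diverse": {"temperature": 0.5, "num_sequences": 16},
}


def mpnn_kwargs(backbone_pdb, preset="moderate"):
    """Build keyword arguments for a ProteinMPNN call from a named preset."""
    if preset not in SAMPLING_PRESETS:
        raise ValueError(
            f"unknown preset {preset!r}; choose from {sorted(SAMPLING_PRESETS)}"
        )
    return {"pdb_string": backbone_pdb, **SAMPLING_PRESETS[preset]}


def total_sequences(num_backbones, preset="moderate"):
    """Sequences produced by a run: backbones x sequences per backbone."""
    return num_backbones * SAMPLING_PRESETS[preset]["num_sequences"]


# Four selected backbones at 8 sequences each give the 32 total sequences
# shown in the example report.
n_total = total_sequences(4, "moderate")
```

Keeping the presets in one place makes it easy to record in the report which sampling regime produced each batch of sequences.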
## Phase 4: Structure Validation

### 4.1 ESMFold Validation

```python
import numpy as np


def validate_structure(tu, sequence):
    """Validate designed sequence by structure prediction."""

    # Fast validation with ESMFold
    predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)

    # Extract quality metrics
    # (extract_plddt and extract_ptm are helpers not defined in this document)
    plddt = extract_plddt(predicted)
    ptm = extract_ptm(predicted)
    mean_plddt = np.mean(plddt)

    return {
        'structure': predicted,
        'mean_plddt': mean_plddt,
        'ptm': ptm,
        'passes': mean_plddt > 70 and ptm > 0.7
    }
```

### 4.2 Validation Criteria

| Metric | Threshold | Interpretation |
|--------|-----------|----------------|
| Mean pLDDT | >70 | Confident fold |
| pTM | >0.7 | Good global topology |
| RMSD to backbone | <2 Å | Design recapitulated |

### 4.3 Output for Report

```markdown
## 4. Structure Validation

### 4.1 Validation Results

| Sequence | pLDDT | pTM | RMSD to Design | Status |
|----------|-------|-----|----------------|--------|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |

### 4.2 Top Validated Design: Seq_001_A

| Region | Residues | pLDDT | Interpretation |
|--------|----------|-------|----------------|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |

**Overall**: Well-folded 3-helix bundle with high confidence core

*Source: NVIDIA NIM via `NvidiaNIM_esmfold`*
```

## Phase 5: Developability Assessment

### 5.1 Aggregation Propensity

```python
def assess_aggregation(sequence):
    """Assess aggregation propensity."""

    # Calculate hydrophobic patches
    # Calculate isoelectric point
    # Identify aggregation-prone motifs
    # NOTE: `score` and `patches` are placeholders for the results of the
    # steps above; this outline does not define how they are computed.

    return {
        'aggregation_score': score,
        'hydrophobic_patches': patches,
        'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
    }
```

### 5.2 Developability Metrics

| Metric | Favorable | Marginal | Unfavorable |
|--------|-----------|----------|-------------|
| Aggregation score | <0.5 | 0.5-0.7 | >0.7 |
| Isoelectric point | 5-9 | 4-5 or 9-10 | <4 or >10 |
| Hydrophobic patches | <3 | 3-5 | >5 |
| Cysteine count | 0 or even | Odd | Multiple unpaired |

### 5.3 Output for Report

```markdown
## 5. Developability Assessment

### 5.1 Developability Scores

| Design | Aggregation | pI | Cysteines | Expression | Overall |
|--------|-------------|----|-----------|------------|---------|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |

### 5.2 Recommendations

**Best candidate for expression**: Seq_001_A
- Low aggregation propensity
- Neutral pI (easy purification)
- No cysteines (no misfolding risk)
- Predicted high E. coli expression

*Source: Sequence analysis*
```

## Report Template

```markdown
# Therapeutic Protein Design Report: [TARGET]

**Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress

## Executive Summary
[Designing...]

## 1. Target Characterization
### 1.1 Target Information
[Designing...]
### 1.2 Binding Epitope
[Designing...]

## 2. Backbone Generation
### 2.1 Design Parameters
[Designing...]
### 2.2 Generated Backbones
[Designing...]

## 3. Sequence Design
### 3.1 ProteinMPNN Results
[Designing...]
### 3.2 Top Sequences
[Designing...]

## 4. Structure Validation
### 4.1 ESMFold Validation
[Designing...]
### 4.2 Quality Metrics
[Designing...]

## 5. Developability Assessment
### 5.1 Scores
[Designing...]
### 5.2 Recommendations
[Designing...]

## 6. Final Candidates
### 6.1 Ranked List
[Designing...]
### 6.2 Sequences for Testing
[Designing...]

## 7. Experimental Recommendations
[Designing...]

## 8. Data Sources
[Will be populated...]
```

## Evidence Grading

| Tier | Symbol | Criteria |
|------|--------|----------|
| **T1** | ★★★ | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| **T2** | ★★☆ | pLDDT >75, pTM >0.7, acceptable developability |
| **T3** | ★☆☆ | pLDDT >70, pTM >0.65, developability concerns |
| **T4** | ☆☆☆ | Failed validation or major developability issues |

## Completeness Checklist

### Phase 1: Target
- [ ] Target structure obtained (PDB or predicted)
- [ ] Binding epitope identified
- [ ] Existing binders noted

### Phase 2: Backbones
- [ ] ≥5 backbones generated
- [ ] Top 3-5 selected for sequence design
- [ ] Selection criteria documented

### Phase 3: Sequences
- [ ] ≥8 sequences per backbone designed
- [ ] MPNN scores reported
- [ ] Top 10 sequences listed

### Phase 4: Validation
- [ ] All sequences validated by ESMFold
- [ ] pLDDT and pTM reported
- [ ] Pass/fail criteria applied
- [ ] ≥3 passing designs

### Phase 5: Developability
- [ ] Aggregation assessed
- [ ] pI calculated
- [ ] Expression prediction
- [ ] Final ranking

### Phase 6: Deliverables
- [ ] Ranked candidate list
- [ ] FASTA file with sequences
- [ ] Experimental recommendations

## Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `NvidiaNIM_rfdiffusion` | Manual backbone design | Scaffold from PDB |
| `NvidiaNIM_proteinmpnn` | Rosetta ProteinMPNN | Manual sequence design |
| `NvidiaNIM_esmfold` | `NvidiaNIM_alphafold2` | AlphaFold DB |
| PDB structure | `NvidiaNIM_alphafold2` | AlphaFold DB |

## Tool Reference

See `TOOLS_REFERENCE.md` for complete tool documentation.
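
As a closing illustration, the Evidence Grading tiers can be expressed as a small function. This is a sketch under stated assumptions: the document does not define "low aggregation", "neutral pI", or "acceptable developability" numerically, so the code borrows the Favorable band (aggregation <0.5, pI 5-9) from the Developability Metrics table for T1 and treats anything outside the Unfavorable band as acceptable for T2. `grade_design` is a hypothetical name.

```python
def grade_design(plddt, ptm, aggregation_score, pi):
    """Assign an evidence tier (T1-T4) from validation and developability metrics."""
    # Assumption: "low aggregation" and "neutral pI" mean the Favorable band.
    low_aggregation = aggregation_score < 0.5
    neutral_pi = 5 <= pi <= 9
    # Assumption: "acceptable developability" means not in the Unfavorable band.
    acceptable = aggregation_score <= 0.7 and 4 <= pi <= 10

    if plddt > 85 and ptm > 0.8 and low_aggregation and neutral_pi:
        return "T1", "★★★"
    if plddt > 75 and ptm > 0.7 and acceptable:
        return "T2", "★★☆"
    if plddt > 70 and ptm > 0.65:
        return "T3", "★☆☆"
    return "T4", "☆☆☆"


# Example values for Seq_001_A from the example report tables.
tier, stars = grade_design(plddt=88.5, ptm=0.85, aggregation_score=0.32, pi=6.2)
```

With these assumptions the function gives the same star ratings as the example developability table for Seq_001_A, Seq_002_C, Seq_001_B, and Seq_003_A. Other thresholds for the undefined terms would be equally defensible.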