- ToolUniverse Immune Repertoire Analysis
- Comprehensive skill for analyzing T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data to characterize adaptive immune responses, clonal expansion, and antigen specificity.
- Overview
- Adaptive immune receptor repertoire sequencing (AIRR-seq) enables comprehensive profiling of T-cell and B-cell populations through high-throughput sequencing of TCR and BCR variable regions. This skill provides an 8-phase workflow for:
- Clonotype identification and tracking
- Diversity and clonality assessment
- V(D)J gene usage analysis
- CDR3 sequence characterization
- Clonal expansion and convergence detection
- Epitope specificity prediction
- Integration with single-cell phenotyping
- Longitudinal repertoire tracking
- Core Workflow
- Phase 1: Data Import & Clonotype Definition
- Load AIRR-seq data from common formats (MiXCR, ImmunoSEQ, AIRR standard, 10x Genomics VDJ). Standardize columns to: cloneId, count, frequency, cdr3aa, cdr3nt, v_gene, j_gene, chain.
- Define clonotypes using one of three methods:
- cdr3aa: Amino acid CDR3 sequence only
- cdr3nt: Nucleotide CDR3 sequence
- vj_cdr3: V gene + J gene + CDR3aa (most common, recommended)
- Aggregate by clonotype, sort by count, assign ranks.
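The aggregation step above can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation: `define_clonotypes_sketch`, `KEY_FIELDS`, and the toy rows are hypothetical names and data, and the rows are assumed to already use the standardized column names.

```python
from collections import defaultdict

# Columns that make up the clonotype key for each definition method.
KEY_FIELDS = {
    "cdr3aa": ("cdr3aa",),
    "cdr3nt": ("cdr3nt",),
    "vj_cdr3": ("v_gene", "j_gene", "cdr3aa"),
}


def define_clonotypes_sketch(records, method="vj_cdr3"):
    """Aggregate standardized rows into ranked clonotypes (illustrative only)."""
    fields = KEY_FIELDS[method]
    counts = defaultdict(int)
    for rec in records:
        counts[tuple(rec[f] for f in fields)] += rec["count"]
    total = sum(counts.values())
    # Sort by count (descending); ties are broken by key so ranks are stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {
            "clonotype": "_".join(key),
            "count": n,
            "frequency": n / total if total else 0.0,
            "rank": rank,
        }
        for rank, (key, n) in enumerate(ordered, start=1)
    ]


# Toy rows, invented for illustration (not real repertoire data).
toy_rows = [
    {"cdr3aa": "CASSF", "cdr3nt": "TGTGCCAGCAGTTTC",
     "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7", "count": 30},
    {"cdr3aa": "CASSF", "cdr3nt": "TGCGCTAGCAGTTTT",
     "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7", "count": 20},
    {"cdr3aa": "CASSLF", "cdr3nt": "TGTGCCAGCAGCTTATTC",
     "v_gene": "TRBV7-9", "j_gene": "TRBJ1-1", "count": 40},
    {"cdr3aa": "CASSF", "cdr3nt": "TGTGCCAGCAGTTTC",
     "v_gene": "TRBV7-9", "j_gene": "TRBJ2-7", "count": 10},
]

by_vj_cdr3 = define_clonotypes_sketch(toy_rows, method="vj_cdr3")
by_cdr3aa = define_clonotypes_sketch(toy_rows, method="cdr3aa")
for row in by_vj_cdr3:
    print(row["rank"], row["clonotype"], row["count"], f"{row['frequency']:.2f}")
```

The choice of method changes the result: with `vj_cdr3` the toy rows collapse into three clonotypes, while `cdr3aa` merges every row sharing a CDR3 amino acid sequence regardless of V and J gene, leaving two.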
- Phase 2: Diversity & Clonality Analysis
- Calculate diversity metrics for the repertoire:
- Shannon entropy: Overall diversity (higher = more diverse)
- Simpson index: Probability that two randomly drawn sequences belong to the same clonotype
- Inverse Simpson: Effective number of clonotypes
- Gini coefficient: Inequality in clonotype distribution
- Clonality: 1 - Pielou's evenness (higher = more clonal)
- Richness: Number of unique clonotypes
- Generate rarefaction curves to assess whether sequencing depth is sufficient.
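As a rough sketch of how the metrics listed above relate to one another, the following computes them from a list of clonotype counts using only the standard library. The function name `calculate_diversity_sketch` and its return keys are assumptions made for illustration. The Simpson index here is the simple sum of squared frequencies (sampling with replacement), and a repertoire with a single clonotype is treated as fully clonal.

```python
import math


def calculate_diversity_sketch(counts):
    """Compute common repertoire diversity metrics from clonotype counts."""
    counts = sorted(c for c in counts if c > 0)
    if not counts:
        raise ValueError("at least one positive clonotype count is required")
    total = sum(counts)
    richness = len(counts)
    freqs = [c / total for c in counts]

    shannon = -sum(p * math.log(p) for p in freqs)
    simpson = sum(p * p for p in freqs)
    # Pielou's evenness is undefined for one clonotype; use 0 so clonality is 1.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    # Gini coefficient from counts sorted in ascending order.
    weighted = sum(i * c for i, c in enumerate(counts, start=1))
    gini = 2 * weighted / (richness * total) - (richness + 1) / richness

    return {
        "richness": richness,
        "shannon_entropy": shannon,
        "simpson_index": simpson,
        "inverse_simpson": 1 / simpson,
        "gini": gini,
        "clonality": 1 - evenness,
    }


even = calculate_diversity_sketch([25, 25, 25, 25])
skewed = calculate_diversity_sketch([97, 1, 1, 1])
print(f"even:   clonality={even['clonality']:.2f}, gini={even['gini']:.2f}")
print(f"skewed: clonality={skewed['clonality']:.2f}, gini={skewed['gini']:.2f}")
```

A perfectly even repertoire gives clonality and Gini of 0 and an inverse Simpson equal to its richness; a repertoire dominated by one clone pushes clonality and Gini toward 1.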
- Phase 3: V(D)J Gene Usage Analysis
- Analyze V and J gene usage patterns weighted by clonotype count:
- V gene family usage frequencies
- J gene family usage frequencies
- V-J pairing frequencies
- Statistical testing for biased usage (chi-square test vs. uniform expectation)
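A minimal sketch of count-weighted usage and the chi-square statistic against a uniform expectation follows. The helper names (`gene_usage_counts`, `chi_square_vs_uniform`) and the toy rows are hypothetical. The uniform expectation is taken over the genes actually observed, and treating each read as an independent observation is a simplifying assumption.

```python
from collections import Counter


def gene_usage_counts(records, *fields):
    """Sum clonotype counts per gene (one field) or per gene pairing (two fields)."""
    usage = Counter()
    for rec in records:
        key = rec[fields[0]] if len(fields) == 1 else tuple(rec[f] for f in fields)
        usage[key] += rec["count"]
    return usage


def to_frequencies(usage):
    """Convert summed counts to usage frequencies."""
    total = sum(usage.values())
    return {gene: n / total for gene, n in usage.items()}


def chi_square_vs_uniform(usage):
    """Return (statistic, degrees of freedom) against equal expected usage."""
    observed = list(usage.values())
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


# Toy rows, invented for illustration.
toy_rows = [
    {"v_gene": "TRBV5-1", "j_gene": "TRBJ2-7", "count": 60},
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ1-1", "count": 30},
    {"v_gene": "TRBV20-1", "j_gene": "TRBJ2-7", "count": 10},
]

v_usage = gene_usage_counts(toy_rows, "v_gene")
vj_pairs = gene_usage_counts(toy_rows, "v_gene", "j_gene")
v_freq = to_frequencies(v_usage)
statistic, dof = chi_square_vs_uniform(v_usage)
print(v_freq)
print(f"chi-square = {statistic:.2f} with {dof} degrees of freedom")
```

If SciPy is available, `scipy.stats.chisquare(list(v_usage.values()))` computes the same statistic along with a p-value, since it assumes equal expected frequencies when none are given.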
- Phase 4: CDR3 Sequence Analysis
- Characterize CDR3 sequences:
- Length distribution: Typical TCR CDR3 = 12-18 aa; BCR CDR3 = 10-20 aa
- Amino acid composition: Weighted by clonotype frequency
- Flag unusual length distributions (may indicate PCR bias)
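The characterization above might be sketched as follows. Function names and toy rows are hypothetical. The typical length ranges are the ones quoted in this section, and both the length distribution and the amino acid composition are weighted by clonotype count.

```python
from collections import Counter

# Typical CDR3 amino acid length ranges quoted in this section.
TYPICAL_LENGTH = {"TCR": (12, 18), "BCR": (10, 20)}


def cdr3_length_distribution(records):
    """Fraction of reads at each CDR3 amino acid length."""
    lengths = Counter()
    for rec in records:
        lengths[len(rec["cdr3aa"])] += rec["count"]
    total = sum(lengths.values())
    return {length: n / total for length, n in sorted(lengths.items())}


def aa_composition(records):
    """Fraction of each residue across all CDR3s, weighted by clonotype count."""
    residues = Counter()
    for rec in records:
        for aa in rec["cdr3aa"]:
            residues[aa] += rec["count"]
    total = sum(residues.values())
    return {aa: n / total for aa, n in sorted(residues.items())}


def fraction_outside_typical(records, receptor="TCR"):
    """Share of reads whose CDR3 length falls outside the typical range."""
    low, high = TYPICAL_LENGTH[receptor]
    distribution = cdr3_length_distribution(records)
    return sum(f for length, f in distribution.items() if not low <= length <= high)


# Toy rows, invented for illustration.
toy_rows = [
    {"cdr3aa": "CASSLGQAYEQYF", "count": 80},
    {"cdr3aa": "CASSF", "count": 20},
]

length_dist = cdr3_length_distribution(toy_rows)
composition = aa_composition(toy_rows)
outside = fraction_outside_typical(toy_rows, receptor="TCR")
print(length_dist)
print(f"fraction outside typical TCR range: {outside:.2f}")
```

A large fraction outside the typical range is only a prompt for closer inspection; what counts as "unusual" depends on the receptor chain and the library preparation.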
- Phase 5: Clonal Expansion Detection
- Identify expanded clonotypes above a frequency threshold (default: 95th percentile). Track clonotypes longitudinally across multiple timepoints to measure persistence, mean/max frequency, and fold changes.
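A minimal sketch of both steps, assuming clonotype frequencies are held in plain dictionaries. All names here are hypothetical. The percentile uses linear interpolation between ranks, "expanded" means strictly above the threshold, a clonotype absent at a timepoint is counted as frequency 0, and fold change compares the last timepoint with the first.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty sequence, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def detect_expanded_sketch(frequencies, q=95):
    """Return clonotypes whose frequency is strictly above the q-th percentile."""
    threshold = percentile(list(frequencies.values()), q)
    expanded = {c: f for c, f in frequencies.items() if f > threshold}
    return {"threshold": threshold, "n_expanded": len(expanded), "expanded": expanded}


def track_clonotypes_sketch(timepoints):
    """Summarize clonotypes over (label, {clonotype: frequency}) pairs in time order."""
    all_clones = set().union(*(freqs for _, freqs in timepoints))
    summary = {}
    for clone in sorted(all_clones):
        series = [freqs.get(clone, 0.0) for _, freqs in timepoints]
        first, last = series[0], series[-1]
        summary[clone] = {
            "persistence": sum(f > 0 for f in series),  # timepoints detected
            "mean_frequency": sum(series) / len(series),
            "max_frequency": max(series),
            # Undefined when the clonotype is absent at the first timepoint.
            "fold_change": last / first if first > 0 else None,
        }
    return summary


# Toy repertoire: one dominant clone and 19 small ones (invented values).
toy_freqs = {"clone_00": 0.43, **{f"clone_{i:02d}": 0.03 for i in range(1, 20)}}
expansion = detect_expanded_sketch(toy_freqs)
print(f"threshold={expansion['threshold']:.3f}, n_expanded={expansion['n_expanded']}")

# Toy time course, invented for illustration.
toy_timepoints = [
    ("t0", {"A": 0.01, "B": 0.02}),
    ("t1", {"A": 0.04}),
]
tracking = track_clonotypes_sketch(toy_timepoints)
print(tracking["A"])
```

A percentile threshold always depends on the repertoire's own distribution, so in a very even repertoire it can flag clonotypes that are barely above background; an absolute frequency cutoff is a common alternative.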
- Phase 6: Convergence & Public Clonotypes
- Convergent recombination: The same CDR3 amino acid sequence encoded by different nucleotide sequences (evidence of antigen-driven selection)
- Public clonotypes: Shared across multiple samples/individuals (may indicate common antigen responses)
- Phase 7: Epitope Prediction & Specificity
- Query epitope databases for known TCR-epitope associations:
- IEDB (IEDB_search_tcells): Search by CDR3 receptor sequence
- VDJdb (manual): https://vdjdb.cdr3.net/search
- PubMed literature (PubMed_search): Search for CDR3 + epitope/antigen/specificity
- Phase 8: Integration with Single-Cell Data
- Link TCR/BCR clonotypes to cell phenotypes from paired single-cell RNA-seq:
- Map clonotypes to cell barcodes
- Identify expanded clonotype phenotypes on UMAP
- Analyze clonotype-cluster associations (cross-tabulation)
- Find cluster-specific clonotypes (>80% of cells in one cluster)
- Differential gene expression: expanded vs. non-expanded cells
- ToolUniverse Tool Integration
- Key Tools Used:
- IEDB_search_tcells - Known T-cell epitopes
- IEDB_search_bcells - Known B-cell epitopes
- PubMed_search - Literature on TCR/BCR specificity
- UniProt_get_protein - Antigen protein information
- Integration with Other Skills:
- tooluniverse-single-cell - Single-cell transcriptomics
- tooluniverse-rnaseq-deseq2 - Bulk RNA-seq analysis
- tooluniverse-variant-analysis - Somatic hypermutation analysis (BCR)
- Quick Start

```python
from tooluniverse import ToolUniverse

# The helper functions called below are not imported here; they are assumed
# to be defined as in the code snippets in ANALYSIS_DETAILS.md.

# 1. Load data
tcr_data = load_airr_data("clonotypes.txt", format='mixcr')

# 2. Define clonotypes
clonotypes = define_clonotypes(tcr_data, method='vj_cdr3')

# 3. Calculate diversity
diversity = calculate_diversity(clonotypes['count'])
print(f"Shannon entropy: {diversity['shannon_entropy']:.2f}")

# 4. Detect expanded clones
expansion = detect_expanded_clones(clonotypes)
print(f"Expanded clonotypes: {expansion['n_expanded']}")

# 5. Analyze V(D)J usage
vdj_usage = analyze_vdj_usage(tcr_data)

# 6. Query epitope databases
top_clones = expansion['expanded_clonotypes']['clonotype'].head(10)
epitopes = query_epitope_database(top_clones)
```

- References
- Dash P, et al. (2017) Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature.
- Glanville J, et al. (2017) Identifying specificity groups in the T cell receptor repertoire. Nature.
- Stubbington MJT, et al. (2016) T cell fate and clonality inference from single-cell transcriptomes. Nature Methods.
- Vander Heiden JA, et al. (2014) pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics.
- See Also
- ANALYSIS_DETAILS.md - Detailed code snippets for all 8 phases
- USE_CASES.md - Complete use cases (immunotherapy, vaccine, autoimmune, single-cell integration) and best practices