- Literature Deep Research
- Systematic approach to comprehensive literature research: disambiguate the subject, search with collision-aware queries, grade evidence, and produce a structured report.
- KEY PRINCIPLES
- :
- Disambiguate first
- - Resolve IDs, synonyms, naming collisions before literature search
- Right-size the deliverable
- - Factoid mode for single questions; full report for deep research
- Evidence grading
- - Grade every claim (T1 mechanistic → T4 mention)
- Mandatory completeness
- - All sections must exist, even if "unknown/limited evidence"
- Source attribution
- - Every claim traceable to database/tool
- English-first queries
- - Use English for searches; respond in user's language
- Report = deliverable
- - Show findings, not search process
- Workflow Overview
- User Query
- ↓
- Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
- ↓
- Phase 1: SUBJECT DISAMBIGUATION + PROFILE
- ├─ Detect domain (biological target / drug / disease / general academic)
- ├─ Resolve identifiers and gather synonyms/aliases
- ├─ Check for naming collisions
- └─ Gather baseline context via annotation tools (domain-specific)
- ↓
- Phase 2: LITERATURE SEARCH (methodology kept internal)
- ├─ High-precision seed queries
- ├─ Citation network expansion from seeds
- ├─ Collision-filtered broader queries
- └─ Theme clustering + evidence grading
- ↓
- Phase 3: REPORT SYNTHESIS (report-first pattern)
- ├─ Create [topic]_report.md with all section headers IMMEDIATELY
- ├─ Progressively fill sections as data arrives (update after each phase)
- ├─ Write Executive Summary LAST (after all sections complete)
- ├─ Generate [topic]_bibliography.json + .csv
- └─ Validate completeness checklist
- Phase 0: Initial Clarification
- Ask only what is needed; skip questions with obvious answers:
- Subject type
-
- Gene/protein, disease, drug, CS/ML topic, social science, or general?
- Scope
-
- Single factoid to verify, or comprehensive deep review?
- Known aliases
- (if ambiguous): Specific names or symbols in use?
- Constraints
-
- Open access only? Include preprints? Specific organisms or date range?
- Mode Selection
- Mode
- When to Use
- Deliverable
- Factoid / Verification
- Single concrete question
- [topic]_factcheck_report.md
- (≤1 page) + bibliography
- Mini-review
- Narrow topic
- Short narrative report (1-3 pages)
- Full Deep-Research
- Comprehensive overview
- Full 15-section report + bibliography
- Heuristic
- "Which antibiotic was X evolved to resist?" → Factoid. "What does the literature say about X?" → Full. Factoid / Verification Mode (Fast Path) Provide a correct, source-verified answer with explicit evidence attribution.
[TOPIC]: Fact-check Report * Generated: [Date] *
Question [User question]
Answer ** [One-sentence answer] ** [Evidence: ★★★/★★☆/★☆☆/☆☆☆]
Source(s)
[Primary citation: journal/year/PMID/DOI]
Verification Notes
[1-3 bullets: where the statement appears, key constraints]
Limitations
- [Full text availability, evidence type caveats]
- Prefer ToolUniverse literature tools over web browsing. Use
- EuropePMC_search_articles(extract_terms_from_fulltext=[...])
- for OA snippet verification when possible.
- Detect Subject Domain
- Query Pattern
- Domain
- Phase 1 Action
- Gene symbol (EGFR, TP53)
- Biological target
- Full bio disambiguation
- Protein name ("V-ATPase")
- Biological target
- Full bio disambiguation
- Drug name ("metformin")
- Drug
- Drug disambiguation (see 1.5)
- Disease ("Alzheimer's")
- Disease
- Disease disambiguation (see 1.6)
- CS/ML topic ("transformer architecture")
- General academic
- Literature-only (skip bio tools)
- Method, concept, general topic
- General academic
- Literature-only (skip bio tools)
- Cross-domain ("GNNs for drug discovery")
- Interdisciplinary
- Resolve each entity in its domain (see 1.9)
- Cross-Skill Delegation
- For deep entity-specific research beyond literature, delegate to specialized skills:
- Gene/protein deep-dive
- (9-path profiling, druggability, GPCR data): use
- tooluniverse-target-research
- Drug comprehensive profile
- (ADMET, FDA labels, formulations): use
- tooluniverse-drug-research
- Disease comprehensive profile
- (ontologies, epidemiology, treatments): use
- tooluniverse-disease-research
- Use this skill when the focus is
- literature synthesis and evidence grading
- . Use specialized skills when the focus is
- entity profiling with structured database queries
- . For maximum depth, run both in parallel.
- Phase 1: Subject Disambiguation + Profile
- 1.1 Resolve Official Identifiers (Biological Targets)
- UniProt_search → UniProt accession
- UniProt_get_entry_by_accession → Full entry with cross-references
- UniProt_id_mapping → Map between ID types
- ensembl_lookup_gene → Ensembl gene ID, biotype
- MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary
- 1.2 Naming Collision Detection
- Check the primary database for the domain (first 20 results). If >20% off-topic, build a negative filter:
- Domain
- Collision Check Syntax
- Biomedical
- PubMed:
- "[TERM]"[Title]
- CS/ML
- ArXiv:
- ti:"[TERM]"
- or SemanticScholar with
- fieldsOfStudy
- filter
- General
- OpenAlex or Crossref title search
- Identify collision terms from off-topic results
- Build negative filter:
- NOT [collision1] NOT [collision2]
- Gene family disambiguation
-
- Use official symbol with explicit exclusions.
- Example:
- "ADAR" NOT "ADAR2" NOT "ADARB1"
- for ADAR1-specific results.
- Cross-domain collision
-
- Some terms have different meanings across fields (e.g., "RAG" = Retrieval-Augmented Generation in CS, Recombination Activating Gene in biology). Add domain context terms to filter:
- "RAG" AND "language model" NOT "recombination activating"
- .
- 1.3 Baseline Profile (Biological Targets)
- Gather structural, functional, and expression context via annotation tools:
- InterPro_get_protein_domains → Domain architecture
- UniProt_get_ptm_processing_by_accession → PTMs, active sites
- HPA_get_subcellular_location → Localization
- GTEx_get_median_gene_expression → Tissue expression (use gtex_v8)
- GO_get_annotations_for_gene → GO terms
- Reactome_map_uniprot_to_pathways → Pathways
- STRING_get_protein_interactions → Interaction partners
- intact_get_interactions → Experimentally validated PPIs
- OpenTargets_get_target_tractability_by_ensemblID → Druggability assessment
- GPCR targets
- If the target is a GPCR (~35% of approved drug targets), delegate to tooluniverse-target-research for specialized GPCRdb data (3D structures, ligands, mutations). 1.4 Baseline Profile Output
Target Identity | Identifier | Value | Source | |
|
|
- |
- |
- Official Symbol
- |
- [SYMBOL]
- |
- HGNC
- |
- |
- UniProt
- |
- [ACC]
- |
- UniProt
- |
- |
- Ensembl Gene
- |
- [ENSG...]
- |
- Ensembl
- |
- **
- Synonyms
- **
-
- [list]
- **
- Collisions
- **
-
- [assessment]
- 1.5 Drug-Centric Disambiguation
- Skip protein architecture/expression/GO. Instead:
- Resolve identity
- :
- OpenTargets_get_drug_chembId_by_generic_name
- ,
- ChEMBL_get_drug
- ,
- PubChem_get_CID_by_compound_name
- ,
- drugbank_get_drug_basic_info_by_drug_name_or_id
- Targets & mechanisms
- :
- ChEMBL_get_drug_mechanisms
- ,
- OpenTargets_get_associated_targets_by_drug_chemblId
- ,
- DGIdb_get_drug_gene_interactions
- ,
- drugbank_get_targets_by_drug_name_or_drugbank_id
- Safety & indications
- :
- OpenTargets_get_drug_adverse_events_by_chemblId
- ,
- OpenTargets_get_drug_indications_by_chemblId
- ,
- search_clinical_trials
- 1.6 Disease-Centric Disambiguation
- Resolve ontology IDs
-
- Use
- OpenTargets_get_drug_chembId_by_generic_name
- or disease search tools to resolve EFO/MONDO IDs. Cross-reference ICD-10 and UMLS CUI when available from tool results.
- OpenTargets_get_diseases_phenotypes_by_target_ensembl → Disease associations
- DisGeNET_get_disease_genes → Disease-gene associations
- DisGeNET_search_disease → Disease search with ontology IDs
- CTD_get_disease_chemicals → Chemical-disease links
- 1.7 Compound Queries (e.g., "metformin in breast cancer")
- Resolve both entities separately, then cross-reference:
- CTD_get_chemical_gene_interactions → Chemical-gene links
- CTD_get_chemical_diseases → Chemical-disease associations
- OpenTargets_get_associated_targets_by_drug_chemblId → Drug targets
- OpenTargets_get_associated_diseases_by_drug_chemblId → Drug-disease associations
- → Intersect to find shared targets/pathways
- 1.8 General Academic Topics (No Bio Tools)
- For CS, social science, humanities, or other non-bio topics:
- Skip all bio annotation tools (UniProt, InterPro, GTEx, etc.)
- Proceed directly to Phase 2 literature search
- Use domain-appropriate databases (ArXiv for CS/ML, DBLP for CS, OSF for social science)
- Collision detection still applies (search term ambiguity)
- 1.9 Interdisciplinary / Cross-Domain Queries
- For topics spanning multiple domains (e.g., "GNNs for drug discovery", "AlphaFold protein prediction"):
- Identify each domain component
- separately (e.g., CS method + biological application)
- Resolve bio entities
- using Phase 1.1-1.3 (targets, drugs, diseases)
- Search CS/general literature
- using ArXiv, DBLP, SemanticScholar in parallel
- Merge results
- — use both bio tools AND general academic tools in Phase 2
- Cross-reference
- — find papers that bridge both domains (typically computational biology venues)
- Phase 2: Literature Search
- Methodology stays internal. The report shows findings, not process.
- 2.1 Query Strategy
- Step 1: High-Precision Seeds
- (15-30 core papers)
- Domain-specific seed queries:
- Biomedical: "[TERM]"[Title] AND (mechanism OR function OR structure OR review)
- CS/ML: ti:"[TERM]" AND (architecture OR benchmark OR evaluation OR survey)
- General: "[TERM]" in title via OpenAlex/Crossref
- Use date/sort filters for recency or impact:
- PubMed:
- mindate
- ,
- maxdate
- ,
- sort="pub_date"
- SemanticScholar:
- year="2023-2024"
- ,
- sort="citationCount:desc"
- ArXiv:
- date_from
- ,
- sort_by="submittedDate"
- Step 2: Citation Network Expansion
- PubMed_get_cited_by → Forward citations (primary)
- EuropePMC_get_citations → Forward (fallback)
- PubMed_get_related → Related papers
- EuropePMC_get_references → Backward citations
- SemanticScholar_get_recommendations → AI-similar papers
- OpenCitations_get_citations → DOI-based citation data
- Step 3: Collision-Filtered Broader Queries
- "[TERM]" AND ([context1] OR [context2]) NOT [collision_term]
- 2.2 Literature Search Tools
- Biomedical
- :
- PubMed_search_articles
- ,
- PMC_search_papers
- ,
- EuropePMC_search_articles
- ,
- PubTator3_LiteratureSearch
- CS/ML
- :
- ArXiv_search_papers
- ,
- DBLP_search_publications
- ,
- SemanticScholar_search_papers
- General academic
- :
- openalex_literature_search
- ,
- Crossref_search_works
- ,
- CORE_search_papers
- ,
- DOAJ_search_articles
- Preprints
- :
- BioRxiv_get_preprint
- ,
- MedRxiv_get_preprint
- ,
- OSF_search_preprints
- ,
- BioRxiv_list_recent_preprints
- (For preprint keyword search:
- EuropePMC_search_articles(source='PPR')
- )
- Multi-source deep search
- :
- advanced_literature_search_agent
- (searches 12+ databases in parallel; requires Azure OpenAI key — if unavailable, replicate coverage by querying PubMed + ArXiv + SemanticScholar + OpenAlex individually)
- Citation impact
- :
- iCite_search_publications
- (search + RCR/APT metrics),
- iCite_get_publications
- (metrics by PMID),
- scite_get_tallies
- (supporting/contradicting counts)
- Note: iCite and scite are PubMed-only. For CS/ML papers, use
- SemanticScholar_get_paper
- for citation counts and influence scores.
- Author search
- PubMed "Author[Author]" , ArXiv "au:Name" , SemanticScholar/OpenAlex as query text 2.3 Full-Text Verification When abstracts lack critical details, use full-text snippet extraction. See FULLTEXT_STRATEGY.md for the three-tier strategy (Europe PMC auto-snippets → manual Semantic Scholar/ArXiv → manual download). 2.4 Tool Failure Handling Attempt 1 → fails → wait 2s → Attempt 2 → fails → wait 5s → Fallback tool Primary Fallback 1 Fallback 2 PubMed_get_cited_by EuropePMC_get_citations OpenCitations_get_citations PubMed_get_related SemanticScholar_get_recommendations SemanticScholar_search_papers GTEx_get_median_gene_expression HPA_get_rna_expression_by_source Document as unavailable Unpaywall_check_oa_status Europe PMC isOpenAccess OpenAlex is_oa 2.5 Open Access Handling With Unpaywall email: full OA check. Without: best-effort via Europe PMC, PMC, OpenAlex, DOAJ flags. Label: OA Status: Best-effort (Unpaywall not configured) Phase 3: Evidence Grading Grade every claim by evidence strength: Tier Label Description Bio Example CS/ML Example T1 ★★★ Mechanistic Direct experimental/formal evidence CRISPR KO + rescue, RCT Formal proof, controlled ablation with significance test T2 ★★☆ Functional Functional study showing role siRNA knockdown phenotype Benchmark on standard dataset with baselines T3 ★☆☆ Association Screen hit, correlation, observational High-throughput screen, GWAS Observational study, case study, anecdotal comparison T4 ☆☆☆ Mention Review, text-mined, peripheral Review article Survey paper, blog post, workshop abstract In report , label inline: Target X regulates pathway Y [★★★: PMID:12345678] through direct phosphorylation [★★☆: PMID:23456789]. Per theme , summarize evidence quality:
- Theme: Lysosomal Function (47 papers)
- **
- Evidence Quality
- **
-
- Strong (32 mechanistic, 11 functional, 4 association)
- Report Output
- Deliverables
- File
- Mode
- Always?
- [topic]_report.md
- Full Deep-Research
- Yes
- [topic]_factcheck_report.md
- Factoid
- Yes
- [topic]_bibliography.json
- All modes
- Yes
- [topic]_bibliography.csv
- All modes
- Yes
- methods_appendix.md
- Any (only if requested)
- No
- Report-First Progressive Update Pattern
- Create the report file immediately
- after Phase 0 with all 15 section headers (use template from
- REPORT_TEMPLATE.md
- ). Then:
- After Phase 1 (disambiguation): fill Sections 1-5
- After Phase 2 (literature search): fill Sections 6-12
- After evidence grading: fill Sections 13-14
- Last
-
- write Executive Summary and Section 15 (synthesizes everything)
- This ensures partial results are saved even if the process is interrupted.
- Report Template
- Use the 15-section template from
- REPORT_TEMPLATE.md
- . Key sections adapt by domain:
- Biological targets
-
- protein architecture, expression, GO terms, disease links, pathogen involvement
- Drugs
-
- chemical properties, targets/MOA, pharmacokinetics, indications, safety
- Diseases
-
- epidemiology, pathophysiology, associated genes, treatments
- General academic
-
- historical context, key theories, empirical evidence, applications
- See
- REPORT_TEMPLATE.md
- for full template, domain-specific adaptations, bibliography format, theme extraction protocol, and completeness checklist.
- Communication
- Brief progress updates
- (not search logs):
- "Resolving subject identifiers..."
- "Building core paper set..."
- "Expanding via citation network..."
- "Clustering themes and grading evidence..."
- DO NOT expose
-
- raw tool outputs, dedup counts, search round details, database-by-database results.
- For factoid queries
- ask (once) if user wants just the verified answer or a full report. Default to factoid mode. References TOOL_NAMES_REFERENCE.md — Complete list of 123 tools with parameters REPORT_TEMPLATE.md — Full report template, domain adaptations, bibliography format, theme extraction, completeness checklist FULLTEXT_STRATEGY.md — Three-tier full-text verification strategy WORKFLOW.md — Compact workflow cheat-sheet EXAMPLES.md — Worked examples (ATP6V1A, TRAG collision, sparse target, drug query)