Literature Deep Research

Systematic approach to comprehensive literature research: disambiguate the subject, search with collision-aware queries, grade evidence, and produce a structured report.

KEY PRINCIPLES:

- Disambiguate first: resolve IDs, synonyms, and naming collisions before the literature search
- Right-size the deliverable: factoid mode for single questions; full report for deep research
- Evidence grading: grade every claim (T1 mechanistic → T4 mention)
- Mandatory completeness: all sections must exist, even if the content is "unknown/limited evidence"
- Source attribution: every claim must be traceable to a database or tool
- English-first queries: search in English; respond in the user's language
- Report = deliverable: show findings, not the search process

Workflow Overview

User Query
  ↓
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
  ↓
Phase 1: SUBJECT DISAMBIGUATION + PROFILE
  ├─ Detect domain (biological target / drug / disease / general academic)
  ├─ Resolve identifiers and gather synonyms/aliases
  ├─ Check for naming collisions
  └─ Gather baseline context via annotation tools (domain-specific)
  ↓
Phase 2: LITERATURE SEARCH (methodology kept internal)
  ├─ High-precision seed queries
  ├─ Citation network expansion from seeds
  ├─ Collision-filtered broader queries
  └─ Theme clustering + evidence grading
  ↓
Phase 3: REPORT SYNTHESIS (report-first pattern)
  ├─ Create [topic]_report.md with all section headers IMMEDIATELY
  ├─ Progressively fill sections as data arrives (update after each phase)
  ├─ Write Executive Summary LAST (after all sections complete)
  ├─ Generate [topic]_bibliography.json + .csv
  └─ Validate completeness checklist

Phase 0: Initial Clarification

Ask only what is needed; skip questions with obvious answers:

- Subject type: Gene/protein, disease, drug, CS/ML topic, social science, or general?
- Scope: Single factoid to verify, or comprehensive deep review?
- Known aliases (if ambiguous): Specific names or symbols in use?
- Constraints: Open access only? Include preprints? Specific organisms or date range?

Mode Selection

| Mode | When to Use | Deliverable |
|---|---|---|
| Factoid / Verification | Single concrete question | [topic]_factcheck_report.md (≤1 page) + bibliography |
| Mini-review | Narrow topic | Short narrative report (1-3 pages) |
| Full Deep-Research | Comprehensive overview | Full 15-section report + bibliography |

Heuristic: "Which antibiotic was X evolved to resist?" → Factoid. "What does the literature say about X?" → Full.

Factoid / Verification Mode (Fast Path)

Provide a correct, source-verified answer with explicit evidence attribution.

[TOPIC]: Fact-check Report
*Generated: [Date]*

Question
[User question]

Answer
**[One-sentence answer]** [Evidence: ★★★/★★☆/★☆☆/☆☆☆]

Source(s)
- [Primary citation: journal/year/PMID/DOI]

Verification Notes
- [1-3 bullets: where the statement appears, key constraints]

Limitations
- [Full text availability, evidence type caveats]

Prefer ToolUniverse literature tools over web browsing. Use EuropePMC_search_articles(extract_terms_from_fulltext=[...]) for OA snippet verification when possible.

Detect Subject Domain

| Query Pattern | Domain | Phase 1 Action |
|---|---|---|
| Gene symbol (EGFR, TP53) | Biological target | Full bio disambiguation |
| Protein name ("V-ATPase") | Biological target | Full bio disambiguation |
| Drug name ("metformin") | Drug | Drug disambiguation (see 1.5) |
| Disease ("Alzheimer's") | Disease | Disease disambiguation (see 1.6) |
| CS/ML topic ("transformer architecture") | General academic | Literature-only (skip bio tools) |
| Method, concept, general topic | General academic | Literature-only (skip bio tools) |
| Cross-domain ("GNNs for drug discovery") | Interdisciplinary | Resolve each entity in its domain (see 1.9) |

Cross-Skill Delegation

For deep entity-specific research beyond literature, delegate to specialized skills:

- Gene/protein deep-dive (9-path profiling, druggability, GPCR data): use tooluniverse-target-research
- Drug comprehensive profile (ADMET, FDA labels, formulations): use tooluniverse-drug-research
- Disease comprehensive profile (ontologies, epidemiology, treatments): use tooluniverse-disease-research

Use this skill when the focus is literature synthesis and evidence grading. Use specialized skills when the focus is entity profiling with structured database queries. For maximum depth, run both in parallel.

Phase 1: Subject Disambiguation + Profile

1.1 Resolve Official Identifiers (Biological Targets)

UniProt_search → UniProt accession

UniProt_get_entry_by_accession → Full entry with cross-references

UniProt_id_mapping → Map between ID types

ensembl_lookup_gene → Ensembl gene ID, biotype

MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary

1.2 Naming Collision Detection

Check the primary database for the domain (first 20 results). If >20% off-topic, build a negative filter:

| Domain | Collision Check Syntax |
|---|---|
| Biomedical | PubMed: "[TERM]"[Title] |
| CS/ML | ArXiv: ti:"[TERM]" or SemanticScholar with fieldsOfStudy filter |
| General | OpenAlex or Crossref title search |

- Identify collision terms from off-topic results
- Build negative filter: NOT [collision1] NOT [collision2]

Gene family disambiguation: Use the official symbol with explicit exclusions. Example: "ADAR" NOT "ADAR2" NOT "ADARB1" for ADAR1-specific results.

Cross-domain collision: Some terms have different meanings across fields (e.g., "RAG" = Retrieval-Augmented Generation in CS, Recombination Activating Gene in biology). Add domain context terms to filter: "RAG" AND "language model" NOT "recombination activating".
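
The 20% rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the function names, the substring-based notion of "on-topic", and the sample titles are all assumptions, and the titles are made-up placeholders rather than real search results.

```python
def off_topic_fraction(titles, on_topic_terms):
    """Share of titles that mention none of the expected context terms."""
    if not titles:
        return 0.0
    terms = [term.lower() for term in on_topic_terms]
    off_topic = [t for t in titles if not any(term in t.lower() for term in terms)]
    return len(off_topic) / len(titles)


def negative_filter(collision_terms):
    """Render collision terms as a chain of NOT clauses."""
    return " ".join(f'NOT "{term}"' for term in collision_terms)


def collision_checked_query(term, titles, on_topic_terms, collision_terms, threshold=0.20):
    """Append the negative filter only when more than `threshold` of hits are off-topic."""
    query = f'"{term}"'
    if off_topic_fraction(titles, on_topic_terms) > threshold:
        query = f"{query} {negative_filter(collision_terms)}"
    return query


# Placeholder titles standing in for the first results of a title search.
sample_titles = [
    "RAG for language model question answering",
    "RAG1 recombination activating gene in lymphocytes",
    "Retrieval with RAG and language model agents",
    "A language model RAG survey",
    "RAG2 recombination activating gene knockout",
]
print(collision_checked_query("RAG", sample_titles, ["language model"], ["recombination activating"]))
```

In practice the on-topic judgment would come from reading the first 20 results rather than from substring matching; the sketch only shows where the threshold and the NOT chain fit together.
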
1.3 Baseline Profile (Biological Targets)
Gather structural, functional, and expression context via annotation tools:
InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
HPA_get_subcellular_location → Localization
GTEx_get_median_gene_expression → Tissue expression (use gtex_v8)
GO_get_annotations_for_gene → GO terms
Reactome_map_uniprot_to_pathways → Pathways
STRING_get_protein_interactions → Interaction partners
intact_get_interactions → Experimentally validated PPIs
OpenTargets_get_target_tractability_by_ensemblID → Druggability assessment
GPCR targets: If the target is a GPCR (~35% of approved drug targets), delegate to tooluniverse-target-research for specialized GPCRdb data (3D structures, ligands, mutations).

1.4 Baseline Profile Output

| Field | Value | Source |
|---|---|---|
| Official Symbol | [SYMBOL] | HGNC |
| UniProt | [ACC] | UniProt |
| Ensembl Gene | [ENSG...] | Ensembl |
| Synonyms | [list] | |
| Collisions | [assessment] | |

1.5 Drug-Centric Disambiguation

Skip protein architecture/expression/GO. Instead:

- Resolve identity: OpenTargets_get_drug_chembId_by_generic_name, ChEMBL_get_drug, PubChem_get_CID_by_compound_name, drugbank_get_drug_basic_info_by_drug_name_or_id
- Targets & mechanisms: ChEMBL_get_drug_mechanisms, OpenTargets_get_associated_targets_by_drug_chemblId, DGIdb_get_drug_gene_interactions, drugbank_get_targets_by_drug_name_or_drugbank_id
- Safety & indications: OpenTargets_get_drug_adverse_events_by_chemblId, OpenTargets_get_drug_indications_by_chemblId, search_clinical_trials

1.6 Disease-Centric Disambiguation

Resolve ontology IDs: Use OpenTargets_get_drug_chembId_by_generic_name or disease search tools to resolve EFO/MONDO IDs. Cross-reference ICD-10 and UMLS CUI when available from tool results.
OpenTargets_get_diseases_phenotypes_by_target_ensembl → Disease associations
DisGeNET_get_disease_genes → Disease-gene associations
DisGeNET_search_disease → Disease search with ontology IDs
CTD_get_disease_chemicals → Chemical-disease links
1.7 Compound Queries (e.g., "metformin in breast cancer")
Resolve both entities separately, then cross-reference:
CTD_get_chemical_gene_interactions → Chemical-gene links
CTD_get_chemical_diseases → Chemical-disease associations
OpenTargets_get_associated_targets_by_drug_chemblId → Drug targets
OpenTargets_get_associated_diseases_by_drug_chemblId → Drug-disease associations
→ Intersect to find shared targets/pathways
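
The intersection step can be illustrated with plain sets. The sketch below is an assumption-laden example: `shared_targets` is a hypothetical helper, the inputs stand in for gene symbols already extracted from the tool results listed above, and the symbols shown are placeholders rather than real associations.

```python
def shared_targets(drug_targets, disease_targets):
    """Return symbols linked to both entities, sorted for stable output."""

    def normalize(symbols):
        # Case-fold and trim so that "gene_a " and "GENE_A" compare equal.
        return {s.strip().upper() for s in symbols if s and s.strip()}

    return sorted(normalize(drug_targets) & normalize(disease_targets))


# Placeholder symbols, not real drug or disease associations.
from_drug = ["GENE_A", "gene_b ", "GENE_C"]
from_disease = ["GENE_B", "GENE_D", "gene_a"]
print(shared_targets(from_drug, from_disease))
```

Normalizing before intersecting matters because different sources may return the same symbol with different casing or stray whitespace.
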
1.8 General Academic Topics (No Bio Tools)
For CS, social science, humanities, or other non-bio topics:
Skip all bio annotation tools (UniProt, InterPro, GTEx, etc.)
Proceed directly to Phase 2 literature search
Use domain-appropriate databases (ArXiv for CS/ML, DBLP for CS, OSF for social science)
Collision detection still applies (search term ambiguity)
1.9 Interdisciplinary / Cross-Domain Queries
For topics spanning multiple domains (e.g., "GNNs for drug discovery", "AlphaFold protein prediction"):
- Identify each domain component separately (e.g., CS method + biological application)
- Resolve bio entities using Phase 1.1-1.3 (targets, drugs, diseases)
- Search CS/general literature using ArXiv, DBLP, SemanticScholar in parallel
- Merge results: use both bio tools AND general academic tools in Phase 2
- Cross-reference: find papers that bridge both domains (typically computational biology venues)
Phase 2: Literature Search
Methodology stays internal. The report shows findings, not process.
2.1 Query Strategy
Step 1: High-Precision Seeds (15-30 core papers)

Domain-specific seed queries:
- Biomedical: "[TERM]"[Title] AND (mechanism OR function OR structure OR review)
- CS/ML: ti:"[TERM]" AND (architecture OR benchmark OR evaluation OR survey)
- General: "[TERM]" in title via OpenAlex/Crossref

Use date/sort filters for recency or impact:
- PubMed: mindate, maxdate, sort="pub_date"
- SemanticScholar: year="2023-2024", sort="citationCount:desc"
- ArXiv: date_from, sort_by="submittedDate"

Step 2: Citation Network Expansion
PubMed_get_cited_by → Forward citations (primary)
EuropePMC_get_citations → Forward (fallback)
PubMed_get_related → Related papers
EuropePMC_get_references → Backward citations
SemanticScholar_get_recommendations → AI-similar papers
OpenCitations_get_citations → DOI-based citation data
Step 3: Collision-Filtered Broader Queries
"[TERM]" AND ([context1] OR [context2]) NOT [collision_term]
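
The seed and broader query patterns from Steps 1 and 3 can be assembled programmatically. The following is a sketch only: `SEED_TEMPLATES`, `seed_query`, and `broad_query` are illustrative names, and quoting multi-word terms is an assumption about how the target search engines expect phrases to be written.

```python
# Seed templates copied from the Step 1 patterns; the dictionary keys are illustrative.
SEED_TEMPLATES = {
    "biomedical": '"{term}"[Title] AND (mechanism OR function OR structure OR review)',
    "cs_ml": 'ti:"{term}" AND (architecture OR benchmark OR evaluation OR survey)',
}


def _quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def seed_query(term, domain):
    """Build a high-precision title query for the given domain."""
    return SEED_TEMPLATES[domain].format(term=term)


def broad_query(term, context_terms, collision_terms=()):
    """Build the Step 3 pattern: term AND (contexts) NOT collisions."""
    contexts = " OR ".join(_quote(c) for c in context_terms)
    query = f'"{term}" AND ({contexts})'
    for collision in collision_terms:
        query += f" NOT {_quote(collision)}"
    return query


print(seed_query("V-ATPase", "biomedical"))
print(broad_query("RAG", ["language model", "retrieval"], ["recombination activating"]))
```
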
2.2 Literature Search Tools
- Biomedical: PubMed_search_articles, PMC_search_papers, EuropePMC_search_articles, PubTator3_LiteratureSearch
- CS/ML: ArXiv_search_papers, DBLP_search_publications, SemanticScholar_search_papers
- General academic: openalex_literature_search, Crossref_search_works, CORE_search_papers, DOAJ_search_articles
- Preprints: BioRxiv_get_preprint, MedRxiv_get_preprint, OSF_search_preprints, BioRxiv_list_recent_preprints (for preprint keyword search: EuropePMC_search_articles(source='PPR'))
- Multi-source deep search: advanced_literature_search_agent (searches 12+ databases in parallel; requires an Azure OpenAI key. If unavailable, replicate coverage by querying PubMed + ArXiv + SemanticScholar + OpenAlex individually)
- Citation impact: iCite_search_publications (search + RCR/APT metrics), iCite_get_publications (metrics by PMID), scite_get_tallies (supporting/contradicting counts)

Note: iCite and scite are PubMed-only. For CS/ML papers, use SemanticScholar_get_paper for citation counts and influence scores.
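
Any of the tools listed above can fail transiently. Section 2.4 prescribes one retry after 2 s, a 5 s pause, and then a switch to the fallback tool; a minimal sketch of that schedule follows. The function name and the callables are hypothetical stand-ins for tool wrappers, and applying the same two-attempt schedule to each fallback is an assumption, since the source only spells it out for the primary tool.

```python
import time


def call_with_fallback(tools, *args, retry_wait=2, fallback_wait=5, sleep=time.sleep, **kwargs):
    """Try each tool twice, in order, and return the first successful result.

    `tools` is an ordered list of callables: primary first, then fallbacks.
    Raises RuntimeError when every tool fails both attempts.
    """
    errors = []
    for index, tool in enumerate(tools):
        is_last_tool = index == len(tools) - 1
        for attempt in (1, 2):
            try:
                return tool(*args, **kwargs)
            except Exception as exc:  # tool wrappers may raise arbitrary errors
                errors.append((getattr(tool, "__name__", repr(tool)), attempt, str(exc)))
                if attempt == 1:
                    sleep(retry_wait)      # wait before the second attempt
                elif not is_last_tool:
                    sleep(fallback_wait)   # wait before switching to the next tool
    raise RuntimeError(f"all tools failed: {errors}")


# Stand-in callables; real code would wrap the actual tool invocations.
def primary_tool(query):
    raise ConnectionError("service unavailable")


def fallback_tool(query):
    return [f"hit for {query}"]


recorded_waits = []
result = call_with_fallback([primary_tool, fallback_tool], "example", sleep=recorded_waits.append)
print(result, recorded_waits)
```

Passing `sleep` as a parameter keeps the schedule testable without real delays. When the final entry in a chain is "Document as unavailable", the caller can catch the `RuntimeError` and record the gap in the report.
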
Author search: PubMed "Author[Author]", ArXiv "au:Name", SemanticScholar/OpenAlex as query text

2.3 Full-Text Verification

When abstracts lack critical details, use full-text snippet extraction. See FULLTEXT_STRATEGY.md for the three-tier strategy (Europe PMC auto-snippets → manual Semantic Scholar/ArXiv → manual download).

2.4 Tool Failure Handling

Attempt 1 → fails → wait 2s → Attempt 2 → fails → wait 5s → Fallback tool

| Primary | Fallback 1 | Fallback 2 |
|---|---|---|
| PubMed_get_cited_by | EuropePMC_get_citations | OpenCitations_get_citations |
| PubMed_get_related | SemanticScholar_get_recommendations | SemanticScholar_search_papers |
| GTEx_get_median_gene_expression | HPA_get_rna_expression_by_source | Document as unavailable |
| Unpaywall_check_oa_status | Europe PMC isOpenAccess | OpenAlex is_oa |

2.5 Open Access Handling

With Unpaywall email: full OA check. Without: best-effort via Europe PMC, PMC, OpenAlex, DOAJ flags. Label: OA Status: Best-effort (Unpaywall not configured)

Phase 3: Evidence Grading

Grade every claim by evidence strength:

| Tier | Label | Description | Bio Example | CS/ML Example |
|---|---|---|---|---|
| T1 | ★★★ Mechanistic | Direct experimental/formal evidence | CRISPR KO + rescue, RCT | Formal proof, controlled ablation with significance test |
| T2 | ★★☆ Functional | Functional study showing role | siRNA knockdown phenotype | Benchmark on standard dataset with baselines |
| T3 | ★☆☆ Association | Screen hit, correlation, observational | High-throughput screen, GWAS | Observational study, case study, anecdotal comparison |
| T4 | ☆☆☆ Mention | Review, text-mined, peripheral | Review article | Survey paper, blog post, workshop abstract |

In report, label inline:

Target X regulates pathway Y [★★★: PMID:12345678] through direct phosphorylation [★★☆: PMID:23456789].

Per theme, summarize evidence quality:

Theme: Lysosomal Function (47 papers)
Evidence Quality: Strong (32 mechanistic, 11 functional, 4 association)
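
The inline labels and per-theme tallies can be generated from graded claims. The sketch below assumes each paper has already been assigned a tier; `TIERS`, `inline_label`, and `theme_summary` are illustrative names. It deliberately omits the overall "Strong" verdict, because the source does not define the threshold for it.

```python
from collections import Counter

# Tier code -> (star label, tier name), following the grading table.
TIERS = {
    "T1": ("★★★", "mechanistic"),
    "T2": ("★★☆", "functional"),
    "T3": ("★☆☆", "association"),
    "T4": ("☆☆☆", "mention"),
}


def inline_label(tier, citation):
    """Format the bracketed evidence tag that follows a claim in the report."""
    stars, _name = TIERS[tier]
    return f"[{stars}: {citation}]"


def theme_summary(theme, paper_tiers):
    """Summarize how many papers in a theme fall into each evidence tier."""
    counts = Counter(paper_tiers)
    parts = [f"{counts[code]} {TIERS[code][1]}" for code in TIERS if counts[code]]
    return f"Theme: {theme} ({len(paper_tiers)} papers): " + ", ".join(parts)


# Tier counts taken from the example above; the per-paper assignment is synthetic.
graded = ["T1"] * 32 + ["T2"] * 11 + ["T3"] * 4
print(inline_label("T1", "PMID:12345678"))
print(theme_summary("Lysosomal Function", graded))
```
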

Report Output

Deliverables

| File | Mode | Always? |
|---|---|---|
| [topic]_report.md | Full Deep-Research | Yes |
| [topic]_factcheck_report.md | Factoid | Yes |
| [topic]_bibliography.json | All modes | Yes |
| [topic]_bibliography.csv | All modes | Yes |
| methods_appendix.md | Any (only if requested) | No |
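
As a rough illustration, the deliverables table can be expressed as a lookup. The function and the mode keys `"full"` and `"factoid"` are hypothetical; mini-review mode is left out because the table does not name a report file for it.

```python
def deliverables(topic, mode, methods_appendix=False):
    """List the output files for a run, following the deliverables table."""
    report_by_mode = {
        "full": f"{topic}_report.md",
        "factoid": f"{topic}_factcheck_report.md",
    }
    if mode not in report_by_mode:
        raise ValueError(f"unknown mode: {mode!r}")
    files = [
        report_by_mode[mode],
        f"{topic}_bibliography.json",  # produced in all modes
        f"{topic}_bibliography.csv",   # produced in all modes
    ]
    if methods_appendix:
        files.append("methods_appendix.md")  # only when the user asks for it
    return files


print(deliverables("ATP6V1A", "factoid"))
```
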

Report-First Progressive Update Pattern

Create the report file immediately after Phase 0 with all 15 section headers (use the template from REPORT_TEMPLATE.md). Then:

- After Phase 1 (disambiguation): fill Sections 1-5
- After Phase 2 (literature search): fill Sections 6-12
- After evidence grading: fill Sections 13-14
- Last: write the Executive Summary and Section 15 (synthesizes everything)

This ensures partial results are saved even if the process is interrupted.
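
The report-first pattern amounts to writing a skeleton file up front and filling placeholders as each phase completes. The sketch below is one possible shape for that, under stated assumptions: the function names, the `[Pending]` placeholder, the `## N. Title` header format, and the two section titles in the demo are all hypothetical, since the real section list lives in REPORT_TEMPLATE.md.

```python
import os
import tempfile
from pathlib import Path

PLACEHOLDER = "[Pending]"


def create_report_skeleton(path, topic, section_titles):
    """Write every section header immediately, each with a placeholder body."""
    lines = [f"# {topic}", ""]
    for number, title in enumerate(section_titles, start=1):
        lines += [f"## {number}. {title}", "", PLACEHOLDER, ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def fill_section(path, number, body):
    """Replace the placeholder of one numbered section and save the file at once."""
    text = Path(path).read_text(encoding="utf-8")
    start = text.find(f"\n## {number}. ")
    if start == -1:
        raise KeyError(f"section {number} not found")
    slot = text.find(PLACEHOLDER, start)
    next_header = text.find("\n## ", start + 1)
    if slot == -1 or (next_header != -1 and slot > next_header):
        raise ValueError(f"section {number} is already filled")
    text = text[:slot] + body + text[slot + len(PLACEHOLDER):]
    Path(path).write_text(text, encoding="utf-8")


# Demo with two made-up section titles in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = os.path.join(tmp_dir, "demo_report.md")
    create_report_skeleton(report_path, "Demo Topic", ["Identity", "Synonyms"])
    fill_section(report_path, 2, "alias list goes here")
    demo_text = Path(report_path).read_text(encoding="utf-8")

print(demo_text)
```

Because each call to `fill_section` writes the file back immediately, an interruption after any phase leaves the completed sections on disk and the remaining ones visibly pending.
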

Report Template

Use the 15-section template from REPORT_TEMPLATE.md. Key sections adapt by domain:

- Biological targets: protein architecture, expression, GO terms, disease links, pathogen involvement
- Drugs: chemical properties, targets/MOA, pharmacokinetics, indications, safety
- Diseases: epidemiology, pathophysiology, associated genes, treatments
- General academic: historical context, key theories, empirical evidence, applications

See REPORT_TEMPLATE.md for the full template, domain-specific adaptations, bibliography format, theme extraction protocol, and completeness checklist.

Communication

Brief progress updates (not search logs):

- "Resolving subject identifiers..."
- "Building core paper set..."
- "Expanding via citation network..."
- "Clustering themes and grading evidence..."

DO NOT expose raw tool outputs, dedup counts, search round details, or database-by-database results.

For factoid queries: ask (once) whether the user wants just the verified answer or a full report. Default to factoid mode.

References

- TOOL_NAMES_REFERENCE.md: Complete list of 123 tools with parameters
- REPORT_TEMPLATE.md: Full report template, domain adaptations, bibliography format, theme extraction, completeness checklist
- FULLTEXT_STRATEGY.md: Three-tier full-text verification strategy
- WORKFLOW.md: Compact workflow cheat-sheet
- EXAMPLES.md: Worked examples (ATP6V1A, TRAG collision, sparse target, drug query)
