tooluniverse-literature-deep-research

Installs: 203
Rank: #4253

Install

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-literature-deep-research
Literature Deep Research
Systematic approach to comprehensive literature research: disambiguate the subject, search with collision-aware queries, grade evidence, and produce a structured report.
KEY PRINCIPLES:
- Disambiguate first: Resolve IDs, synonyms, and naming collisions before literature search
- Right-size the deliverable: Factoid mode for single questions; full report for deep research
- Evidence grading: Grade every claim (T1 mechanistic → T4 mention)
- Mandatory completeness: All sections must exist, even if "unknown/limited evidence"
- Source attribution: Every claim traceable to a database/tool
- English-first queries: Use English for searches; respond in the user's language
- Report = deliverable: Show findings, not search process
Workflow Overview
User Query
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
Phase 1: SUBJECT DISAMBIGUATION + PROFILE
├─ Detect domain (biological target / drug / disease / general academic)
├─ Resolve identifiers and gather synonyms/aliases
├─ Check for naming collisions
└─ Gather baseline context via annotation tools (domain-specific)
Phase 2: LITERATURE SEARCH (methodology kept internal)
├─ High-precision seed queries
├─ Citation network expansion from seeds
├─ Collision-filtered broader queries
└─ Theme clustering + evidence grading
Phase 3: REPORT SYNTHESIS (report-first pattern)
├─ Create [topic]_report.md with all section headers IMMEDIATELY
├─ Progressively fill sections as data arrives (update after each phase)
├─ Write Executive Summary LAST (after all sections complete)
├─ Generate [topic]_bibliography.json + .csv
└─ Validate completeness checklist
Phase 0: Initial Clarification
Ask only what is needed; skip questions with obvious answers:
- Subject type: Gene/protein, disease, drug, CS/ML topic, social science, or general?
- Scope: Single factoid to verify, or comprehensive deep review?
- Known aliases (if ambiguous): Specific names or symbols in use?
- Constraints: Open access only? Include preprints? Specific organisms or date range?
Mode Selection
| Mode | When to Use | Deliverable |
|---|---|---|
| Factoid / Verification | Single concrete question | [topic]_factcheck_report.md (≤1 page) + bibliography |
| Mini-review | Narrow topic | Short narrative report (1-3 pages) |
| Full Deep-Research | Comprehensive overview | Full 15-section report + bibliography |
Heuristic: "Which antibiotic was X evolved to resist?" → Factoid. "What does the literature say about X?" → Full.

Factoid / Verification Mode (Fast Path)
Provide a correct, source-verified answer with explicit evidence attribution.

[TOPIC]: Fact-check Report
*Generated: [Date]*

Question

[User question]

Answer

**[One-sentence answer]** [Evidence: ★★★/★★☆/★☆☆/☆☆☆]

Source(s)

[Primary citation: journal/year/PMID/DOI]

Verification Notes

[1-3 bullets: where the statement appears, key constraints]

Limitations

[Full text availability, evidence type caveats]
Prefer ToolUniverse literature tools over web browsing. Use EuropePMC_search_articles(extract_terms_from_fulltext=[...]) for OA snippet verification when possible.
Detect Subject Domain
| Query Pattern | Domain | Phase 1 Action |
|---|---|---|
| Gene symbol (EGFR, TP53) | Biological target | Full bio disambiguation |
| Protein name ("V-ATPase") | Biological target | Full bio disambiguation |
| Drug name ("metformin") | Drug | Drug disambiguation (see 1.5) |
| Disease ("Alzheimer's") | Disease | Disease disambiguation (see 1.6) |
| CS/ML topic ("transformer architecture") | General academic | Literature-only (skip bio tools) |
| Method, concept, general topic | General academic | Literature-only (skip bio tools) |
| Cross-domain ("GNNs for drug discovery") | Interdisciplinary | Resolve each entity in its domain (see 1.9) |
Cross-Skill Delegation
For deep entity-specific research beyond literature, delegate to specialized skills:
- Gene/protein deep-dive (9-path profiling, druggability, GPCR data): use tooluniverse-target-research
- Drug comprehensive profile (ADMET, FDA labels, formulations): use tooluniverse-drug-research
- Disease comprehensive profile (ontologies, epidemiology, treatments): use tooluniverse-disease-research
Use this skill when the focus is literature synthesis and evidence grading. Use specialized skills when the focus is entity profiling with structured database queries. For maximum depth, run both in parallel.
Phase 1: Subject Disambiguation + Profile
1.1 Resolve Official Identifiers (Biological Targets)
UniProt_search → UniProt accession
UniProt_get_entry_by_accession → Full entry with cross-references
UniProt_id_mapping → Map between ID types
ensembl_lookup_gene → Ensembl gene ID, biotype
MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary
1.2 Naming Collision Detection
Check the primary database for the domain (first 20 results). If more than 20% of them are off-topic, build a negative filter:
| Domain | Collision Check Syntax |
|---|---|
| Biomedical | PubMed: "[TERM]"[Title] |
| CS/ML | ArXiv: ti:"[TERM]" or SemanticScholar with fieldsOfStudy filter |
| General | OpenAlex or Crossref title search |
- Identify collision terms from off-topic results
- Build negative filter: NOT [collision1] NOT [collision2]
Gene family disambiguation: Use the official symbol with explicit exclusions. Example: "ADAR" NOT "ADAR2" NOT "ADARB1" for ADAR1-specific results.
Cross-domain collision: Some terms have different meanings across fields (e.g., "RAG" = Retrieval-Augmented Generation in CS, Recombination Activating Gene in biology). Add domain context terms to filter: "RAG" AND "language model" NOT "recombination activating".
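The collision check above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the function names are hypothetical, and the keyword match used to decide whether a title is on-topic is a stand-in assumption for whatever relevance judgment is actually applied to the first 20 results.

```python
def off_topic_fraction(titles, on_topic_terms):
    """Share of titles that contain none of the on-topic context terms.

    Keyword matching is an assumption made for this sketch; real
    relevance judgments may need to be more careful.
    """
    if not titles:
        return 0.0
    terms = [t.lower() for t in on_topic_terms]
    off = [t for t in titles if not any(term in t.lower() for term in terms)]
    return len(off) / len(titles)


def needs_negative_filter(titles, on_topic_terms, sample_size=20, threshold=0.20):
    """Apply the '>20% of the first 20 results are off-topic' rule."""
    return off_topic_fraction(titles[:sample_size], on_topic_terms) > threshold


def build_filtered_query(term, context_terms=(), collision_terms=()):
    """Build '"TERM" AND context NOT collision1 NOT collision2'."""
    parts = [f'"{term}"']
    contexts = [f'"{c}"' for c in context_terms]
    if len(contexts) == 1:
        parts.append(f"AND {contexts[0]}")
    elif contexts:
        parts.append("AND (" + " OR ".join(contexts) + ")")
    parts.extend(f'NOT "{c}"' for c in collision_terms)
    return " ".join(parts)
```

Called with the two examples from this section, `build_filtered_query("ADAR", collision_terms=["ADAR2", "ADARB1"])` and `build_filtered_query("RAG", ["language model"], ["recombination activating"])` reproduce the query strings shown above.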
1.3 Baseline Profile (Biological Targets)
Gather structural, functional, and expression context via annotation tools:
InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
HPA_get_subcellular_location → Localization
GTEx_get_median_gene_expression → Tissue expression (use gtex_v8)
GO_get_annotations_for_gene → GO terms
Reactome_map_uniprot_to_pathways → Pathways
STRING_get_protein_interactions → Interaction partners
intact_get_interactions → Experimentally validated PPIs
OpenTargets_get_target_tractability_by_ensemblID → Druggability assessment
GPCR targets
GPCR targets: If the target is a GPCR (a family targeted by roughly 35% of approved drugs), delegate to tooluniverse-target-research for specialized GPCRdb data (3D structures, ligands, mutations).
1.4 Baseline Profile Output

Target Identity

| Identifier | Value | Source |
|---|---|---|
| Official Symbol | [SYMBOL] | HGNC |
| UniProt | [ACC] | UniProt |
| Ensembl Gene | [ENSG...] | Ensembl |

**Synonyms**: [list]
**Collisions**: [assessment]
1.5 Drug-Centric Disambiguation
Skip protein architecture/expression/GO. Instead:
- Resolve identity: OpenTargets_get_drug_chembId_by_generic_name, ChEMBL_get_drug, PubChem_get_CID_by_compound_name, drugbank_get_drug_basic_info_by_drug_name_or_id
- Targets & mechanisms: ChEMBL_get_drug_mechanisms, OpenTargets_get_associated_targets_by_drug_chemblId, DGIdb_get_drug_gene_interactions, drugbank_get_targets_by_drug_name_or_drugbank_id
- Safety & indications: OpenTargets_get_drug_adverse_events_by_chemblId, OpenTargets_get_drug_indications_by_chemblId, search_clinical_trials
1.6 Disease-Centric Disambiguation
Resolve ontology IDs: Use disease search tools (for example DisGeNET_search_disease, listed below) to resolve EFO/MONDO IDs. Cross-reference ICD-10 and UMLS CUI when available from tool results.
OpenTargets_get_diseases_phenotypes_by_target_ensembl → Disease associations
DisGeNET_get_disease_genes → Disease-gene associations
DisGeNET_search_disease → Disease search with ontology IDs
CTD_get_disease_chemicals → Chemical-disease links
1.7 Compound Queries (e.g., "metformin in breast cancer")
Resolve both entities separately, then cross-reference:
CTD_get_chemical_gene_interactions → Chemical-gene links
CTD_get_chemical_diseases → Chemical-disease associations
OpenTargets_get_associated_targets_by_drug_chemblId → Drug targets
OpenTargets_get_associated_diseases_by_drug_chemblId → Drug-disease associations
→ Intersect to find shared targets/pathways
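The final intersect step can be sketched as a set intersection that keeps track of which tool reported each link, so every shared target stays traceable to its source. This is an illustrative sketch only: the function name, the input shape (gene symbol mapped to a set of tool names), and the placeholder symbols are assumptions, not output of the actual tools.

```python
def intersect_with_sources(drug_links, disease_links):
    """Return targets present on both sides, with per-side provenance.

    Both arguments map a gene symbol to the set of tool names that
    reported it (an assumed shape chosen for this sketch).
    """
    shared = {}
    for symbol in sorted(set(drug_links) & set(disease_links)):
        shared[symbol] = {
            "drug_sources": sorted(drug_links[symbol]),
            "disease_sources": sorted(disease_links[symbol]),
        }
    return shared


# Placeholder symbols, not real query results.
drug_side = {
    "GENE_A": {"CTD_get_chemical_gene_interactions"},
    "GENE_B": {"OpenTargets_get_associated_targets_by_drug_chemblId"},
}
disease_side = {
    "GENE_B": {"DisGeNET_get_disease_genes"},
    "GENE_C": {"DisGeNET_get_disease_genes"},
}
shared = intersect_with_sources(drug_side, disease_side)
```

Keeping the tool names alongside each shared symbol is one way to satisfy the source attribution principle when the overlap is written into the report.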
1.8 General Academic Topics (No Bio Tools)
For CS, social science, humanities, or other non-bio topics:
Skip all bio annotation tools (UniProt, InterPro, GTEx, etc.)
Proceed directly to Phase 2 literature search
Use domain-appropriate databases (ArXiv for CS/ML, DBLP for CS, OSF for social science)
Collision detection still applies (search term ambiguity)
1.9 Interdisciplinary / Cross-Domain Queries
For topics spanning multiple domains (e.g., "GNNs for drug discovery", "AlphaFold protein prediction"):
- Identify each domain component separately (e.g., CS method + biological application)
- Resolve bio entities using Phase 1.1-1.3 (targets, drugs, diseases)
- Search CS/general literature using ArXiv, DBLP, SemanticScholar in parallel
- Merge results: use both bio tools AND general academic tools in Phase 2
- Cross-reference: find papers that bridge both domains (typically computational biology venues)
Phase 2: Literature Search
Methodology stays internal. The report shows findings, not process.
2.1 Query Strategy
Step 1: High-Precision Seeds (15-30 core papers)
Domain-specific seed queries:
Biomedical: "[TERM]"[Title] AND (mechanism OR function OR structure OR review)
CS/ML: ti:"[TERM]" AND (architecture OR benchmark OR evaluation OR survey)
General: "[TERM]" in title via OpenAlex/Crossref
Use date/sort filters for recency or impact:
- PubMed: mindate, maxdate, sort="pub_date"
- SemanticScholar: year="2023-2024", sort="citationCount:desc"
- ArXiv: date_from, sort_by="submittedDate"
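The seed queries above are plain string templates, so they can be generated per domain. The sketch below only formats query strings; the domain keys and the function name are hypothetical, and how the resulting string is passed to a given search tool is left open because the tool parameters are not specified here.

```python
# Templates copied from the Step 1 seed queries; the dictionary keys
# ("biomedical", "cs_ml", "general") are labels invented for this sketch.
SEED_TEMPLATES = {
    "biomedical": '"{term}"[Title] AND (mechanism OR function OR structure OR review)',
    "cs_ml": 'ti:"{term}" AND (architecture OR benchmark OR evaluation OR survey)',
    "general": '"{term}"',  # used as a title search via OpenAlex/Crossref
}


def seed_query(term, domain):
    """Return the high-precision seed query string for one domain."""
    try:
        template = SEED_TEMPLATES[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain!r}") from None
    return template.format(term=term)


pubmed_seed = seed_query("V-ATPase", "biomedical")
arxiv_seed = seed_query("transformer architecture", "cs_ml")
```

Keeping the templates in one table makes it easy to see that each domain gets a title-restricted query, which is what keeps the seed set small and precise.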
Step 2: Citation Network Expansion
PubMed_get_cited_by → Forward citations (primary)
EuropePMC_get_citations → Forward (fallback)
PubMed_get_related → Related papers
EuropePMC_get_references → Backward citations
SemanticScholar_get_recommendations → AI-similar papers
OpenCitations_get_citations → DOI-based citation data
Step 3: Collision-Filtered Broader Queries
"[TERM]" AND ([context1] OR [context2]) NOT [collision_term]
2.2 Literature Search Tools
- Biomedical: PubMed_search_articles, PMC_search_papers, EuropePMC_search_articles, PubTator3_LiteratureSearch
- CS/ML: ArXiv_search_papers, DBLP_search_publications, SemanticScholar_search_papers
- General academic: openalex_literature_search, Crossref_search_works, CORE_search_papers, DOAJ_search_articles
- Preprints: BioRxiv_get_preprint, MedRxiv_get_preprint, OSF_search_preprints, BioRxiv_list_recent_preprints (for preprint keyword search: EuropePMC_search_articles(source='PPR'))
- Multi-source deep search: advanced_literature_search_agent (searches 12+ databases in parallel; requires an Azure OpenAI key; if unavailable, replicate coverage by querying PubMed + ArXiv + SemanticScholar + OpenAlex individually)
- Citation impact: iCite_search_publications (search + RCR/APT metrics), iCite_get_publications (metrics by PMID), scite_get_tallies (supporting/contradicting counts). Note: iCite and scite are PubMed-only. For CS/ML papers, use SemanticScholar_get_paper for citation counts and influence scores.
- Author search: PubMed "Author[Author]", ArXiv "au:Name", SemanticScholar/OpenAlex as query text
2.3 Full-Text Verification
When abstracts lack critical details, use full-text snippet extraction. See FULLTEXT_STRATEGY.md for the three-tier strategy (Europe PMC auto-snippets → manual Semantic Scholar/ArXiv → manual download).
2.4 Tool Failure Handling
Attempt 1 → fails → wait 2s → Attempt 2 → fails → wait 5s → Fallback tool
| Primary | Fallback 1 | Fallback 2 |
|---|---|---|
| PubMed_get_cited_by | EuropePMC_get_citations | OpenCitations_get_citations |
| PubMed_get_related | SemanticScholar_get_recommendations | SemanticScholar_search_papers |
| GTEx_get_median_gene_expression | HPA_get_rna_expression_by_source | Document as unavailable |
| Unpaywall_check_oa_status | Europe PMC isOpenAccess | OpenAlex is_oa |
2.5 Open Access Handling
With Unpaywall email: full OA check. Without: best-effort via Europe PMC, PMC, OpenAlex, DOAJ flags. Label: OA Status: Best-effort (Unpaywall not configured)
Phase 3: Evidence Grading
Grade every claim by evidence strength:
| Tier | Label | Description | Bio Example | CS/ML Example |
|---|---|---|---|---|
| T1 | ★★★ Mechanistic | Direct experimental/formal evidence | CRISPR KO + rescue, RCT | Formal proof, controlled ablation with significance test |
| T2 | ★★☆ Functional | Functional study showing role | siRNA knockdown phenotype | Benchmark on standard dataset with baselines |
| T3 | ★☆☆ Association | Screen hit, correlation, observational | High-throughput screen, GWAS | Observational study, case study, anecdotal comparison |
| T4 | ☆☆☆ Mention | Review, text-mined, peripheral | Review article | Survey paper, blog post, workshop abstract |
In report, label inline: Target X regulates pathway Y [★★★: PMID:12345678] through direct phosphorylation [★★☆: PMID:23456789].
Per theme, summarize evidence quality:

Theme: Lysosomal Function (47 papers)
**Evidence Quality**: Strong (32 mechanistic, 11 functional, 4 association)
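The inline labels and the per-theme counts can be produced mechanically once each paper has a tier. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and it deliberately leaves out the overall qualifier ("Strong" in the example above), since no threshold for it is defined here.

```python
from collections import Counter

# Tier -> (star label, tier name), taken from the grading table.
TIERS = {
    "T1": ("★★★", "mechanistic"),
    "T2": ("★★☆", "functional"),
    "T3": ("★☆☆", "association"),
    "T4": ("☆☆☆", "mention"),
}


def inline_label(tier, pmid):
    """Format the inline evidence tag, e.g. [★★★: PMID:12345678]."""
    stars, _ = TIERS[tier]
    return f"[{stars}: PMID:{pmid}]"


def theme_summary(theme, paper_tiers):
    """Summarize a theme as a header line plus per-tier paper counts."""
    counts = Counter(paper_tiers)
    parts = [f"{counts[t]} {TIERS[t][1]}" for t in TIERS if counts[t]]
    return (
        f"Theme: {theme} ({len(paper_tiers)} papers)\n"
        f"**Evidence Quality**: {', '.join(parts)}"
    )


example = theme_summary(
    "Lysosomal Function", ["T1"] * 32 + ["T2"] * 11 + ["T3"] * 4
)
```

Counting from the graded paper list, rather than typing the numbers by hand, keeps the per-tier counts consistent with the theme's paper total.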
Report Output
Deliverables
| File | Mode | Always? |
|---|---|---|
| [topic]_report.md | Full Deep-Research | Yes |
| [topic]_factcheck_report.md | Factoid | Yes |
| [topic]_bibliography.json | All modes | Yes |
| [topic]_bibliography.csv | All modes | Yes |
| methods_appendix.md | Any (only if requested) | No |
Report-First Progressive Update Pattern
Create the report file immediately after Phase 0 with all 15 section headers (use template from REPORT_TEMPLATE.md). Then:
- After Phase 1 (disambiguation): fill Sections 1-5
- After Phase 2 (literature search): fill Sections 6-12
- After evidence grading: fill Sections 13-14
- Last: write Executive Summary and Section 15 (synthesizes everything)
This ensures partial results are saved even if the process is interrupted.
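A minimal sketch of the report-first pattern follows, assuming a markdown report with one `## ` heading per section and a placeholder line under each unfilled section. The function names, the `[pending]` placeholder, and the section titles passed in are hypothetical; the real section headers come from REPORT_TEMPLATE.md.

```python
from pathlib import Path

PENDING = "[pending]"  # placeholder text chosen for this sketch


def create_skeleton(path, title, headers):
    """Write the report with every section header present from the start."""
    lines = [f"# {title}", ""]
    for header in headers:
        lines += [f"## {header}", "", PENDING, ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def fill_section(path, header, body):
    """Replace one section's placeholder and save the file immediately."""
    report = Path(path)
    text = report.read_text(encoding="utf-8")
    marker = f"## {header}\n\n{PENDING}"
    if marker not in text:
        raise KeyError(f"no pending section named {header!r}")
    report.write_text(text.replace(marker, f"## {header}\n\n{body}", 1),
                      encoding="utf-8")


def pending_sections(path):
    """List headers whose first non-empty line is still the placeholder."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    pending = []
    for i, line in enumerate(lines):
        if line.startswith("## "):
            following = next((l for l in lines[i + 1:] if l.strip()), "")
            if following == PENDING:
                pending.append(line[3:])
    return pending
```

Because each `fill_section` call writes to disk right away, an interrupted run leaves a report whose remaining gaps can be listed with `pending_sections` and either completed or marked as "unknown/limited evidence".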
Report Template
Use the 15-section template from REPORT_TEMPLATE.md. Key sections adapt by domain:
- Biological targets: protein architecture, expression, GO terms, disease links, pathogen involvement
- Drugs: chemical properties, targets/MOA, pharmacokinetics, indications, safety
- Diseases: epidemiology, pathophysiology, associated genes, treatments
- General academic: historical context, key theories, empirical evidence, applications
See REPORT_TEMPLATE.md for full template, domain-specific adaptations, bibliography format, theme extraction protocol, and completeness checklist.
Communication
Brief progress updates (not search logs):
- "Resolving subject identifiers..."
- "Building core paper set..."
- "Expanding via citation network..."
- "Clustering themes and grading evidence..."
DO NOT expose raw tool outputs, dedup counts, search round details, or database-by-database results.
For factoid queries: ask (once) if the user wants just the verified answer or a full report. Default to factoid mode.
References
- TOOL_NAMES_REFERENCE.md: Complete list of 123 tools with parameters
- REPORT_TEMPLATE.md: Full report template, domain adaptations, bibliography format, theme extraction, completeness checklist
- FULLTEXT_STRATEGY.md: Three-tier full-text verification strategy
- WORKFLOW.md: Compact workflow cheat-sheet
- EXAMPLES.md: Worked examples (ATP6V1A, TRAG collision, sparse target, drug query)