- Papers often span cs.AI + cs.LG + cs.CL
- Sort by date for recent work: `sortBy=submittedDate&sortOrder=descending`
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
```
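The example above hand-encodes its query string. Here is a minimal sketch of building the same fielded boolean query from a plain object. It assumes queries are assembled in JavaScript (Node.js or a browser, using the global `URL`/`URLSearchParams`). `buildFieldedQueryUrl` and its options are hypothetical names. Multi-word values are wrapped in double quotes, which is the arXiv API's phrase syntax (e.g. `ti:"..."`).

```javascript
// Hypothetical helper: combine field-prefixed terms with AND into one
// arXiv search_query, e.g. ti:transformer AND abs:attention.
const ARXIV_API = 'http://export.arxiv.org/api/query';
const ALLOWED_FIELDS = new Set(['ti', 'au', 'abs', 'cat']);

function buildFieldedQueryUrl(terms, { maxResults = 10, sortBy = 'relevance' } = {}) {
  const clauses = Object.entries(terms).map(([field, value]) => {
    if (!ALLOWED_FIELDS.has(field)) {
      throw new Error(`unsupported field prefix: ${field}`);
    }
    const v = value.trim();
    // Quote multi-word values so arXiv matches them as a phrase
    return /\s/.test(v) ? `${field}:"${v}"` : `${field}:${v}`;
  });
  if (clauses.length === 0) {
    throw new Error('at least one term is required');
  }
  const params = new URLSearchParams({
    search_query: clauses.join(' AND '),
    // Clamp to 1..20 to respect the result ceiling in the Iron Laws
    max_results: String(Math.min(Math.max(1, maxResults), 20)),
    sortBy,
  });
  // URLSearchParams percent-encodes ':' and '"' and turns spaces into '+'
  return `${ARXIV_API}?${params.toString()}`;
}

console.log(buildFieldedQueryUrl({ ti: 'transformer', abs: 'attention' }));
```

Rejecting unknown prefixes up front keeps an unscoped or mistyped field from silently turning into a broad query.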
Example 2: Find papers by researcher
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
  prompt: 'List all papers by this author with titles and dates',
});
```
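The `prompt` field leaves extraction to WebFetch. You may instead handle the raw response yourself. The arXiv API returns an Atom feed in which each result is an `<entry>` containing `<id>` (the abs URL), `<title>`, `<published>`, and one `<author><name>` per author. The rough sketch below pulls those fields out with regular expressions. It assumes well-formed, non-nested entries and does no XML entity decoding, so a real XML parser is the sturdier choice. `parseArxivEntries` is a hypothetical name.

```javascript
// Rough sketch: extract arXiv ID, title, published date, and authors from
// an arXiv API Atom response. Regex-based; it does NOT decode XML entities
// such as &amp; and assumes <entry> elements are not nested.
function firstTag(xml, tag) {
  const m = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  // Collapse the line breaks arXiv inserts into long titles
  return m ? m[1].replace(/\s+/g, ' ').trim() : null;
}

function parseArxivEntries(atomXml) {
  const entries = [];
  for (const [, body] of atomXml.matchAll(/<entry>([\s\S]*?)<\/entry>/g)) {
    // <id> holds the abs URL, e.g. http://arxiv.org/abs/1706.03762v7
    const idUrl = firstTag(body, 'id') ?? '';
    entries.push({
      arxivId: idUrl.replace(/^https?:\/\/arxiv\.org\/abs\//, ''),
      title: firstTag(body, 'title'),
      published: firstTag(body, 'published'),
      authors: [...body.matchAll(/<name>([\s\S]*?)<\/name>/g)].map((m) => m[1].trim()),
    });
  }
  return entries;
}
```

Only the text inside each `<entry>` is searched, so the feed-level `<title>` and `<id>` are ignored.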
Example 3: Get recent ML papers
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
```
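Example 3 covers one category. Because papers often span cs.AI, cs.LG and cs.CL, you may want one feed covering several categories. This is a minimal sketch that ORs `cat:` terms together and applies the newest-first sort from the tips above. It assumes URLs are built in JavaScript before being passed to WebFetch. `buildCategoryFeedUrl` is a hypothetical helper, not part of any arXiv client library.

```javascript
// Hypothetical helper: newest papers across one or more arXiv categories.
const ARXIV_API = 'http://export.arxiv.org/api/query';
const MAX_RESULTS_CAP = 20; // ceiling from the Iron Laws

function buildCategoryFeedUrl(categories, maxResults = 20) {
  if (categories.length === 0) {
    throw new Error('at least one category is required');
  }
  // e.g. cat:cs.AI OR cat:cs.LG OR cat:cs.CL
  const searchQuery = categories.map((c) => `cat:${c}`).join(' OR ');
  const params = new URLSearchParams({
    search_query: searchQuery,
    max_results: String(Math.min(Math.max(1, maxResults), MAX_RESULTS_CAP)),
    sortBy: 'submittedDate',
    sortOrder: 'descending',
  });
  return `${ARXIV_API}?${params.toString()}`;
}

console.log(buildCategoryFeedUrl(['cs.AI', 'cs.LG', 'cs.CL'], 50));
```

Because `maxResults` is clamped, asking for 50 still produces `max_results=20`.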
Example 4: Semantic search with Exa
```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org multimodal large language models vision 2024',
  numResults: 10,
});
```
Example 5: Get specific paper details
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
  prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
```
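Example 5 fetches one paper by ID. The API's `id_list` parameter also accepts a comma-separated list, so several known papers can be retrieved in a single call. This sketch assumes the two arXiv identifier schemes: new-style `YYMM.NNNN`/`YYMM.NNNNN` (since April 2007) and old-style `archive/YYMMNNN`, each with an optional `vN` version suffix. `buildIdListUrl` is hypothetical, and its 20-ID ceiling simply mirrors the Iron Laws.

```javascript
// Hypothetical helper: validate arXiv IDs, then build one id_list request.
const ARXIV_API = 'http://export.arxiv.org/api/query';
// New-style IDs, e.g. 1706.03762 or 2301.07041v2
const NEW_STYLE = /^\d{4}\.\d{4,5}(v\d+)?$/;
// Old-style IDs, e.g. hep-th/9901001 or math.GT/0309136
const OLD_STYLE = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function buildIdListUrl(ids) {
  // Accept citations written as "arXiv:2301.07041"
  const cleaned = ids.map((id) => id.trim().replace(/^arXiv:/i, ''));
  const bad = cleaned.filter((id) => !NEW_STYLE.test(id) && !OLD_STYLE.test(id));
  if (bad.length > 0) {
    throw new Error(`invalid arXiv IDs: ${bad.join(', ')}`);
  }
  if (cleaned.length > 20) {
    throw new Error('request at most 20 IDs at a time');
  }
  // Validated IDs contain only URL-safe characters, so no extra encoding
  return `${ARXIV_API}?id_list=${cleaned.join(',')}`;
}
```

Validating first also catches the anti-pattern of passing a paper title where an ID belongs.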
Agent Integration
This skill is automatically assigned to:

- `researcher`: academic research, literature review
- `scientific-research-expert`: deep scientific analysis
- `developer`: finding technical papers for implementation
Iron Laws
1. ALWAYS cap `max_results` at 20: never allow unlimited or >20-result queries. The context explosion from 100+ papers is a known failure mode that stalls agent pipelines.
2. NEVER fetch full paper PDFs during literature review: extract metadata and abstracts only. Full papers are 100KB+ each, so even a few will exhaust the context budget.
3. ALWAYS use Exa for semantic discovery and WebFetch for precision retrieval. Exa finds semantically related papers, and WebFetch retrieves specific IDs or category feeds. Use them in sequence, not interchangeably.
4. NEVER use broad queries without field prefixes: `search_query=neural+networks` returns thousands of results. Always scope the query with a `ti:`, `au:`, `cat:`, or `abs:` prefix.
5. ALWAYS cite arXiv IDs (e.g., 2301.07041) when referencing papers. Titles alone can be ambiguous and may change between versions; IDs are stable, machine-readable, and enable instant retrieval.
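Several of these laws are mechanical enough to check before a URL reaches WebFetch: the result cap, the PDF ban and the field-prefix rule. This is a minimal sketch that assumes requests are plain arXiv URLs. `checkArxivUrl` is hypothetical, and its PDF test only recognizes arXiv-style `/pdf/...` paths or a `.pdf` suffix.

```javascript
// Hypothetical pre-flight check: list the Iron Laws a URL would break.
const FIELD_PREFIX = /\b(ti|au|abs|cat):/;

function checkArxivUrl(rawUrl) {
  const url = new URL(rawUrl);
  const params = url.searchParams;
  const problems = [];

  // Never fetch full-text PDFs
  if (/^\/pdf\/|\.pdf$/.test(url.pathname)) {
    problems.push('URL points at a full-text PDF');
  }

  const query = params.get('search_query'); // '+' is decoded to a space
  if (query !== null) {
    // max_results must be set explicitly and be at most 20
    const max = params.get('max_results');
    const n = Number(max);
    if (max === null) {
      problems.push('max_results is not set');
    } else if (!Number.isInteger(n) || n < 1 || n > 20) {
      problems.push(`max_results=${max} is outside 1..20`);
    }
    // The query must be scoped with a field prefix
    if (!FIELD_PREFIX.test(query)) {
      problems.push('search_query lacks a ti:, au:, abs:, or cat: prefix');
    }
  }
  return problems;
}
```

The Example 1 URL yields an empty list. `search_query=neural+networks&max_results=100` yields two problems: the result cap and the missing prefix.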
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
| --- | --- | --- |
| Using `max_results=100` or no limit | Context explosion; 100 papers × 300 bytes = 30KB+ of metadata | Always set `max_results=20` (hard limit) |
| Fetching full paper PDFs | A single paper can be 100KB+; kills the context budget | Extract abstract + metadata only via the API |
| Broad query without field prefix | Returns irrelevant results across all fields | Use a `ti:`, `au:`, `cat:`, or `abs:` prefix |
| Using only WebFetch for discovery | Misses semantically related papers that don't match the exact terms | Use Exa for semantic discovery first |
| Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762) |
Memory Protocol (MANDATORY)
Before starting:

```bash
cat .claude/context/memory/learnings.md
```

After completing:

- New pattern -> `.claude/context/memory/learnings.md`
- Issue found -> `.claude/context/memory/issues.md`
- Decision made -> `.claude/context/memory/decisions.md`
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
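The protocol can be wrapped in a small Node.js module, sketched here on the assumption that the agent can run Node and write to the project directory. `readLearnings`, `recordMemory`, and the timestamped-bullet entry format are hypothetical conventions; the protocol itself only prescribes which file each kind of note goes to.

```javascript
// Hypothetical helpers for the Memory Protocol.
const fs = require('fs');
const path = require('path');

const DEFAULT_DIR = '.claude/context/memory';
// Kind of note -> file it belongs in
const MEMORY_FILES = {
  pattern: 'learnings.md',
  issue: 'issues.md',
  decision: 'decisions.md',
};

// Before starting: read prior learnings (empty string if none yet)
function readLearnings(memoryDir = DEFAULT_DIR) {
  const file = path.join(memoryDir, 'learnings.md');
  return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
}

// After completing: append a timestamped bullet to the matching file
function recordMemory(kind, note, memoryDir = DEFAULT_DIR) {
  if (!Object.prototype.hasOwnProperty.call(MEMORY_FILES, kind)) {
    throw new Error(`unknown memory kind: ${kind}`);
  }
  fs.mkdirSync(memoryDir, { recursive: true });
  const line = `- ${new Date().toISOString()} ${note}\n`;
  fs.appendFileSync(path.join(memoryDir, MEMORY_FILES[kind]), line, 'utf8');
}
```

Appending instead of rewriting means an interrupted run loses at most the note it was writing.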