Web Content Fetching

Fetch web content in this order:

1. Prefer markdown-native endpoints (content-type: text/markdown)
2. Use selector-based HTML extraction for known sites
3. Use the bundled Bun fallback script when selectors fail

Prerequisites

Verify required tools before extracting:

    command -v curl >/dev/null || echo "curl is required"
    command -v html2markdown >/dev/null || echo "html2markdown is required for HTML extraction"
    command -v bun >/dev/null || echo "bun is required for fetch.ts fallback"

Install Bun dependencies for the bundled script:

    cd ~/.claude/skills/web-fetch && bun install

Default Workflow

Use this as the default flow for any URL:

    URL="<url>"
    CONTENT_TYPE="$(curl -sIL "$URL" | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}' | tr -d '\r' | tail -1)"

    if echo "$CONTENT_TYPE" | grep -q "markdown"; then
      curl -sL "$URL"
    else
      curl -sL "$URL" \
        | html2markdown \
          --include-selector "article,main,[role=main]" \
          --exclude-selector "nav,header,footer,script,style"
    fi

Known Site Selectors

| Site | Include Selector | Exclude Selector |
|---|---|---|
| platform.claude.com | content-container | - |
| docs.anthropic.com | content-container | - |
| developer.mozilla.org | article | - |
| github.com (docs) | article | nav,.sidebar |
| Generic | article,main,[role=main] | nav,header,footer,script,style |
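As a rough sketch, the known-site table can be encoded as a lookup so that a script picks selectors from the host name. The function name `selectors_for_host` and its `include|exclude` output format are assumptions made for illustration. The selector values are copied as they appear in the table, and an empty exclude field stands for `-`.

```shell
# Print "<include selector>|<exclude selector>" for a host name.
# selectors_for_host is a hypothetical helper, not part of the skill.
# Selector values are copied from the known-site table; an empty
# second field means the table lists no exclude selector ("-").
selectors_for_host() {
  case "$1" in
    platform.claude.com | docs.anthropic.com)
      printf '%s|%s\n' 'content-container' '' ;;
    developer.mozilla.org)
      printf '%s|%s\n' 'article' '' ;;
    github.com)
      # The table scopes this row to GitHub docs pages.
      printf '%s|%s\n' 'article' 'nav,.sidebar' ;;
    *)
      # Generic fallback row.
      printf '%s|%s\n' 'article,main,[role=main]' 'nav,header,footer,script,style' ;;
  esac
}
```

Any host that is not listed falls through to the generic selectors, which mirrors the last row of the table.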
Example:

    curl -sL "<url>"

    # Check what content containers exist
    curl -s "<url>"

    # Test a selector
    curl -sL "<url>"

    # Check line count
    curl -sL "<url>"
| Option | Description |
|---|---|
| --include-selector "CSS" | Keep only matching elements |
| --exclude-selector "CSS" | Remove matching elements |
| --domain "https://..." | Convert relative links to absolute |
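The three options can be combined in a small wrapper that passes only the ones that are set. This is a minimal sketch: `to_markdown` and the `DRY_RUN` switch are hypothetical names introduced here, and only the three flags listed above are used.

```shell
# Convert HTML on stdin to markdown, passing only the options that are set.
# Arguments: include selector, exclude selector, domain (each may be empty).
# to_markdown and DRY_RUN are illustrative assumptions, not part of the skill.
to_markdown() {
  include="${1:-}"
  exclude="${2:-}"
  domain="${3:-}"

  # Build the argument list in the positional parameters.
  set -- html2markdown
  [ -n "$include" ] && set -- "$@" --include-selector "$include"
  [ -n "$exclude" ] && set -- "$@" --exclude-selector "$exclude"
  [ -n "$domain" ] && set -- "$@" --domain "$domain"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print one argument per line instead of running the command.
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

A typical call might look like `curl -sL "$URL" | to_markdown "article" "nav,.sidebar" ""`. Setting `DRY_RUN=1` shows the command that would run without requiring html2markdown to be installed.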
Troubleshooting

Empty output with selectors: The page might be markdown-native. Check headers first:

    curl -sIL "<url>" | grep -i '^content-type:'
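The header check can be wrapped in a reusable helper. The sketch below assumes the headers arrive on stdin in the form printed by `curl -sIL`; `final_content_type` and `is_markdown_response` are hypothetical names. With redirects, curl prints one header block per hop, so the helper keeps the last Content-Type it sees.

```shell
# Read response headers on stdin and print the last Content-Type value,
# lowercased. Prints an empty line if no Content-Type header is present.
final_content_type() {
  tr -d '\r' |
    awk -F': *' 'tolower($1) == "content-type" { v = tolower($2) } END { print v }'
}

# Succeed when the headers on stdin describe a markdown response.
is_markdown_response() {
  final_content_type | grep -q 'markdown'
}
```

For example, `curl -sIL "$URL" | is_markdown_response && echo "markdown-native"` would report whether the page can be fetched directly with `curl -sL`.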
Wrong content selected: The site may have multiple article/main regions:

    curl -s "<url>" | grep -o '<...[^>]*>'

html2markdown not found: Install it, then retry selector-based extraction.
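A preflight check can report every missing tool in one pass rather than failing on the first. This is a minimal sketch; `missing_tools` is a hypothetical helper name, and it relies only on the standard `command -v` lookup.

```shell
# Print the name of every listed command that is not on PATH, one per line.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools curl html2markdown bun` before extraction lists whichever of the three tools still needs to be installed.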
bun or script deps missing: Run:

    cd ~/.claude/skills/web-fetch && bun install

Missing code blocks: Check if the site uses non-standard code formatting.

Client-rendered content: If the HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.
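One way to flag a likely client-rendered page is to count how many non-blank lines survive conversion. The sketch below is a heuristic under stated assumptions: it reads already-converted markdown on stdin, the helper names are hypothetical, and the default threshold of 5 lines is arbitrary.

```shell
# Count the lines on stdin that contain at least one non-whitespace character.
nonblank_lines() {
  grep -c '[^[:space:]]'
}

# Succeed when the converted markdown on stdin has fewer non-blank lines
# than the threshold (first argument, default 5). A near-empty result
# suggests the page only shipped placeholders such as "Loading...".
looks_client_rendered() {
  [ "$(nonblank_lines)" -lt "${1:-5}" ]
}
```

For example, `curl -sL "$URL" | html2markdown | looks_client_rendered && echo "try a browser-based tool"`. A short but legitimate page would also trip this check, so the threshold should be tuned per site.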