# Scrapling CLI

Web-scraping CLI with browser impersonation, anti-bot bypass, and CSS extraction.

## Prerequisites

```bash
# Install with all extras (CLI needs click, fetchers need playwright/camoufox)
uv tool install 'scrapling[all]'

# Install fetcher browser engines (one-time)
scrapling install
```

Verify: `scrapling --help`

## Fetcher Selection

| Tier | Command | Engine | Speed | Stealth | JS | Use When |
|------|---------|--------|-------|---------|----|----------|
| HTTP | `extract get/post/put/delete` | httpx + TLS impersonation | Fast | Medium | No | Static pages, APIs, most sites |
| Dynamic | `extract fetch` | Playwright (headless browser) | Medium | Low | Yes | JS-rendered SPAs, wait-for-element |
| Stealthy | `extract stealthy-fetch` | Camoufox (patched Firefox) | Slow | High | Yes | Cloudflare, aggressive anti-bot |

Default to the HTTP tier; only escalate when the page requires JS rendering or blocks HTTP requests.

## Output Format

Determined by the output file extension:

| Extension | Output | Best For |
|-----------|--------|----------|
| `.html` | Raw HTML | Parsing, further processing |
| `.md` | HTML converted to Markdown | Reading, LLM context |
| `.txt` | Text content only | Clean text extraction |

Always use `/tmp/scrapling-*.{md,txt,html}` for output files. Read the file after extraction.

## Core Commands

### HTTP Tier: GET

```bash
scrapling extract get URL OUTPUT_FILE [OPTIONS]
```

| Flag | Purpose | Example |
|------|---------|---------|
| `-s, --css-selector` | Extract matching elements only | `-s ".article-body"` |
| `--impersonate` | Force a specific browser | `--impersonate firefox` |
| `-H, --headers` | Custom headers (repeatable) | `-H "Authorization: Bearer tok"` |
| `--cookies` | Cookie string | `--cookies "session=abc123"` |
| `--proxy` | Proxy URL | `--proxy "http://user:pass@host:port"` |
| `-p, --params` | Query params (repeatable) | `-p "page=2" -p "limit=50"` |
| `--timeout` | Seconds (default: 30) | `--timeout 60` |
| `--no-verify` | Skip SSL verification | For self-signed certs |
| `--no-follow-redirects` | Don't follow redirects | For redirect inspection |
| `--no-stealthy-headers` | Disable stealth headers | For debugging |

Examples:
```bash
# Basic page fetch as markdown
scrapling extract get "https://example.com" /tmp/scrapling-out.md

# Extract only article content
scrapling extract get "https://news.site.com/article" /tmp/scrapling-out.txt -s "article"

# CSS selector with a child combinator
scrapling extract get "https://hn.com" /tmp/scrapling-out.txt -s ".titleline > a"

# With auth header
scrapling extract get "https://api.example.com/data" /tmp/scrapling-out.txt -H "Authorization: Bearer TOKEN"

# Impersonate Firefox
scrapling extract get "https://example.com" /tmp/scrapling-out.md --impersonate firefox

# Random browser impersonation from a list
scrapling extract get "https://example.com" /tmp/scrapling-out.md --impersonate "chrome,firefox,safari"

# With proxy
scrapling extract get "https://example.com" /tmp/scrapling-out.md --proxy "http://proxy:8080"
```

### HTTP Tier: POST

```bash
scrapling extract post URL OUTPUT_FILE [OPTIONS]
```

Additional options over GET:

| Flag | Purpose | Example |
|------|---------|---------|
| `-d, --data` | Form data | `-d "param1=value1&param2=value2"` |
| `-j, --json` | JSON body | `-j '{"key": "value"}'` |
```bash
# POST with form data
scrapling extract post "https://api.example.com/search" /tmp/scrapling-out.txt -d "q=test&page=1"

# POST with JSON
scrapling extract post "https://api.example.com/query" /tmp/scrapling-out.txt -j '{"query": "test"}'
```

PUT and DELETE share the same interface as POST and GET respectively.

### Dynamic Tier: fetch

For JS-rendered pages. Launches a headless Playwright browser.

```bash
scrapling extract fetch URL OUTPUT_FILE [OPTIONS]
```

| Flag | Purpose | Default |
|------|---------|---------|
| `--headless/--no-headless` | Headless mode | True |
| `--disable-resources` | Drop images/CSS/fonts for speed | False |
| `--network-idle` | Wait for network idle | False |
| `--timeout` | Milliseconds | 30000 |
| `--wait` | Extra wait after load (ms) | 0 |
| `-s, --css-selector` | CSS selector extraction | — |
| `--wait-selector` | Wait for element before proceeding | — |
| `--real-chrome` | Use installed Chrome instead of bundled | False |
| `--proxy` | Proxy URL | — |
| `-H, --extra-headers` | Extra headers (repeatable) | — |
```bash
# Fetch a JS-rendered SPA
scrapling extract fetch "https://spa-app.com" /tmp/scrapling-out.md

# Wait for a specific element to load
scrapling extract fetch "https://dashboard.com" /tmp/scrapling-out.md --wait-selector ".data-table"

# Fast mode: skip images/CSS, wait for network idle
scrapling extract fetch "https://app.com" /tmp/scrapling-out.md --disable-resources --network-idle

# Extra wait for slow-loading content
scrapling extract fetch "https://lazy-site.com" /tmp/scrapling-out.md --wait 5000
```

### Stealthy Tier: stealthy-fetch

Maximum anti-detection. Uses Camoufox (a patched Firefox).

```bash
scrapling extract stealthy-fetch URL OUTPUT_FILE [OPTIONS]
```

Additional options over `fetch`:

| Flag | Purpose | Default |
|------|---------|---------|
| `--solve-cloudflare` | Solve Cloudflare challenges | False |
| `--block-webrtc` | Block WebRTC (prevents IP leak) | False |
| `--hide-canvas` | Add noise to canvas fingerprinting | False |
| `--block-webgl` | Block WebGL fingerprinting | False (allowed) |
```bash
# Bypass Cloudflare
scrapling extract stealthy-fetch "https://cf-protected.com" /tmp/scrapling-out.md --solve-cloudflare

# Maximum stealth
scrapling extract stealthy-fetch "https://aggressive-antibot.com" /tmp/scrapling-out.md \
  --solve-cloudflare --block-webrtc --hide-canvas --block-webgl

# Stealthy with CSS selector
scrapling extract stealthy-fetch "https://protected.com" /tmp/scrapling-out.txt \
  --solve-cloudflare -s ".content"
```

## Auto-Escalation Protocol

ALL scrapling usage must follow this protocol. Never use `extract get` alone; always validate content and escalate if needed. Consumer skills (res-deep, res-price-compare, doc-daily-digest) MUST use this pattern, not a bare `extract get`.

### Step 1: HTTP Tier

```bash
scrapling extract get "URL" /tmp/scrapling-out.md
```

Read `/tmp/scrapling-out.md` and validate content before proceeding.

### Step 2: Validate Content

Check the scraped output for thin-content indicators, i.e. signs that the site requires JS rendering:

| Indicator | Pattern | Example |
|-----------|---------|---------|
| JS disabled warning | "JavaScript", "enable JavaScript", "JS wyłączony" (Polish for "JS disabled") | iSpot.pl, many SPAs |
| No product/price data | Output has navigation and footer but no prices, specs, or product names | E-commerce SPAs |
| Mostly nav links | 80%+ of content is menu items, category links, cookie banners | React/Angular/Vue apps |
| Very short content | Fewer than ~20 meaningful lines after stripping nav/footer | Hydration-dependent pages |
| Login/loading wall | "Loading...", "Please wait", skeleton UI text | Dashboard apps |

If ANY indicator is present, escalate to the Dynamic tier. Do NOT treat HTTP 200 with thin content as success.

### Step 3: Dynamic Tier (if content validation fails)

```bash
scrapling extract fetch "URL" /tmp/scrapling-out.md --network-idle --disable-resources
```

Read and validate again. If content is now rich, done. If still blocked (403, Cloudflare challenge, empty output), escalate.

### Step 4: Stealthy Tier (if the Dynamic tier fails)

```bash
scrapling extract stealthy-fetch "URL" /tmp/scrapling-out.md --solve-cloudflare
```

If still blocked, add the maximum-stealth flags:

```bash
scrapling extract stealthy-fetch "URL" /tmp/scrapling-out.md \
  --solve-cloudflare --block-webrtc --hide-canvas --block-webgl
```

## Consumer Skill Integration

When a consumer skill says "retry with scrapling" or "scrapling fallback", it means: follow the full auto-escalation protocol above, not just the HTTP tier.
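The Step 2 heuristics above can be approximated mechanically. A minimal sketch of a thin-content check in shell (the function name `looks_thin`, the exact marker strings, and the 20-line threshold are illustrative choices, not part of the scrapling CLI):

```bash
# looks_thin FILE -> exit 0 if the scraped output shows thin-content
# indicators from Step 2, exit 1 if the content looks rich.
looks_thin() {
  local file="$1"
  # Indicator: JS-disabled warning / loading-wall marker strings
  if grep -qiE 'enable javascript|js wyłączony|please wait' "$file"; then
    return 0
  fi
  # Indicator: very short content (fewer than ~20 non-empty lines)
  local lines
  lines=$(grep -c -v '^[[:space:]]*$' "$file")
  if [ "$lines" -lt 20 ]; then
    return 0
  fi
  return 1
}
```

In the protocol this check would gate the jump from `extract get` to `extract fetch`; the marker list should be extended per site (e.g. with the "Loading..." and skeleton-UI strings from the table).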
The pattern:

1. `extract get` → Read → Validate content
2. Content thin? → `extract fetch --network-idle --disable-resources` → Read → Validate
3. Still blocked? → `extract stealthy-fetch --solve-cloudflare` → Read
4. All tiers fail? → Skip and label "scrapling blocked"

Known JS-rendered sites (always start at the Dynamic tier):

- iSpot.pl: React SPA; the HTTP tier returns only the nav shell
- Single-page apps with client-side routing (hash or history API URLs)

## Interactive Shell
```bash
# Launch REPL
scrapling shell

# One-liner evaluation
scrapling shell -c 'Fetcher().get("https://example.com").css("title::text")'
```

## Troubleshooting

| Issue | Fix |
|-------|-----|
| `ModuleNotFoundError: click` | Reinstall: `uv tool install --force 'scrapling[all]'` |
| `fetch`/`stealthy-fetch` fails | Run `scrapling install` to install browser engines |
| Cloudflare still blocks | Add `--block-webrtc --hide-canvas` to `stealthy-fetch` |
| Timeout | Increase `--timeout` (seconds for HTTP, milliseconds for fetch/stealthy) |
| SSL error | Add `--no-verify` (HTTP tier only) |
| Empty output with selector | Try without `-s` first to verify the page loads, then refine the selector |

## Constraints

- The output file path is required; scrapling writes to a file, not stdout
- CSS selectors return ALL matches, concatenated
- HTTP tier timeout is in **seconds**; fetch/stealthy-fetch timeout is in **milliseconds**
- `--impersonate` is only available on the HTTP tier (fetch/stealthy handle it internally)
- `--solve-cloudflare` is only available on the stealthy-fetch tier
- Stealth headers are enabled by default on the HTTP tier; disable with `--no-stealthy-headers` for debugging
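The whole escalation protocol can be wrapped in a single helper that respects the constraints above (file output under `/tmp`, per-tier escalation). A sketch, assuming scrapling is installed; the function names `scrape_escalate` and `content_ok` and the validation heuristic are illustrative, not part of the CLI:

```bash
# scrape_escalate URL [OUT] -> runs the auto-escalation protocol:
# HTTP tier, then Dynamic, then Stealthy; prints the tier that
# succeeded, or "scrapling blocked" if all three tiers fail.
scrape_escalate() {
  local url="$1" out="${2:-/tmp/scrapling-out.md}"

  # Crude stand-in for Step 2 validation: content is acceptable if the
  # file is non-empty, has no JS-disabled marker, and has >= ~20
  # non-empty lines.
  content_ok() {
    [ -s "$1" ] &&
      ! grep -qi 'enable javascript' "$1" &&
      [ "$(grep -c -v '^[[:space:]]*$' "$1")" -ge 20 ]
  }

  # Step 1: HTTP tier
  if scrapling extract get "$url" "$out" && content_ok "$out"; then
    echo "ok: http tier"; return 0
  fi

  # Step 3: Dynamic tier
  if scrapling extract fetch "$url" "$out" --network-idle --disable-resources &&
     content_ok "$out"; then
    echo "ok: dynamic tier"; return 0
  fi

  # Step 4: Stealthy tier with maximum-stealth flags
  if scrapling extract stealthy-fetch "$url" "$out" \
       --solve-cloudflare --block-webrtc --hide-canvas --block-webgl &&
     content_ok "$out"; then
    echo "ok: stealthy tier"; return 0
  fi

  echo "scrapling blocked"; return 1
}
```

Usage: `scrape_escalate "https://example.com" /tmp/scrapling-out.md`, then read the output file as usual; the printed tier tells you which fetcher produced it.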