xcrawl

安装量: 233
排名: #8881

安装

npx skills add https://github.com/xcrawl-api/xcrawl-skills --skill xcrawl

XCrawl Overview This skill is the default XCrawl entry point when the user asks for XCrawl directly without naming a specific API or sub-skill. It currently targets single-page extraction through XCrawl Scrape APIs. Default behavior is raw passthrough: return upstream API response bodies as-is. Routing Guidance If the user wants to extract one or more specific URLs, use this skill and default to XCrawl Scrape. If the user wants site URL discovery, prefer XCrawl Map APIs. If the user wants multi-page or site-wide crawling, prefer XCrawl Crawl APIs. If the user wants keyword-based discovery, prefer XCrawl Search APIs. Required Local Config Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it. Path: ~/.xcrawl/config.json { "XCRAWL_API_KEY" : "" } Read API key from local config file only. Do not require global environment variables. Credits and Account Setup Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/ . After registration, they can activate the free 1000 credits plan before running requests. Tool Permission Policy Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions. API Surface Start scrape: POST /v1/scrape Read async result: GET /v1/scrape/{scrape_id} Base URL: https://run.xcrawl.com Required header: Authorization: Bearer Usage Examples cURL (sync) API_KEY = " $( node -e "const fs = require ( 'fs' ) ; const p = process.env. HOME + '/.xcrawl/config.json' ; const k = JSON.parse ( fs.readFileSync ( p, 'utf8' ) ) .XCRAWL_API_KEY||'';process.stdout.write(k)" ) " curl -sS -X POST " https://run.xcrawl.com/v1/scrape " \ -H " Content-Type: application/json " \ -H " Authorization: Bearer ${API_KEY} " \ -d '{" url ":" https://example.com "," mode ":" sync "," output ":{" formats ":[" markdown "," links" ] } } ' cURL (async create + result) API_KEY = " $( node -e "const fs = require ( 'fs' ) ; const p = process.env. HOME + '/.xcrawl/config.json' ; const k = JSON.parse ( fs.readFileSync ( p, 'utf8' ) ) .XCRAWL_API_KEY||'';process.stdout.write(k)" ) " CREATE_RESP=" $( curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer ${API_KEY} " \ -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}' ) " echo " $CREATE_RESP " SCRAPE_ID=" $( node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' " $CREATE_RESP " ) " curl -sS -X GET " https://run.xcrawl.com/v1/scrape/ ${SCRAPE_ID} " \ -H " Authorization: Bearer ${API_KEY} " Node node -e ' const fs=require("fs"); const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY; const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}}; fetch("https://run.xcrawl.com/v1/scrape",{ method:"POST", headers:{"Content-Type":"application/json",Authorization:Bearer ${apiKey}}, body:JSON.stringify(body) }).then(async r=>{console.log(await r.text());}); ' Request Parameters Request endpoint and headers Endpoint: POST https://run.xcrawl.com/v1/scrape Headers: Content-Type: application/json Authorization: Bearer Request body: top-level fields Field Type Required Default Description url string Yes - Target URL mode string No sync sync or async proxy object No - Proxy config request object No - Request config js_render object No - JS rendering config output object No - Output config webhook object No - Async webhook config ( mode=async ) proxy Field Type Required Default Description location string No US ISO-3166-1 alpha-2 country code, e.g. US / JP / SG sticky_session string No Auto-generated Sticky session ID; same ID attempts to reuse exit request Field Type Required Default Description locale string No en-US,en;q=0.9 Affects Accept-Language device string No desktop desktop / mobile ; affects UA and viewport cookies object map No - Cookie key/value pairs headers object map No - Header key/value pairs only_main_content boolean No true Return main content only block_ads boolean No true Attempt to block ad resources skip_tls_verification boolean No true Skip TLS verification js_render Field Type Required Default Description enabled boolean No true Enable browser rendering wait_until string No load load / domcontentloaded / networkidle viewport.width integer No - Viewport width (desktop 1920 , mobile 402 ) viewport.height integer No - Viewport height (desktop 1080 , mobile 874 ) output Field Type Required Default Description formats string[] No ["markdown"] Output formats screenshot string No viewport full_page / viewport (only if formats includes screenshot ) json.prompt string No - Extraction prompt json.json_schema object No - JSON Schema output.formats enum: html raw_html markdown links summary screenshot json webhook Field Type Required Default Description url string No - Callback URL headers object map No - Custom callback headers events string[] No ["started","completed","failed"] Events: started / completed / failed Response Parameters Sync create response ( mode=sync ) Field Type Description scrape_id string Task ID endpoint string Always scrape version string Version status string completed / failed url string Target URL data object Result data started_at string Start time (ISO 8601) ended_at string End time (ISO 8601) total_credits_used integer Total credits used data fields (based on output.formats ): html , raw_html , markdown , links , summary , screenshot , json metadata (page metadata) traffic_bytes credits_used credits_detail credits_detail fields: Field Type Description base_cost integer Base scrape cost traffic_cost integer Traffic cost json_extract_cost integer JSON extraction cost Async create response ( mode=async ) Field Type Description scrape_id string Task ID endpoint string Always scrape version string Version status string Always pending Async result response ( GET /v1/scrape/{scrape_id} ) Field Type Description scrape_id string Task ID endpoint string Always scrape version string Version status string pending / crawling / completed / failed url string Target URL data object Same shape as sync data started_at string Start time (ISO 8601) ended_at string End time (ISO 8601) Workflow Classify the request through the default XCrawl entry behavior. If the user provides specific URLs for extraction, default to XCrawl Scrape. If the user clearly asks for map, crawl, or search behavior, route to the dedicated XCrawl API instead of pretending this endpoint covers it. Restate the user goal as an extraction contract. URL scope, required fields, accepted nulls, and precision expectations. Build the scrape request body. Keep only necessary options. Prefer explicit output.formats . Execute scrape and capture task metadata. Track scrape_id , status , and timestamps. If async, poll until completed or failed . Return raw API responses directly. Do not synthesize or compress fields by default. Output Contract Return: Endpoint(s) used and mode ( sync or async ) request_payload used for the request Raw response body from each API call Error details when request fails Do not generate summaries unless the user explicitly requests a summary. Guardrails Do not present XCrawl Scrape as if it also covers map, crawl, or search semantics. Default to scrape only when user intent is URL extraction. Do not invent unsupported output fields. Do not hardcode provider-specific tool schemas in core logic. Call out uncertainty when page structure is unstable.

返回排行榜