# Bright Data Web Scraper API

Use the Bright Data API via direct `curl` calls for social media scraping, web data extraction, and account management.

Official docs: https://docs.brightdata.com/

## Installation

```bash
npx skills add https://github.com/vm0-ai/vm0-skills --skill bright-data
```
## When to Use

Use this skill when you need to:

- **Scrape social media** - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- **Extract web data** - Posts, profiles, comments, engagement metrics
- **Monitor usage** - Track bandwidth and request usage
- **Manage account** - Check status and zones
## Prerequisites

1. Sign up at Bright Data
2. Get your API key from **Settings > Users**
3. Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```bash
export BRIGHTDATA_TOKEN="your-api-key"
```
## Base URL

```
https://api.brightdata.com
```
**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
## Social Media Scraping

Bright Data supports scraping these social media platforms:

| Platform | Profiles | Posts | Comments | Reels/Videos |
|-----------|----------|-------|----------|--------------|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
## How to Use

### 1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/username" },
  { "url": "https://twitter.com/username2" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{ "snapshot_id": "s_m4x7enmven8djfqak" }
```
### 2. Trigger Scraping (Synchronous)

Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.reddit.com/r/technology/comments/xxxxx" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
### 3. Monitor Progress

Check the status of a scraping job (replace `<snapshot_id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`
### 4. Download Results

Once status is `ready`, download the collected data (replace `<snapshot_id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot_id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
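Steps 3 and 4 can be combined into a small polling loop. A minimal sketch, assuming `BRIGHTDATA_TOKEN` is exported; the helper names (`poll_until_ready`, `download_snapshot`) and the 10-second interval are our own choices, not part of the API:

```shell
# Poll /progress until a snapshot is ready, then download it to a file.
poll_until_ready() {
  local snapshot_id="$1" status
  while true; do
    # Read the current job status from the progress endpoint
    status=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${snapshot_id}" \
      -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" | jq -r '.status')
    case "$status" in
      ready)  return 0 ;;                         # data is available
      failed) echo "job failed" >&2; return 1 ;;  # give up
      *)      sleep 10 ;;                         # still running; re-poll
    esac
  done
}

download_snapshot() {
  local snapshot_id="$1"
  curl -s "https://api.brightdata.com/datasets/v3/snapshot/${snapshot_id}?format=json" \
    -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" > "/tmp/brightdata_${snapshot_id}.json"
}
```

Usage: `poll_until_ready s_m4x7enmven8djfqak && download_snapshot s_m4x7enmven8djfqak`.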
### 5. List Snapshots

Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```
### 6. Cancel Snapshot

Cancel a running job (replace `<snapshot_id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
## Platform-Specific Examples

### Twitter/X - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/elonmusk" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`
### Twitter/X - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/username/status/123456789" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`
### Reddit - Scrape Subreddit Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.reddit.com/r/technology", "sort_by": "hot" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (new/top/hot)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`
### Reddit - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`
### YouTube - Scrape Video Info

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`
### YouTube - Search by Keyword

Write to `/tmp/brightdata_request.json`:

```json
[
  { "keyword": "artificial intelligence", "num_of_posts": 50 }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
### YouTube - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3 }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`
### Instagram - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.instagram.com/username" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`
### Instagram - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
## Account Management

### Check Account Status

```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```
### Get Active Zones

```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"' | jq '.[] | {name, type}'
```
### Get Bandwidth Usage

```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
## Getting Dataset IDs

To use the scraping features, you need a `dataset_id`:

1. Go to the Bright Data Control Panel
2. Create a new Web Scraper dataset or select an existing one
3. Choose the platform (Twitter, Reddit, YouTube, etc.)
4. Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx`, where `gd_xxxxx` is your dataset ID).
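Those embedded IDs can be pulled out of the bandwidth response with a pattern match. A sketch, assuming the `gd_`-prefixed IDs appear inside the key names as described above (the `list_dataset_ids` helper name is ours):

```shell
# Extract unique gd_ dataset IDs from the /customer/bw response keys.
list_dataset_ids() {
  bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
    -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"' \
    | grep -o 'gd_[A-Za-z0-9]*' | sort -u
}
```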
## Common Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |
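These parameters can be combined in a single request object when the target dataset supports them; a hypothetical example (field support varies by platform dataset):

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 50,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```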
## Rate Limits

- Batch mode: up to 100 concurrent requests
- Maximum input size: 1 GB per batch
- Exceeding limits returns a `429` error
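When a `429` comes back, backing off before retrying is usually enough. A minimal sketch; the `request_with_backoff` helper name, the five-attempt cap, and the linear sleep schedule are our own choices:

```shell
# Retry a GET request with linear backoff when the API returns HTTP 429.
request_with_backoff() {
  local url="$1" attempt code
  for attempt in 1 2 3 4 5; do
    # -w '%{http_code}' prints the status code; the body goes to a temp file
    code=$(curl -s -o /tmp/brightdata_out.json -w '%{http_code}' "$url" \
      -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}")
    if [ "$code" != "429" ]; then
      cat /tmp/brightdata_out.json
      return 0
    fi
    sleep $((attempt * 5))   # wait longer after each 429
  done
  return 1
}
```

Usage: `request_with_backoff "https://api.brightdata.com/status"`.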
## Guidelines

1. **Create datasets first** - Use the Control Panel to create scraper datasets
2. **Use async for large jobs** - Use `/trigger` for discovery and batch operations
3. **Use sync for small jobs** - Use `/scrape` for single-URL quick lookups
4. **Check status before download** - Poll `/progress` until status is `ready`
5. **Respect rate limits** - Don't exceed 100 concurrent requests
6. **Date format** - Use MM-DD-YYYY for date parameters