# bright-data

Installs: 67 · Rank: #11329

## Install

```shell
npx skills add https://github.com/vm0-ai/vm0-skills --skill bright-data
```
# Bright Data Web Scraper API

Use the Bright Data API via direct `curl` calls for **social media scraping**, **web data extraction**, and **account management**.

Official docs: https://docs.brightdata.com/
## When to Use

Use this skill when you need to:

- **Scrape social media** - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- **Extract web data** - Posts, profiles, comments, engagement metrics
- **Monitor usage** - Track bandwidth and request usage
- **Manage account** - Check status and zones
## Prerequisites

1. Sign up at Bright Data
2. Get your API key from **Settings > Users**
3. Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

```shell
export BRIGHTDATA_TOKEN="your-api-key"
```
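Since every command below depends on this variable, it can help to fail fast when it is missing. A minimal sketch; the `require_token` helper name is mine, not part of the API:

```shell
# Hypothetical guard: abort early with a clear message if the token is missing.
require_token() {
  [ -n "${BRIGHTDATA_TOKEN:-}" ] || { echo "BRIGHTDATA_TOKEN is not set" >&2; return 1; }
}
```

Call `require_token` at the top of any script before issuing the `curl` requests.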
## Base URL

```
https://api.brightdata.com
```

**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```shell
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
## Social Media Scraping

Bright Data supports scraping these social media platforms:

| Platform  | Profiles | Posts | Comments | Reels/Videos |
|-----------|----------|-------|----------|--------------|
| Twitter/X | ✓ | ✓ | - | - |
| Reddit    | - | ✓ | ✓ | - |
| YouTube   | ✓ | - | ✓ | ✓ |
| Instagram | ✓ | ✓ | ✓ | ✓ |
| TikTok    | ✓ | ✓ | ✓ | - |
| LinkedIn  | ✓ | ✓ | - | - |
## How to Use

### 1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/username" },
  { "url": "https://twitter.com/username2" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{ "snapshot_id": "s_m4x7enmven8djfqak" }
```
### 2. Trigger Scraping (Synchronous)

Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.reddit.com/r/technology/comments/xxxxx" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
### 3. Monitor Progress

Check the status of a scraping job (replace `<snapshot_id>` with your actual snapshot ID):

```shell
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`
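The trigger and progress steps can be combined into a small polling loop. A sketch, assuming `BRIGHTDATA_TOKEN` is exported; the `poll_snapshot` helper name is mine, and the status field is parsed with `grep`/`cut` instead of `jq` to avoid an extra dependency:

```shell
# Hypothetical helper: poll /progress until the snapshot is ready or has failed.
poll_snapshot() {
  snapshot_id="$1"
  while :; do
    status=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${snapshot_id}" \
        -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
      | grep -o '"status"[: ]*"[a-z]*"' | cut -d'"' -f4)
    case "$status" in
      ready)  echo ready;  return 0 ;;   # data is available for download
      failed) echo failed; return 1 ;;   # job failed; do not download
      *)      sleep 10 ;;                # still running; wait and retry
    esac
  done
}
```

The 10-second sleep is arbitrary; pick an interval appropriate to your job size.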
### 4. Download Results

Once status is `ready`, download the collected data (replace `<snapshot_id>` with your actual snapshot ID):

```shell
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot_id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
### 5. List Snapshots

Get all your snapshots:

```shell
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"' \
  | jq '.[] | {snapshot_id, dataset_id, status}'
```
### 6. Cancel Snapshot

Cancel a running job (replace `<snapshot_id>` with your actual snapshot ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
## Platform-Specific Examples

### Twitter/X - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/elonmusk" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`
### Twitter/X - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://twitter.com/username/status/123456789" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`
### Reddit - Scrape Subreddit Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.reddit.com/r/technology",
    "sort_by": "hot"
  }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (new/top/hot)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`
### Reddit - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`
### YouTube - Scrape Video Info

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`
### YouTube - Search by Keyword

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "keyword": "artificial intelligence",
    "num_of_posts": 50
  }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
### YouTube - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.youtube.com/watch?v=xxxxx",
    "load_replies": 3
  }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`
### Instagram - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  { "url": "https://www.instagram.com/username" }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`
### Instagram - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset_id>` with your actual dataset ID):

```shell
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
## Account Management

### Check Account Status

```shell
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```
### Get Active Zones

```shell
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"' \
  | jq '.[] | {name, type}'
```
### Get Bandwidth Usage

```shell
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}"'
```
## Getting Dataset IDs

To use the scraping features, you need a `dataset_id`:

1. Go to the Bright Data Control Panel
2. Create a new Web Scraper dataset or select an existing one
3. Choose the platform (Twitter, Reddit, YouTube, etc.)
4. Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx`, where `gd_xxxxx` is your dataset ID).
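A hedged one-liner can pull the dataset IDs out of those keys; the `extract_dataset_ids` helper is illustrative, not an official tool:

```shell
# Hypothetical helper: extract gd_... dataset IDs from v__ds_api_gd_... keys on stdin.
extract_dataset_ids() {
  grep -o 'v__ds_api_gd_[A-Za-z0-9_]*' | sed 's/^v__ds_api_//' | sort -u
}
```

Pipe the bandwidth usage response (see Get Bandwidth Usage above) into it to list your dataset IDs.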
## Common Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |
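Because MM-DD-YYYY differs from the more common ISO 8601 order, it is easy to get the date parameters wrong. A small sanity check can help; `is_mmddyyyy` is a hypothetical helper, not an API feature:

```shell
# Hypothetical check: does a string match the MM-DD-YYYY shape the date filters expect?
is_mmddyyyy() {
  case "$1" in
    [0-1][0-9]-[0-3][0-9]-[0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

This only checks the shape, not calendar validity (e.g., it accepts "02-31-2024").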
## Rate Limits

- Batch mode: up to 100 concurrent requests
- Maximum input size: 1 GB per batch
- Exceeding these limits returns a `429` error
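To stay under the 100-request batch cap, a long URL list can be split into chunks before triggering. This sketch assumes one URL per line in the input file and uses the standard `split` utility; the `split_batches` wrapper is mine:

```shell
# Hypothetical helper: split a one-URL-per-line file into files of at most 100 lines
# each (batch_aa, batch_ab, ...), so each trigger stays within the concurrency cap.
split_batches() {
  split -l 100 "$1" "${2:-batch_}"
}
```

Each `batch_*` file can then be converted into a JSON array and sent to `/trigger` as a separate request.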
## Guidelines

- **Create datasets first** - Use the Control Panel to create scraper datasets
- **Use async for large jobs** - Use `/trigger` for discovery and batch operations
- **Use sync for small jobs** - Use `/scrape` for single-URL quick lookups
- **Check status before download** - Poll `/progress` until status is `ready`
- **Respect rate limits** - Don't exceed 100 concurrent requests
- **Date format** - Use MM-DD-YYYY for date parameters