AI Tech RSS Fetch

Core Goal

Subscribe to RSS/Atom sources.

Persist feed and entry metadata to SQLite.

Deduplicate entries with layered identity keys plus content fingerprints.

Keep only metadata; do not fetch full article bodies and do not summarize.

Triggering Conditions

Receive a request to subscribe to RSS feeds from URLs or OPML.

Receive a request to run incremental RSS sync reliably.

Need stable metadata persistence for downstream processing.

Need dedupe-safe storage of feed items over repeated runs.

Workflow

Prepare runtime and database.

Ensure dependency is installed: python3 -m pip install feedparser

In multi-agent runtimes, pin DB to an absolute path before any command: export AI_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/ai_rss.db"

Initialize SQLite schema once: python3 scripts/rss_subscribe.py init-db --db "$AI_RSS_DB_PATH"

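As a rough illustration of what the init-db step might set up, the sketch below resolves the database path from AI_RSS_DB_PATH and creates the three tables named in the output contract. The column types, constraints, the default file name, and the helper names resolve_db_path and init_db are assumptions made for illustration; the actual schema is defined by scripts/rss_subscribe.py.

```python
import os
import sqlite3

# Hypothetical schema: column names follow the output contract, but the
# types and constraints are assumptions. Status and timestamp columns are
# omitted except last_error.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL UNIQUE,
    feed_title TEXT,
    site_url TEXT,
    etag TEXT,
    last_modified TEXT,
    last_error TEXT
);
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    dedupe_key TEXT NOT NULL,
    guid TEXT,
    url TEXT,
    canonical_url TEXT,
    title TEXT,
    author TEXT,
    published_at TEXT,
    updated_at TEXT,
    summary TEXT,
    categories TEXT,
    content_hash TEXT,
    match_confidence TEXT
);
CREATE TABLE IF NOT EXISTS entry_identities (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    key_type TEXT NOT NULL,
    key_value TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (key_type, key_value)
);
"""


def resolve_db_path(cli_value=None, env=None):
    """Prefer an explicit --db value, then AI_RSS_DB_PATH, then a local default."""
    env = os.environ if env is None else env
    raw = cli_value or env.get("AI_RSS_DB_PATH") or "ai_rss.db"
    # Always return an absolute path so every agent opens the same file.
    return os.path.abspath(raw)


def init_db(db_path):
    """Create the tables if they do not exist yet; safe to call repeatedly."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

Using CREATE TABLE IF NOT EXISTS keeps initialization idempotent, so running it a second time leaves existing data untouched.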
Add feed subscriptions.

Add one feed URL: python3 scripts/rss_subscribe.py add-feed --db "$AI_RSS_DB_PATH" --url "https://example.com/feed.xml"

Import feeds from OPML: python3 scripts/rss_subscribe.py import-opml --db "$AI_RSS_DB_PATH" --opml assets/hn-popular-blogs-2025.opml
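
For readers unfamiliar with OPML, the following sketch shows one way feed URLs could be pulled out of an OPML document using only the standard library. The function name extract_feed_urls is hypothetical, and this is not necessarily how import-opml is implemented; it assumes feeds are stored in the conventional xmlUrl attribute of outline elements.

```python
import xml.etree.ElementTree as ET


def extract_feed_urls(opml_text):
    """Return unique feed URLs from OPML text, in document order."""
    root = ET.fromstring(opml_text)
    seen = set()
    urls = []
    # Outlines can be nested inside folders, so walk the whole tree.
    for outline in root.iter("outline"):
        url = (outline.get("xmlUrl") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Folder outlines without an xmlUrl attribute are skipped, and repeated URLs are collapsed so a re-import does not create duplicate subscriptions.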

Run incremental sync.

Fetch active feeds and store metadata: python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100

Optional one-feed sync: python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --feed-url "https://example.com/feed.xml"

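The incremental sync behavior (conditional GET, per-feed limits, and continuing past a failing feed) can be sketched roughly as below. This is an illustration under stated assumptions, not the script's actual code: the function names, the in-memory feed dictionaries, and the injected fetch callable, which is assumed to return a (status, etag, last_modified, items) tuple, are all hypothetical.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build conditional GET headers from validators saved on the last sync."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def sync_feeds(feeds, fetch, max_feeds=20, max_items_per_feed=100):
    """Sync each feed independently and return (url, outcome, item_count) tuples."""
    results = []
    for feed in feeds[:max_feeds]:
        url = feed["feed_url"]
        headers = conditional_headers(feed.get("etag"), feed.get("last_modified"))
        try:
            status, etag, last_modified, items = fetch(url, headers)
        except Exception as exc:
            # One failing feed must not abort the run: record it and move on.
            feed["last_error"] = str(exc)
            results.append((url, "error", 0))
            continue
        feed["last_error"] = None
        if status == 304:
            # Not modified: skip entry parsing and keep the stored validators.
            results.append((url, "not_modified", 0))
            continue
        feed["etag"] = etag
        feed["last_modified"] = last_modified
        kept = items[:max_items_per_feed]
        results.append((url, "updated", len(kept)))
    return results
```

Passing the fetcher in as a parameter keeps the loop testable without network access; a real fetcher would perform the HTTP request and parse the response.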
Query persisted metadata.

List feeds: python3 scripts/rss_subscribe.py list-feeds --db "$AI_RSS_DB_PATH" --limit 50

List recent entries: python3 scripts/rss_subscribe.py list-entries --db "$AI_RSS_DB_PATH" --limit 100
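
Downstream code that reads the database directly, rather than through the CLI, might query recent entries along these lines. The helper name list_recent_entries and the choice of columns are assumptions; the sketch also assumes published_at is stored as an ISO 8601 string so that text ordering matches time ordering.

```python
import sqlite3


def list_recent_entries(conn, limit=100):
    """Return the newest entries first, as a list of dictionaries."""
    columns = ("id", "title", "url", "published_at")
    rows = conn.execute(
        "SELECT id, title, url, published_at FROM entries "
        "ORDER BY published_at DESC, id DESC LIMIT ?",
        (limit,),
    )
    return [dict(zip(columns, row)) for row in rows]
```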

Input Requirements

Supported inputs:

RSS/Atom XML feed URLs.

OPML feed list files.

Output Contract (Metadata Only)

Persist feeds metadata to SQLite: feed_url, feed_title, site_url, etag, last_modified, status fields.

Persist entries metadata to SQLite: id, dedupe_key (compat primary identity snapshot), guid, url, canonical_url, title, author, published_at, updated_at, summary, categories, content_hash, match_confidence, timestamps.

Persist entry_identities mapping table to SQLite: entry_id, key_type, key_value, created_at.

Supported key types: guid, canonical_url, legacy_guid, fallback_hash.

Do not store generated summaries and do not create archive markdown files.
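
To make the layered identity idea concrete, here is a minimal sketch of how identity keys and a content fingerprint might be derived for one entry. The function names, the choice of SHA-256, the fields fed into the fallback hash, and the "high" confidence label are assumptions; only the key type names and the low-confidence fallback rule come from this document. The legacy_guid key type is not covered here.

```python
import hashlib


def content_hash(title, summary):
    """Fingerprint the visible metadata, ignoring case and extra whitespace."""
    normalized = " ".join(f"{title or ''} {summary or ''}".split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def identity_keys(feed_url, guid=None, canonical_url=None, title=None, published_at=None):
    """Return ([(key_type, key_value), ...], match_confidence) for one entry."""
    keys = []
    if guid:
        keys.append(("guid", guid.strip()))
    if canonical_url:
        keys.append(("canonical_url", canonical_url.strip()))
    if keys:
        # Assumed label: the document only names the "low" confidence case.
        return keys, "high"
    # Neither guid nor link is available: fall back to a hash of what is left.
    basis = "|".join([feed_url, title or "", published_at or ""])
    digest = hashlib.sha256(basis.encode("utf-8")).hexdigest()
    return [("fallback_hash", digest)], "low"
```

An entry seen on a later run would be treated as a duplicate if any of its keys already exists in entry_identities, which is why each entry can carry more than one key.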

Configurable Parameters

db_path / AI_RSS_DB_PATH (recommended absolute path in multi-agent runtime)

opml_path

feed_urls

max_feeds_per_run

max_items_per_feed

user_agent

seen_ttl_days

enable_conditional_get

Example config: assets/config.example.json

Error and Boundary Handling

Feed HTTP/network failure: keep syncing other feeds and record last_error.

Feed 304 Not Modified: skip entry parsing and keep state.

Missing guid and link: use hashed fallback identity and set match_confidence=low.

Dependency missing (feedparser): return install guidance.

Final Output Checklist (Required)

core goal

trigger conditions

input requirements

metadata schema

dedupe and sync rules

command workflow

configurable parameters

error handling

Use the following simplified checklist verbatim when the user requests it:

核心目标

输入需求

触发条件

元数据模型

去重与同步规则

命令流程

可配置参数

错误处理

References

references/input-model.md

references/output-rules.md

references/time-range-rules.md

Assets

assets/hn-popular-blogs-2025.opml (candidate feed pool)

assets/config.example.json

Scripts

scripts/rss_subscribe.py
