RFC-002 — RSS / Atom subscription management

  • Status: Draft
  • Authors: @yiidtw
  • Created: 2026-04-21
  • Related: RFC-001 (feature flags, amem rss setup)

TL;DR

Add a first-class subscription layer on top of the existing capture pipeline. Users run amem sub add <feed> to follow a source (arxiv category, blog, YouTube channel); a polling loop fetches the feed, dedupes by GUID, and routes each new item through the existing capture + (optional) compile flow. MCP exposes amem_sub_add / amem_sub_list so agents can manage the user’s reading queue.

Motivation

amem’s capture flow is reactive: it only runs when a human (or agent) hands it a URL. That makes it useless for tracking ongoing sources:

  • Following arxiv cs.CL as new papers drop
  • Karpathy / Willison / lesswrong blog posts
  • A YouTube channel’s new uploads (YouTube publishes per-channel RSS natively)

Every knowledge worker we’ve talked to does some version of this manually today — Feedly/NetNewsWire for reading, then copy-paste URLs into whatever capture tool they use. amem can collapse both steps.

This also inverts the self-recording story: amem is designed for you to produce content into. RSS lets other people’s content flow in on the same rails, so the wiki grows continuously rather than only after active capture.

Proposal

1. Subscription storage

# ~/.amem/subscriptions.toml
version = 1

[[subscription]]
id            = "arxiv-cs-cl"
url           = "http://export.arxiv.org/rss/cs.CL"
title         = "arXiv cs.CL (Computation and Language)"
auto_compile  = false         # capture-only by default; compile is opt-in
poll_minutes  = 60
enabled       = true
added_at      = "2026-04-21T00:00:00Z"
last_polled   = "2026-04-21T00:30:00Z"

[[subscription]]
id            = "3b1b-channel"
url           = "https://www.youtube.com/feeds/videos.xml?channel_id=UCYO_jab_esuFRV4b17AJtAw"
title         = "3Blue1Brown"
auto_compile  = true          # small channel, OK to auto-transcribe
poll_minutes  = 240
enabled       = true
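
As a rough sketch, each [[subscription]] entry deserializes into a struct like the one below. The struct and constructor are illustrative, not from the amem codebase; a real implementation would derive serde::Deserialize and parse with the toml crate, and the timestamp fields (added_at, last_polled) are omitted here to keep the example std-only:

```rust
/// One [[subscription]] entry from ~/.amem/subscriptions.toml.
/// Field names mirror the TOML keys above.
#[derive(Debug, Clone, PartialEq)]
pub struct Subscription {
    pub id: String,
    pub url: String,
    pub title: String,
    pub auto_compile: bool, // capture-only unless explicitly opted in
    pub poll_minutes: u64,  // default 120, per the Risks section
    pub enabled: bool,
}

impl Subscription {
    /// Constructor applying the RFC's defaults for omitted flags.
    pub fn new(id: &str, url: &str, title: &str) -> Self {
        Subscription {
            id: id.to_string(),
            url: url.to_string(),
            title: title.to_string(),
            auto_compile: false,
            poll_minutes: 120,
            enabled: true,
        }
    }
}

fn main() {
    let sub = Subscription::new(
        "arxiv-cs-cl",
        "http://export.arxiv.org/rss/cs.CL",
        "arXiv cs.CL (Computation and Language)",
    );
    println!("{sub:?}");
}
```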

2. Dedup ledger

~/.amem/subscriptions/
  ledger.jsonl                 # append-only, one JSON object per seen item
  state/{sub_id}/last_etag     # HTTP caching

Each ledger line:

{"sub_id":"3b1b-channel","guid":"yt:video:aircAruvnKk","captured_at":"2026-04-21T00:30:00Z","cite_key":"3blue1brown2017neural"}

Dedup is GUID-based. If a feed republishes an item (edit, repost), the existing capture wins; we don’t re-download.
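
In effect, dedup reduces to set membership on the (sub_id, guid) pair. A minimal sketch (the Ledger type below is hypothetical; the real one would be rebuilt by replaying ledger.jsonl on startup):

```rust
use std::collections::HashSet;

/// In-memory view of the dedup ledger, keyed on (sub_id, guid).
struct Ledger {
    seen: HashSet<(String, String)>,
}

impl Ledger {
    fn new() -> Self {
        Ledger { seen: HashSet::new() }
    }

    /// Returns true if this item is new and should be captured;
    /// false means it was already ingested and the existing capture wins.
    fn mark_seen(&mut self, sub_id: &str, guid: &str) -> bool {
        self.seen.insert((sub_id.to_string(), guid.to_string()))
    }
}

fn main() {
    let mut ledger = Ledger::new();
    // First sighting: capture it.
    assert!(ledger.mark_seen("3b1b-channel", "yt:video:aircAruvnKk"));
    // Feed republishes the same GUID (edit, repost): skip, no re-download.
    assert!(!ledger.mark_seen("3b1b-channel", "yt:video:aircAruvnKk"));
}
```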

3. CLI

amem sub add <url> [--auto-compile] [--poll-minutes N] [--title "..."]
amem sub list [--json]
amem sub remove <id>
amem sub enable|disable <id>
amem sub fetch [<id>]          # one-shot poll, honours etag
amem sub daemon                # long-running poller (used by service unit)
amem rss setup                 # install daemon (macOS launchd / Linux systemd user)

4. Poll algorithm

For each enabled sub where now - last_polled >= poll_minutes:

  1. GET feed with If-None-Match: {last_etag} and If-Modified-Since: {last_polled}
  2. 304 → update last_polled, skip
  3. 200 → parse via feed-rs, iterate items
  4. For each item not in ledger:
    • Route to existing cite::cmd_capture(item.link) (auto-picks arxiv / PDF / YouTube based on URL)
    • If auto_compile = true → also call the appropriate cmd_compile
    • Write ledger line
  5. Update last_polled and the cached etag in state/{sub_id}/

Failures per-item don’t block the rest of the feed. Aggregate failures re-queue with exponential backoff (15 min → 2 h cap).
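
The eligibility check and the backoff policy above can be sketched as two pure functions (names and the unix-minute convention are illustrative, not from the amem codebase):

```rust
/// A feed is due when at least `poll_minutes` have elapsed since the
/// last poll. Times are expressed as unix minutes for simplicity.
fn is_due(now_min: u64, last_polled_min: u64, poll_minutes: u64, enabled: bool) -> bool {
    enabled && now_min.saturating_sub(last_polled_min) >= poll_minutes
}

/// Exponential backoff after aggregate feed failures:
/// 15 min, doubling per consecutive failure, capped at 2 h.
fn backoff_minutes(consecutive_failures: u32) -> u64 {
    let base: u64 = 15;
    let cap: u64 = 120; // 2 h
    base.saturating_mul(1u64 << consecutive_failures.min(6)).min(cap)
}

fn main() {
    assert!(is_due(100, 30, 60, true));    // 70 min elapsed >= 60: poll
    assert!(!is_due(100, 90, 60, true));   // only 10 min elapsed: skip
    assert!(!is_due(100, 30, 60, false));  // disabled subs never poll
    assert_eq!(backoff_minutes(0), 15);    // first failure: 15 min
    assert_eq!(backoff_minutes(3), 120);   // hits the 2 h cap
}
```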

5. MCP surface

amem_sub_add(url, auto_compile?) -> sub_id
amem_sub_list() -> [{ id, title, last_polled, enabled, ... }]
amem_sub_remove(id)
amem_sub_fetch(id?) -> {fetched: N, captured: M, errors: [...] }

This lets an agent maintain its own research feed without a human in the loop: “follow every arxiv paper that cites Vaswani 2017” becomes a single MCP call.

6. Gated behind features.rss

Disabled by default. amem rss setup enables it, installs the daemon, and writes features.rss = true to ~/.amem/config.toml (per RFC-001).

Non-goals

  • Rich reader UI. amem is not Feedly. Reading lives in the wiki + amem recall. If people want visual unread counts, that belongs in an extension page, not the core.
  • OPML import on day 1. Easy add later; skip for MVP to keep surface small.
  • Arbitrary cron scheduling. poll_minutes is enough; cron-syntax schedules are out of scope.
  • Podcast audio-only feeds. These would need whisper anyway — treat them as RFC-002b once the YouTube pipeline is stable across more models.

Risks

| Risk | Mitigation |
| --- | --- |
| A popular arxiv category fills disk (dozens of papers/day) | poll_minutes default 120 + per-feed disk quota + user confirmation on first-time auto_compile = true |
| Feed publisher rate-limits us | Honour Retry-After, respect 429; back off to 6 h for repeat offenders |
| Duplicate captures when arxiv updates a paper’s version | Keep first ingest; subsequent versions append a note to the existing wiki entry rather than creating a new cite_key |
| RSS spec is loose; malformed feeds can break the parser | feed-rs handles common variants; log + skip malformed entries, do not abort the poll |

Concrete work

  1. Rust crate additions: feed-rs = "2", toml_edit = "0.22" (config writes preserve comments) (amem-sh)
  2. amem sub subcommand family (amem-sh)
  3. amem rss setup installer and amem sub daemon long-runner (amem-sh)
  4. MCP tools (amem-sh)
  5. SPEC.md: add subscription to the storage layout section (amem-hq)
  6. Docs: new guide/subscriptions.md page (amem-hq)

Extension UI for managing subscriptions is deferred — CLI first.