Day-2 operations

$ day-2 · 9 min read · updated 2026-06-10

Everything on this page assumes a running registry with real capsule data. Test each runbook against a scratch registry (CAPSOL_DATA_DIR=/tmp/capsol-drill capsol serve) before you need it in anger.

Upgrade to 0.16

capsol upgrades are in-place: stop the process, install the new version, start it. Registry state lives entirely in CAPSOL_DATA_DIR (default ./registry-data) and is migrated forward automatically at boot; migrations are additive and idempotent.

  1. Take a backup first (next section). 0.16 adds fields; it does not rewrite existing records, but a backup makes the rollback trivial.
  2. Install and restart: npm install -g capsol@0.16 then restart capsol serve (or redeploy the container).
  3. What changes for clients:
    • Nothing breaks. /mcp/:capsuleId URLs, existing enrollment tokens, and existing share credentials keep working. Credentials issued before 0.16 keep their recorded expiry (none), so no agent is locked out by the upgrade.
    • New OAuth access tokens now expire after 30 days and come with a refresh token. Long-lived clients should use the refresh_token grant; see API reference.
    • Dashboard sessions need a re-login once (cookies are now SameSite=Strict and carry a CSRF token).
  4. Verify: curl -s localhost:4000/health shows the new version; curl -s localhost:4000/ready reports status: "ready"; an agent can still read its capsule.
  5. Rollback: stop the process, reinstall the previous version, restore the backup taken in step 1 if anything migrated unexpectedly.
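
The /health and /ready checks in step 4 can run from a deploy pipeline. The following is a minimal sketch, assuming /health returns JSON with the running version under a "version" key (that key name is an assumption, since this page only says the new version is shown) and /ready returns JSON with status "ready". The HTTP fetch is passed in as a function, so the check logic can be exercised without a live registry. It does not cover the third check (an agent can still read its capsule).

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def http_get_json(base_url: str, path: str) -> dict:
    """GET base_url + path and decode the JSON body."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # A registry that is still starting may answer non-2xx; report it instead of crashing.
        return {"status": f"HTTP {err.code}"}


def verify_upgrade(expected_version: str, fetch: Callable[[str], dict]) -> list[str]:
    """Return the problems found; an empty list means /health and /ready look right."""
    problems = []
    health = fetch("/health")
    # ASSUMPTION: /health exposes the running version under a "version" key.
    if health.get("version") != expected_version:
        problems.append(f"/health version is {health.get('version')!r}, expected {expected_version!r}")
    ready = fetch("/ready")
    if ready.get("status") != "ready":
        problems.append(f"/ready status is {ready.get('status')!r}, expected 'ready'")
    return problems
```

Usage: verify_upgrade("0.16.0", lambda p: http_get_json("http://localhost:4000", p)), where "0.16.0" stands in for whatever exact version string your build reports.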

Backup

All registry state is files. There is no database.

| What | Where | Contains |
| --- | --- | --- |
| Registry index | CAPSOL_DATA_DIR/*.json | capsules, shares, principals, credential hashes, grants, settings, policy |
| Capsule content | CAPSOL_DATA_DIR/boxes/<id>/.capsol/ | knowledge entries, blobs, audit logs, memory |
| Admin key | ~/.capsol/admin.key | generated bootstrap key (skip if you set CAPSOL_ADMIN_KEY yourself) |
| Secret key | CAPSOL_SECRET_KEY, or the file CAPSOL_SECRET_KEY_FILE points to | decrypts the OIDC/SMTP settings in settings.json |
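# Hot backup is safe: every write is an atomic temp-file + rename, so tar never reads a half-written file.
: "${CAPSOL_DATA_DIR:=./registry-data}"  # fall back to the documented default if unset
tar -czf "capsol-backup-$(date +%Y%m%d-%H%M%S).tgz" \
  -C "$(dirname "$CAPSOL_DATA_DIR")" "$(basename "$CAPSOL_DATA_DIR")"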

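Store the tarball, the admin key, and the secret key in separate places. The tarball contains only credential hashes, not usable credentials, but it does contain all capsule content, and the secret key is what decrypts the OIDC/SMTP settings stored inside it.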
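
On hosts where a Python scheduler is more convenient than a shell one-liner, the standard-library tarfile module can take the same hot backup. The following is a minimal sketch that mirrors the tar command above: it archives the data directory under its own name, so a single extract restores it. The ./registry-data default comes from the upgrade notes, and the function name is illustrative.

```python
import os
import tarfile
import time
from pathlib import Path
from typing import Optional


def backup_registry(data_dir: Optional[str] = None, out_dir: str = ".") -> Path:
    """Write capsol-backup-YYYYmmdd-HHMMSS.tgz holding the whole data directory.

    Safe on a running registry only because capsol writes each file atomically
    (temp file + rename), so tar never reads a half-written file.
    """
    src = Path(data_dir or os.environ.get("CAPSOL_DATA_DIR") or "./registry-data").resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"data dir not found: {src}")
    dest = Path(out_dir) / time.strftime("capsol-backup-%Y%m%d-%H%M%S.tgz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname = the directory's own name, like `tar -C <parent> <name>`.
        tar.add(src, arcname=src.name)
    return dest
```

Write the archive outside the data directory. The sketch archives only CAPSOL_DATA_DIR, so the admin key and secret key still need their own, separate copies.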

Restore

systemctl stop capsol # or: fly scale count 0 / docker stop
mv "$CAPSOL_DATA_DIR" "$CAPSOL_DATA_DIR.broken-$(date +%s)"
tar -xzf capsol-backup-....tgz -C "$(dirname "$CAPSOL_DATA_DIR")"
systemctl start capsol
curl -s localhost:4000/ready # expect "ready"

Run the drill quarterly. A backup you have never restored is a hypothesis.
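
A quarterly drill is easier to keep up if it never touches the live registry. The following is a minimal sketch that assumes the tarball was made with the command above (so its root is the data directory's own name). It restores into a scratch directory and compares file lists and SHA-256 hashes against the live data. Files written after the backup will legitimately differ, so a mismatch is a prompt to look, not proof of a bad backup. The function names are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def extract_backup(archive: str, scratch_parent: str) -> None:
    """Extract a capsol backup tarball into scratch_parent (never the live dir)."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, `..` members and special files.
            tar.extractall(scratch_parent, filter="data")
        else:
            tar.extractall(scratch_parent)


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drill(archive: str, live_dir: str, scratch_parent: str) -> dict[str, list[str]]:
    """Restore into scratch, then diff against live: missing, extra and changed paths."""
    extract_backup(archive, scratch_parent)
    live = Path(live_dir).resolve()
    restored = tree_digest(Path(scratch_parent) / live.name)
    current = tree_digest(live)
    return {
        "missing_from_backup": sorted(set(current) - set(restored)),
        "only_in_backup": sorted(set(restored) - set(current)),
        "changed": sorted(k for k in set(current) & set(restored) if current[k] != restored[k]),
    }
```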

Rotate the admin key

  1. Generate: openssl rand -hex 24 (or any ≥16-byte secret), or delete ~/.capsol/admin.key and let first-run regenerate one with a checksum.
  2. Set it: CAPSOL_ADMIN_KEY=<new> in the environment (preferred for hosted), or write it to ~/.capsol/admin.key.
  3. Restart the registry. Old dashboard sessions die immediately — the cookie is sha256(key) and stops matching.
  4. Update anything that calls /v1/* with the old key (CI, scripts, capsol grants --key).

Agent connections are not affected: MCP credentials are independent of the admin key.
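
The following is a minimal sketch of the two facts these steps rely on: a fresh key equivalent to openssl rand -hex 24, and a dashboard cookie derived as sha256(key), which is why old sessions stop matching. Only the sha256(key) relationship comes from this page. UTF-8 encoding of the key and a hex-encoded digest are assumptions about capsol's internals, and the helper names are hypothetical.

```python
import hashlib
import secrets


def new_admin_key(nbytes: int = 24) -> str:
    """Random hex key, like `openssl rand -hex 24` (24 bytes -> 48 hex chars)."""
    if nbytes < 16:
        raise ValueError("use at least 16 random bytes")
    return secrets.token_hex(nbytes)


def session_cookie(admin_key: str) -> str:
    """ASSUMPTION: the dashboard cookie value is the hex SHA-256 of the UTF-8 key."""
    return hashlib.sha256(admin_key.encode("utf-8")).hexdigest()


def session_still_valid(cookie: str, current_key: str) -> bool:
    # Constant-time comparison, the usual choice when checking secret-derived values.
    return secrets.compare_digest(cookie, session_cookie(current_key))
```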

Rotate the secret key

The secret key encrypts the OIDC client secret and SMTP URL inside settings.json. Rotating it makes those two ciphertexts unreadable, so:

  1. Note the current OIDC client secret and SMTP URL (from your IdP / mail provider — they are not recoverable from capsol after rotation).
  2. Generate: openssl rand -base64 48 → set CAPSOL_SECRET_KEY (or CAPSOL_SECRET_KEY_FILE).
  3. Restart.
  4. Re-enter the OIDC client secret and SMTP URL in dashboard → Settings.
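
For hosts without openssl, the following minimal sketch covers step 2. It generates the same 48 random bytes, base64-encodes them (64 characters, like openssl rand -base64 48), and writes them to a new owner-only file for CAPSOL_SECRET_KEY_FILE to point at. The assumption is that the file should contain just the base64 string with no trailing newline, so check how your deployment reads it. The function name is hypothetical.

```python
import base64
import os
import secrets


def write_secret_key_file(path: str) -> str:
    """Create a new secret key file with mode 0600; refuse to overwrite an existing one."""
    key = base64.b64encode(secrets.token_bytes(48)).decode("ascii")  # 64 chars
    # O_EXCL: never clobber the current key by accident (that would be an unplanned rotation).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(key)  # ASSUMPTION: capsol expects the bare key, no trailing newline
    return key
```

Write the key to a new path, repoint CAPSOL_SECRET_KEY_FILE at it, and restart. The old file stays untouched until you delete it.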

Revoke a compromised agent in under 60 seconds

# 1. Find the connection (10s)
curl -s -H "Authorization: Bearer $ADMIN" localhost:4000/v1/connections | \
python3 -c "import json,sys; [print(c['connection_id'], c['label'], c['role']) for c in json.load(sys.stdin)['connections']]"
# 2. Revoke it (5s) — immediate; the next MCP call gets 401 token_revoked
curl -s -X PATCH -H "Authorization: Bearer $ADMIN" -H "Content-Type: application/json" \
-d '{"status":"revoked"}' localhost:4000/v1/shares/<connection_id>

Or use the dashboard: Connections → row → revoke. If the credential was OAuth-issued and you only have the token, POST /oauth/revoke with token=<value> revokes both the access token and its refresh token. Pausing ({"status":"paused"}) is the reversible variant.
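
When the admin key lives in a vault and shell history is undesirable, the same two calls can be scripted. The following is a minimal standard-library sketch. The endpoints and JSON shapes ({"connections": [...]} with connection_id, label and role; a PATCH body of {"status": ...}) come from the runbook above, while the helper names and the label-substring match are illustrative. It only builds urllib requests, and nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

BASE = "http://localhost:4000"  # adjust for your deployment


def list_connections_request(admin_key: str, base: str = BASE) -> urllib.request.Request:
    return urllib.request.Request(
        f"{base}/v1/connections",
        headers={"Authorization": f"Bearer {admin_key}"},
    )


def matching_connections(payload: dict, label_part: str) -> list[dict]:
    """Filter a /v1/connections response by a case-insensitive label substring."""
    needle = label_part.lower()
    return [c for c in payload["connections"] if needle in (c.get("label") or "").lower()]


def set_status_request(admin_key: str, connection_id: str, status: str = "revoked",
                       base: str = BASE) -> urllib.request.Request:
    """PATCH /v1/shares/<id>: "revoked" is immediate; "paused" is the reversible variant."""
    if status not in ("revoked", "paused"):
        raise ValueError(f"unexpected status {status!r}")
    return urllib.request.Request(
        f"{base}/v1/shares/{connection_id}",
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={"Authorization": f"Bearer {admin_key}", "Content-Type": "application/json"},
        method="PATCH",
    )
```

Usage: decode the list with json.load(urllib.request.urlopen(list_connections_request(key))), pick the row, then urlopen(set_status_request(key, row["connection_id"])).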

Manage operators

  • Add: dashboard Operators → invite by email (or POST /v1/operators/invites). The link is single-use, expires in 7 days, and binds the invited role on OIDC sign-in. Direct creation (POST /v1/operators) is admin-only.
  • Change role: PATCH /v1/operators/:id {"role": "approver"}. The new role applies on the operator’s next login.
  • Offboard: PATCH /v1/operators/:id {"status": "disabled"} — existing sessions stop working immediately and new logins are refused. Their owned capsules stay; an admin can reassign ownership via PATCH /v1/capsules/:id.
  • No passwords: operator access is OIDC or break-glass only. If your IdP is down, the admin key still works (and is logged as break-glass).
  • Pre-0.19 capsules are unowned; an admin claims them with PATCH /v1/capsules/:id {"claim_ownership": true}.
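
Offboarding usually means disabling the operator and then moving their capsules to someone else. The following is a minimal sketch that plans those calls as (method, path, body) tuples so they can be reviewed before anything runs. The operator and capsule paths, {"status": "disabled"} and {"claim_ownership": true} come from the list above. The "owner" field in the reassignment body is a hypothetical name, because this page does not give it, so confirm it in the API reference first.

```python
from typing import Any

Call = tuple[str, str, dict[str, Any]]


def offboarding_plan(operator_id: str, owned_capsule_ids: list[str], new_owner_id: str) -> list[Call]:
    """Disable first so the operator's sessions die before ownership moves."""
    plan: list[Call] = [("PATCH", f"/v1/operators/{operator_id}", {"status": "disabled"})]
    for capsule_id in owned_capsule_ids:
        # HYPOTHETICAL field name "owner": the docs only say ownership is reassigned here.
        plan.append(("PATCH", f"/v1/capsules/{capsule_id}", {"owner": new_owner_id}))
    return plan


def claim_unowned_plan(capsule_ids: list[str]) -> list[Call]:
    """Pre-0.19 capsules have no owner; an admin claims them with claim_ownership."""
    return [("PATCH", f"/v1/capsules/{cid}", {"claim_ownership": True}) for cid in capsule_ids]
```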

Registry fails to boot

  1. Read stderr. Boot failures are structured JSON with a remediation field.
  2. CAPSOL_SECRET_KEY required in production → set a persistent key: openssl rand -base64 48. The process exits non-zero by design rather than booting with an ephemeral key.
  3. Cannot create ~/.capsol / Cannot write → the message includes the exact mkdir/chmod fix.
  4. EADDRINUSE → with PORT set, capsol binds exactly that port and exits if it is busy; without PORT, it tries ports 4000–4010 automatically.
  5. Corrupt JSON in CAPSOL_DATA_DIR (for example a partial write on a full disk; rare, because writes are atomic) → restore only the affected file from backup, since each *.json file is independent.
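
Steps 1 and 5 are easy to script. The following is a minimal sketch with two parts. The first function pulls the remediation field out of capsol's structured stderr, assuming one JSON object per line and relying on no other field names. The second parses every top-level *.json file in the data directory and reports the ones that fail, so you restore only those. Both function names are hypothetical.

```python
import json
from pathlib import Path


def remediations(stderr_text: str) -> list[str]:
    """Collect remediation hints from JSON lines on stderr; skip non-JSON lines."""
    hints = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "remediation" in record:
            hints.append(str(record["remediation"]))
    return hints


def corrupt_index_files(data_dir: str) -> list[str]:
    """Return the names of top-level *.json files that fail to parse."""
    bad = []
    for path in sorted(Path(data_dir).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            bad.append(path.name)
    return bad
```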

An agent gets 401

Ask the agent for the error_code; every 401 carries one:

| Code | Meaning | Operator action |
| --- | --- | --- |
| token_expired | 30-day OAuth TTL elapsed | None; the client should use its refresh token |
| token_revoked | Credential revoked/rotated | Re-approve access if unintended |
| grant_revoked | You (or policy) revoked the grant | Expected; re-approve if needed |
| connection_paused | Connection paused | Reactivate from Connections |
| invalid_token | Unknown credential | Client misconfigured; re-enroll or re-run OAuth |
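
On the agent side, the table reduces to one rule: only token_expired is something the client fixes by itself. The following is a minimal sketch of that decision plus a standard OAuth 2.0 refresh_token grant request (RFC 6749, section 6). The token endpoint URL and client_id are assumptions, because this page names only /oauth/revoke, so take the real token endpoint from the API reference. Treating unknown codes as "re-enroll" is a design choice of the sketch.

```python
import urllib.parse
import urllib.request

# Client-side reading of the error_code table above.
ACTIONS = {
    "token_expired": "refresh",        # 30-day TTL elapsed: use the refresh token
    "token_revoked": "ask_operator",   # operator must re-approve
    "grant_revoked": "ask_operator",   # operator must re-approve
    "connection_paused": "wait",       # operator can reactivate from Connections
    "invalid_token": "reenroll",       # misconfigured: re-enroll or re-run OAuth
}


def action_for(error_code: str) -> str:
    return ACTIONS.get(error_code, "reenroll")


def refresh_request(token_url: str, refresh_token: str, client_id: str) -> urllib.request.Request:
    """Build a refresh_token grant request: form-encoded POST per RFC 6749 section 6."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Per RFC 6749, the server may return a new refresh token alongside the new access token. If it does, store it in place of the old one.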

Respond to a leaked credential

  1. Revoke it with the runbook above; it takes under 60 seconds.
  2. Check the capsule audit log (/v1/capsules/:id/logs or dashboard Activity) for what that connection read or wrote.
  3. Rotate the admin key if the leak vector could have included it.
  4. Audit logs are plain JSONL without tamper-evidence — treat them as a best-effort record, not forensic proof (see Security).
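
With shell access to the host, step 2 can also be done straight from the files. The following is a minimal sketch that scans a capsule's daily JSONL audit logs under boxes/<id>/.capsol/logs/, in filename order, and yields the entries for one connection. The log record schema is not documented on this page, so the "connection_id" key is an assumption; adjust it after reading a real line. Given step 4, treat the output as leads, not evidence.

```python
import json
from pathlib import Path
from typing import Iterator


def entries_for_connection(data_dir: str, capsule_id: str, connection_id: str,
                           key: str = "connection_id") -> Iterator[dict]:
    """Yield audit entries whose `key` field equals connection_id, in filename order."""
    log_dir = Path(data_dir) / "boxes" / capsule_id / ".capsol" / "logs"
    for log_file in sorted(log_dir.glob("*.jsonl")):
        with log_file.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # best-effort log: skip damaged lines rather than abort
                if isinstance(entry, dict) and entry.get(key) == connection_id:
                    yield entry
```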

Data growth and retention

  • Per-capsule audit logs are daily JSONL files under boxes/<id>/.capsol/logs/. Nothing rotates them automatically; prune by age with a cron job (find ... -name '*.jsonl' -mtime +90 -delete) after archiving if you need history.
  • Signals are stored as capsule entries under notes://signals/ with a TTL (default 1 day) and are garbage-collected opportunistically on signal writes.
  • Anonymous capsules expire after 7 days and are cleaned hourly.
  • The registry *.json index files do not grow unboundedly with traffic — only with the number of capsules, connections, grants, and enrollments.
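
The pruning job from the first bullet can also be written in Python, which is useful where find is unavailable or you want the archive step built in. The following is a minimal sketch that walks boxes/*/.capsol/logs/*.jsonl and judges age by modification time, roughly as find -mtime +90 does. It optionally copies old files into an archive directory (one subdirectory per capsule) before deleting them. The 90-day default mirrors the example above, and the function name and archive layout are illustrative.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def prune_audit_logs(data_dir: str, max_age_days: int = 90,
                     archive_dir: Optional[str] = None,
                     now: Optional[float] = None) -> list[Path]:
    """Delete per-capsule JSONL audit logs older than max_age_days; return what was removed."""
    cutoff = (now if now is not None else time.time()) - max_age_days * 86400
    removed = []
    for log in sorted(Path(data_dir).glob("boxes/*/.capsol/logs/*.jsonl")):
        if log.stat().st_mtime >= cutoff:
            continue
        if archive_dir is not None:
            # Keep the capsule id in the archive path so same-named daily files don't collide.
            capsule_id = log.parents[2].name
            dest = Path(archive_dir) / capsule_id / log.name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(log, dest)
        log.unlink()
        removed.append(log)
    return removed
```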