Runbook: Secret Rotation
Most secrets in BreezyCorp support zero-downtime rotation via a dual-key pattern: a _PREVIOUS variant is accepted while the new key propagates, then retired. Exceptions, such as the OCR provider credentials, are noted in the catalog below.
Catalog
| Secret | Rotation impact | Strategy | Owner |
|---|---|---|---|
| JWT_SECRET | Invalidates magic-link tokens | 48h dual-verify window (max magic-link TTL) | Platform |
| MAGIC_LINK_SECRET | Same as JWT_SECRET | Same | Platform |
| ANTHROPIC_API_KEY | OCR field extraction stops until replaced | Generate new key in console.anthropic.com, replace in .env, restart worker | Platform |
| GOOGLE_VISION_CREDENTIALS_JSON | OCR pipeline fails at vision stage | Rotate service account key in GCP IAM, replace JSON blob in .env, restart worker | Platform |
| MFA_SECRET_KEY | Cannot decrypt stored TOTP seeds | Decrypt-with-previous at login; one-time re-encrypt migration | Platform |
| SEED_ADMIN_PASSWORD | Dev-only | n/a | — |
| DATABASE_URL password | Connection loss | Managed rotation via RDS/Neon; blue/green with PgBouncer | Platform + DBA |
| S3 / MinIO credentials | Upload failures | AWS IAM rotation with 30-min overlap | Platform |
| SENTRY_DSN | Lost error capture during rotation | Set new DSN, roll restart | Platform |
General procedure
- Generate a new secret (use openssl rand -hex 32 or provider tooling)
- Set the current value as <SECRET>_PREVIOUS first, deploy
- Then set the new value as <SECRET>, deploy again
- Wait out the overlap window (see per-secret table above)
- Remove <SECRET>_PREVIOUS, deploy final
At step 3, both keys are live; validators accept either. At step 5, only the new key remains.
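The "validators accept either" behavior can be sketched as follows. This is a minimal illustration, not the actual validator: it assumes tokens are authenticated with HMAC-SHA256, and the helper names (sign, signaturesMatch, verifyWithRotation) are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: hex-encoded HMAC-SHA256 of a payload.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Constant-time comparison of two hex-encoded signatures.
function signaturesMatch(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "hex");
  const bufB = Buffer.from(b, "hex");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

// Accept a signature made with the current secret or, while the overlap
// window is open, with the previous one. Pass no `previous` (or an empty
// string) once <SECRET>_PREVIOUS has been removed.
function verifyWithRotation(
  payload: string,
  signature: string,
  current: string,
  previous?: string,
): boolean {
  if (signaturesMatch(sign(payload, current), signature)) return true;
  return previous ? signaturesMatch(sign(payload, previous), signature) : false;
}
```

With current set to the new key and previous set to the old one, tokens signed by either key verify; once previous is dropped, only tokens signed with the new key do.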
JWT_SECRET rotation (worked example)
# 1. Generate
NEW=$(openssl rand -hex 32)
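# 2. Stage the current key as previous (existing tokens still work, no impact)
#    OLD must already hold the current JWT_SECRET value, read from wherever it is stored
kubectl set env deployment/spade-api JWT_SECRET_PREVIOUS="$OLD"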
kubectl rollout status deployment/spade-api
# 3. Promote the new key
kubectl set env deployment/spade-api JWT_SECRET=$NEW
kubectl rollout status deployment/spade-api
# 4. Wait 48h (longest magic-link TTL)
# 5. Drop the previous
kubectl set env deployment/spade-api JWT_SECRET_PREVIOUS-
kubectl rollout status deployment/spade-api
OCR provider credential rotation
OCR runs synchronously in the worker against Google Cloud Vision + Anthropic Claude (see packages/documents/src/factory.ts). There is no webhook callback path and no dual-key rotation — swapping a credential means a brief OCR outage while the worker restarts.
Anthropic (ANTHROPIC_API_KEY):
- Create new key at console.anthropic.com → API Keys
- Replace ANTHROPIC_API_KEY in the worker's environment
- Restart the worker process
- Verify with a portal file upload that OCR jobs complete
- Delete the old key in the Anthropic console
Google Cloud Vision (GOOGLE_VISION_CREDENTIALS_JSON):
- GCP Console → IAM & Admin → Service Accounts → vision-ocr@… → Keys → Add Key → JSON
- Replace GOOGLE_VISION_CREDENTIALS_JSON in the worker's environment with the full JSON blob
- Restart the worker process
- Verify with a portal file upload that OCR jobs complete
- Delete the old key in the same Keys tab
If OCR fails with DECODER routines::unsupported, the base64 private_key is malformed — copy the downloaded JSON directly without manual line-wrapping.
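A pre-flight check can catch a malformed blob before the worker is restarted. The sketch below is not part of the codebase: checkServiceAccountJson is a hypothetical helper, and the only thing it assumes about the blob is that it is JSON with a PEM-encoded private_key field. The demo uses a throwaway RSA key, not a real service account.

```typescript
import { createPrivateKey, generateKeyPairSync } from "node:crypto";

// Hypothetical helper: returns null when the blob looks usable,
// otherwise a short reason string.
function checkServiceAccountJson(blob: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(blob);
  } catch {
    // Manual line-wrapping inside a JSON string typically lands here.
    return "not valid JSON";
  }
  const key = (parsed as { private_key?: unknown } | null)?.private_key;
  if (typeof key !== "string") return "missing private_key";
  try {
    createPrivateKey(key); // throws if the PEM cannot be decoded
  } catch {
    return "private_key is not a decodable PEM key";
  }
  return null;
}

// Demo with a throwaway key; a real blob comes from the GCP download.
const { privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const goodBlob = JSON.stringify({ type: "service_account", private_key: privateKey });
console.log(checkServiceAccountJson(goodBlob));
```

Running a check like this against the value about to be placed in GOOGLE_VISION_CREDENTIALS_JSON would surface a bad copy before the OCR pipeline does.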
MFA_SECRET_KEY rotation (destructive-adjacent)
TOTP seeds are encrypted at rest with AES-256-GCM keyed on MFA_SECRET_KEY. Rotation requires decrypting with the old key and re-encrypting with the new one.
- Set MFA_SECRET_KEY_PREVIOUS=<current>, then deploy
- Set MFA_SECRET_KEY=<new>, then deploy
- Run the re-encryption migration (TODO: build scripts/reencrypt-mfa-secrets.ts): iterates staff_users, decrypts with previous key, re-encrypts with current
- Verify: SELECT COUNT(*) FROM staff_users WHERE mfa_secret IS NOT NULL, plus a trial login for a sample user
- Remove MFA_SECRET_KEY_PREVIOUS
If the migration is skipped: users can still log in (decrypt-with-previous fallback in packages/auth/src/crypto.ts), but you can never drop the previous key.
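The decrypt-with-previous fallback and the per-row work of the re-encryption migration might look roughly like this. It is a sketch under assumptions, not the contents of packages/auth/src/crypto.ts: the storage layout (base64 of IV, auth tag, then ciphertext), the hex-encoded 32-byte keys, and all function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed layout (hypothetical): base64(iv[12] || tag[16] || ciphertext).
function encryptSeed(seed: string, keyHex: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", Buffer.from(keyHex, "hex"), iv);
  const ciphertext = Buffer.concat([cipher.update(seed, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptSeed(stored: string, keyHex: string): string {
  const raw = Buffer.from(stored, "base64");
  const decipher = createDecipheriv("aes-256-gcm", Buffer.from(keyHex, "hex"), raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  // final() throws when the auth tag does not verify, e.g. with the wrong key.
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}

// Try the current key first, then fall back to the previous key if one is set.
function decryptWithFallback(
  stored: string,
  current: string,
  previous?: string,
): { seed: string; usedPrevious: boolean } {
  try {
    return { seed: decryptSeed(stored, current), usedPrevious: false };
  } catch (err) {
    if (!previous) throw err;
    return { seed: decryptSeed(stored, previous), usedPrevious: true };
  }
}

// Per-row migration step: re-encrypt under the current key only when the
// row was still readable solely with the previous key.
function reencrypt(stored: string, current: string, previous: string): string {
  const { seed, usedPrevious } = decryptWithFallback(stored, current, previous);
  return usedPrevious ? encryptSeed(seed, current) : stored;
}
```

Because reencrypt returns already-migrated values unchanged, a migration built this way could be re-run safely after a partial failure. Any row it has not touched still depends on the previous key, which is why that key cannot be removed until the migration completes.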
Password change → session revocation
Changing a staff user's password already revokes all their other sessions (Phase 3A.5). Rotating the password store secret is separate; see the DATABASE_URL password row in the catalog above.
Emergency rotation (compromise suspected)
- Skip the overlap window
- Set new secret directly (don't stage as previous)
- Accept the downtime: all affected sessions/tokens break immediately
- Force staff re-login by also running the UPDATE staff_sessions SET revoked_at = NOW() query from runbooks/tenant-leak-suspected.md
- Open a post-mortem channel
Related
- packages/auth/src/crypto.ts: dual-key AES-256-GCM implementation
- packages/documents/src/factory.ts: OCR adapter construction from env
- docs/pii-inventory.md: which secrets protect which PII columns