
Runbook: Secret Rotation

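Application-held secrets in BreezyCorp (JWT_SECRET, MAGIC_LINK_SECRET, MFA_SECRET_KEY) support zero-downtime rotation via a dual-key pattern: a <SECRET>_PREVIOUS variant is accepted while the new key propagates, then retired. Provider-managed credentials rotate through the provider's own tooling (see the Strategy column below), and the OCR credentials have no overlap window at all.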
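A shell sketch of what "accepted while the new key propagates" means for a signed token. It uses HMAC-SHA256 via `openssl dgst` as a stand-in for the application's real token validation, which is not shown here; `sign`, `verify`, `SECRET` and `SECRET_PREVIOUS` are illustrative names only.

```bash
# Illustrative dual-key check, NOT the application's validator.
# A "signature" here is HMAC-SHA256(payload) under a key; verify() accepts
# a signature produced with either $SECRET or $SECRET_PREVIOUS.

# Print the hex HMAC-SHA256 of a payload under a key.
# (Passing the key on the command line exposes it in ps; demo only.)
sign() {
  local key=$1 payload=$2
  printf '%s' "$payload" | openssl dgst -sha256 -hmac "$key" | awk '{print $NF}'
}

# Return 0 if sig matches the payload under the current or previous key.
# Plain string comparison; a real validator should compare in constant time.
verify() {
  local payload=$1 sig=$2 key
  for key in "${SECRET:-}" "${SECRET_PREVIOUS:-}"; do
    [ -n "$key" ] || continue   # skip an unset or empty slot
    if [ "$(sign "$key" "$payload")" = "$sig" ]; then
      return 0
    fi
  done
  return 1
}
```

With SECRET=new and SECRET_PREVIOUS=old, `verify "$payload" "$sig"` accepts tokens signed under either key; once SECRET_PREVIOUS is emptied, old-key tokens fail.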

Catalog

| Secret | Rotation impact | Strategy | Owner |
|---|---|---|---|
| JWT_SECRET | Invalidates magic-link tokens | 48h dual-verify window (max magic-link TTL) | Platform |
| MAGIC_LINK_SECRET | Same as JWT_SECRET | Same | Platform |
| ANTHROPIC_API_KEY | OCR field extraction stops until replaced | Generate new key in console.anthropic.com, replace in .env, restart worker | Platform |
| GOOGLE_VISION_CREDENTIALS_JSON | OCR pipeline fails at vision stage | Rotate service account key in GCP IAM, replace JSON blob in .env, restart worker | Platform |
| MFA_SECRET_KEY | Cannot decrypt stored TOTP seeds | Decrypt-with-previous at login; one-time re-encrypt migration | Platform |
| SEED_ADMIN_PASSWORD | Dev-only | n/a | |
| DATABASE_URL password | Connection loss | Managed rotation via RDS/Neon; blue/green with PgBouncer | Platform + DBA |
| S3 / MinIO credentials | Upload failures | AWS IAM rotation with 30-min overlap | Platform |
| SENTRY_DSN | Lost error capture during rotation | Set new DSN, roll restart | Platform |

General procedure

  1. Generate the new secret (openssl rand -hex 32 or provider tooling)
  2. Set the current (old) value as <SECRET>_PREVIOUS, deploy
  3. Set the new value as <SECRET>, deploy again
  4. Wait out the overlap window (see the per-secret table above)
  5. Remove <SECRET>_PREVIOUS, deploy a final time

From step 3 until step 5, both keys are live and validators accept either. After step 5, only the new key remains.
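The five steps can be wrapped in shell functions, a sketch that generalizes the JWT_SECRET worked example to any application-held <SECRET>. It assumes the secrets are literal env entries on deployment/spade-api (as `kubectl set env` implies) and that the application is the pod's first container; `stage_previous`, `promote_new` and `drop_previous` are hypothetical names, not existing BreezyCorp tooling.

```bash
# Hypothetical helpers for the dual-key procedure.
# Assumes secrets are literal env entries on the deployment and the app
# container is containers[0] in the pod template.
DEPLOY=${DEPLOY:-deployment/spade-api}

# Print the current literal value of an env var on the first container.
current_value() {
  local name=$1
  kubectl get "$DEPLOY" \
    -o jsonpath="{.spec.template.spec.containers[0].env[?(@.name==\"$name\")].value}"
}

# Steps 1-2: copy the current value into <SECRET>_PREVIOUS and roll out.
stage_previous() {
  local name=$1 old
  old=$(current_value "$name") || return 1
  if [ -z "$old" ]; then
    # Empty means unset, or sourced from a Secret ref rather than a literal
    echo "refusing to stage: $name has no literal value on $DEPLOY" >&2
    return 1
  fi
  kubectl set env "$DEPLOY" "${name}_PREVIOUS=${old}" &&
    kubectl rollout status "$DEPLOY"
}

# Step 3: promote the new value; validators now accept either key.
promote_new() {
  local name=$1 new=$2
  kubectl set env "$DEPLOY" "${name}=${new}" &&
    kubectl rollout status "$DEPLOY"
}

# Step 5: after the overlap window, remove <SECRET>_PREVIOUS.
drop_previous() {
  local name=$1
  kubectl set env "$DEPLOY" "${name}_PREVIOUS-" &&
    kubectl rollout status "$DEPLOY"
}
```

Between `promote_new` and `drop_previous`, wait out the overlap window listed for that secret in the catalog.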

JWT_SECRET rotation (worked example)

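```bash
# 1. Generate the new key and capture the current one
#    (assumes JWT_SECRET is a literal env entry on the first container)
NEW=$(openssl rand -hex 32)
OLD=$(kubectl get deployment/spade-api \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="JWT_SECRET")].value}')

# 2. Stage the current key as previous: existing tokens still work, no impact
kubectl set env deployment/spade-api "JWT_SECRET_PREVIOUS=$OLD"
kubectl rollout status deployment/spade-api

# 3. Promote the new key; validators now accept either
kubectl set env deployment/spade-api "JWT_SECRET=$NEW"
kubectl rollout status deployment/spade-api

# 4. Wait 48h (longest magic-link TTL)

# 5. Drop the previous key
kubectl set env deployment/spade-api JWT_SECRET_PREVIOUS-
kubectl rollout status deployment/spade-api
```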
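Before step 3 it is cheap to confirm that NEW looks like `openssl rand -hex 32` output, that OLD was actually captured, and that the two differ, so a paste error or empty variable never gets promoted. A sketch; `check_new_secret` is a hypothetical helper.

```bash
# Hypothetical pre-flight guard for step 3.
# openssl rand -hex 32 prints 32 random bytes as 64 lowercase hex characters.
check_new_secret() {
  local new=$1 old=$2
  if [ -z "$old" ]; then
    echo "OLD is empty; step 2 would have staged nothing" >&2
    return 1
  fi
  if [[ ! $new =~ ^[0-9a-f]{64}$ ]]; then
    echo "NEW is not 64 lowercase hex characters" >&2
    return 1
  fi
  if [ "$new" = "$old" ]; then
    echo "NEW equals OLD; regenerate it" >&2
    return 1
  fi
}
```

Usage: `check_new_secret "$NEW" "$OLD" && kubectl set env deployment/spade-api "JWT_SECRET=$NEW"`.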

OCR provider credential rotation

OCR runs synchronously in the worker against Google Cloud Vision + Anthropic Claude (see packages/documents/src/factory.ts). There is no webhook callback path and no dual-key rotation — swapping a credential means a brief OCR outage while the worker restarts.

Anthropic (ANTHROPIC_API_KEY):

  1. Create new key at console.anthropic.com → API Keys
  2. Replace ANTHROPIC_API_KEY in the worker's environment
  3. Restart the worker process
  4. Upload a file through the portal and confirm that the OCR job completes
  5. Delete the old key in the Anthropic console
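Step 2 comes down to swapping one KEY=value line in the worker's .env (the catalog's "replace in .env"). A minimal sketch, assuming a plain dotenv file with single-line values and no `export` prefixes; `set_env_var` is a hypothetical helper. It also fits step 2 of the Google Cloud Vision procedure once the JSON blob is on a single line.

```bash
# Hypothetical helper: replace (or append) NAME=VALUE in a dotenv file.
# The file is rewritten via a temp file in the same directory and renamed
# over the original (mktemp creates it with 0600 permissions). The updated
# entry moves to the end of the file.
set_env_var() {
  local file=$1 name=$2 value=$3 tmp
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  case $value in
    *$'\n'*) echo "refusing multi-line value for $name" >&2; return 1 ;;
  esac
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Copy every line except the old NAME=... entry; grep exits 1 when it
  # outputs nothing, which is not an error here.
  grep -v "^${name}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

Usage: `set_env_var /path/to/worker/.env ANTHROPIC_API_KEY "$NEW_KEY"`, then restart the worker (step 3).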

Google Cloud Vision (GOOGLE_VISION_CREDENTIALS_JSON):

  1. GCP Console → IAM & Admin → Service Accounts → vision-ocr@… → Keys → Add Key → JSON
  2. Replace GOOGLE_VISION_CREDENTIALS_JSON in the worker's environment with the full JSON blob
  3. Restart the worker process
  4. Upload a file through the portal and confirm that the OCR job completes
  5. Delete the old key in the same Keys tab

If OCR fails with DECODER routines::unsupported, the PEM-encoded private_key inside the JSON (stored with escaped \n line breaks) is malformed. Paste the downloaded JSON as-is, without manual line-wrapping.
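The DECODER error only surfaces after the worker restarts. A pre-flight check on the downloaded key file can catch it earlier and also produce the single-line form a .env entry needs. A sketch, assuming `jq` (1.6 or later) and `openssl` are installed on the operator's machine; `vision_key_to_env_value` is a hypothetical helper name.

```bash
# Hypothetical pre-flight for GOOGLE_VISION_CREDENTIALS_JSON.
# Requires jq and openssl. Prints the key file as one compact JSON line.
vision_key_to_env_value() {
  local keyfile=$1
  # Must be a service-account key with a string private_key field
  if ! jq -e '.type == "service_account" and (.private_key | type == "string")' \
      "$keyfile" > /dev/null 2>&1; then
    echo "$keyfile is not a service-account key JSON" >&2
    return 1
  fi
  # jq -r turns the \n escapes back into real newlines, giving openssl a PEM
  if ! jq -r '.private_key' "$keyfile" | openssl pkey -noout > /dev/null 2>&1; then
    echo "private_key in $keyfile does not parse as a PEM private key" >&2
    return 1
  fi
  # jq -c re-serializes on a single line; newlines in private_key stay escaped
  jq -c . "$keyfile"
}
```

Usage: `value=$(vision_key_to_env_value /path/to/downloaded-key.json)`, then put `$value` into the worker's environment for step 2.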

MFA_SECRET_KEY rotation (destructive-adjacent)

TOTP seeds are encrypted at rest with AES-256-GCM keyed on MFA_SECRET_KEY. Rotation requires decrypting with the old key and re-encrypting with the new one.

  1. MFA_SECRET_KEY_PREVIOUS=<current> — deploy
  2. Set MFA_SECRET_KEY=<new> — deploy
  3. Run the re-encryption migration (TODO: build scripts/reencrypt-mfa-secrets.ts): it iterates staff_users, decrypts each seed with the previous key and re-encrypts it with the current one
  4. Verify: run SELECT COUNT(*) FROM staff_users WHERE mfa_secret IS NOT NULL and perform a trial login for a sample user
  5. Remove MFA_SECRET_KEY_PREVIOUS

If the migration is skipped, users can still log in (via the decrypt-with-previous fallback in packages/auth/src/crypto.ts), but MFA_SECRET_KEY_PREVIOUS can never be removed: seeds still encrypted under it would become unreadable.
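Steps 3 and 4 can be wrapped so the seed count from step 4 is taken before and after the migration, on the assumption that a correct re-encryption never nulls out a seed. A sketch: the migration command is passed in as arguments because scripts/reencrypt-mfa-secrets.ts does not exist yet, `run_mfa_migration_checked` is a hypothetical name, and `psql` is assumed to reach the database via DATABASE_URL.

```bash
# Hypothetical wrapper around steps 3-4. Assumes psql reaches the database
# via DATABASE_URL and that a correct migration never nulls out a seed.
mfa_seed_count() {
  # -t: rows only, -A: unaligned, so the output is just the number
  psql "$DATABASE_URL" -tAc \
    "SELECT COUNT(*) FROM staff_users WHERE mfa_secret IS NOT NULL"
}

# Usage: run_mfa_migration_checked <migration command...>
run_mfa_migration_checked() {
  local before after
  before=$(mfa_seed_count) || return 1
  "$@" || { echo "migration command failed" >&2; return 1; }
  after=$(mfa_seed_count) || return 1
  if [ "$before" != "$after" ]; then
    echo "seed count changed: $before -> $after" >&2
    return 1
  fi
  echo "seed count unchanged ($after); do a trial login before step 5"
}
```

A matching count is necessary but not sufficient; the trial login in step 4 is still what proves seeds decrypt under the new key.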

Password change → session revocation

Changing a staff user's password already revokes all of their other sessions (Phase 3A.5). Rotating the database password is a separate procedure; see the DATABASE_URL row in the catalog above.

Emergency rotation (compromise suspected)

  1. Skip the overlap window
  2. Set new secret directly (don't stage as previous)
  3. Accept the downtime: all affected sessions/tokens break immediately
  4. Force staff re-login by also running the UPDATE staff_sessions SET revoked_at = NOW() query from runbooks/tenant-leak-suspected.md
  5. Open a post-mortem channel
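Steps 2 to 4 can be scripted, a sketch assuming the same deployment layout as the JWT_SECRET worked example and that `psql` reaches the database through DATABASE_URL. `emergency_rotate` is a hypothetical name, and the revocation SQL is the query cited above; check runbooks/tenant-leak-suspected.md for its current form before running it.

```bash
# Hypothetical emergency rotation: no _PREVIOUS staging, no overlap window.
# Tokens and sessions tied to the old key break once the rollout completes.
DEPLOY=${DEPLOY:-deployment/spade-api}

emergency_rotate() {
  local name=$1 new=$2
  if [ -z "$new" ]; then
    echo "refusing to set an empty $name" >&2
    return 1
  fi
  kubectl set env "$DEPLOY" "${name}=${new}" || return 1
  kubectl rollout status "$DEPLOY" || return 1
  # Force staff re-login with the query cited from
  # runbooks/tenant-leak-suspected.md
  psql "$DATABASE_URL" -c "UPDATE staff_sessions SET revoked_at = NOW()"
}
```

Usage: `emergency_rotate JWT_SECRET "$(openssl rand -hex 32)"`. Opening the post-mortem channel (step 5) stays manual.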

Related files

  • packages/auth/src/crypto.ts: dual-key AES-256-GCM implementation
  • packages/documents/src/factory.ts: OCR adapter construction from env
  • docs/pii-inventory.md: which secrets protect which PII columns

Internal use only — BreezyCorp