Runbook: Suspected Tenant Leak
Trigger: Any signal that client A may have accessed client B's data
Severity: CRITICAL. Treat every report as real until proven otherwise.
Tenant isolation is the product's load-bearing security property. Phase 3B put two enforcement layers in place (route-level assertClientTenant + Prisma withTenantScope extension) and a route-coverage test gates CI. Any reported leak invalidates that assurance until this runbook confirms otherwise.
Immediate actions (first 15 min)
1. Preserve evidence
- Screenshot / save the reporter's evidence (URL, request body, response body)
- Capture the request ID from the response or the x-request-id header
- Grab the Sentry event if there is one
2. Revoke sessions for any potentially affected staff users
If the leak was via a staff session, rotate:
-- Revoke every active staff session — users will re-login
UPDATE staff_sessions
SET revoked_at = NOW(), revoked_reason = 'tenant_leak_investigation'
WHERE revoked_at IS NULL;
If the leak was via a client magic-link, expire the specific request:
UPDATE cycle_requests
SET expires_at = NOW()
WHERE id = '<suspected-request-id>';
3. Open a critical incident channel
Create #incident-tenant-leak-<YYYYMMDD> in Slack and loop in the Engineering lead.
Investigation
1. Grep audit events
-- All read events by the suspect actor in the last 24h; compare accessed_client with the actor's own tenant to spot cross-tenant reads
SELECT
ae.occurred_at,
ae.event_type,
ae.actor_id,
ae.client_id AS accessed_client,
ae.entity_type,
ae.entity_id,
ae.event_data_json
FROM audit_events ae
WHERE ae.actor_id = '<staff_user_or_contact_id>'
AND ae.occurred_at > NOW() - INTERVAL '24 hours'
AND ae.event_type IN ('cycle.read', 'file.downloaded', 'cross_tenant_read')
ORDER BY ae.occurred_at DESC;
2. Check the route
Every non-public route is tagged routeScope: 'staff-only' or client-scoped (enforced by route-coverage.test.ts). Verify the suspected leaky route has the correct scope:
grep -r "routeScope" apps/api/src/routes/$SUSPECT_ROUTE
3. Re-run the route-coverage test locally against the committed code
pnpm --filter @breezycorp/api vitest run src/__tests__/route-coverage.test.ts
Must pass. If it doesn't, a route is undeclared, and that's the bug.
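To make the failure mode concrete, the following is a minimal sketch of the kind of check a route-coverage test could perform. It is not the contents of route-coverage.test.ts: the RouteDef shape, the "public" scope value, the findUndeclaredRoutes helper, and the example paths are all assumptions made for illustration.

```typescript
// Hypothetical route shape; the real route definitions may look different.
type RouteScope = "public" | "staff-only" | "client-scoped";

interface RouteDef {
  method: string;
  path: string;
  routeScope?: RouteScope; // undefined means the route never declared a scope
}

// Return every route that has no declared scope.
function findUndeclaredRoutes(routes: RouteDef[]): RouteDef[] {
  return routes.filter((route) => route.routeScope === undefined);
}

// Example data, invented for illustration only.
const exampleRoutes: RouteDef[] = [
  { method: "GET", path: "/health", routeScope: "public" },
  { method: "GET", path: "/cycles/:id", routeScope: "client-scoped" },
  { method: "GET", path: "/files/:id" }, // missing scope: the gap a coverage test should catch
];

const undeclared = findUndeclaredRoutes(exampleRoutes);
for (const route of undeclared) {
  console.log(`undeclared scope: ${route.method} ${route.path}`);
}
```

If a check like this reports any route during an incident, start the investigation with that route.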
4. Check for a Prisma extension bypass
The tenant-scope extension in packages/db/src/extensions/tenant-scope.ts only catches findMany/updateMany/deleteMany/count. findUnique/findFirst are bypassed by design (the route handler's assertClientTenant is responsible).
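The split of responsibility described above can be illustrated with a small, self-contained sketch. It does not reproduce tenant-scope.ts or assertClientTenant: scopeArgs, assertRowBelongsTo, and the clientId field name are hypothetical stand-ins, and the real extension presumably hooks into Prisma rather than taking plain objects.

```typescript
// Operations the extension is described as scoping; everything else passes through.
const SCOPED_OPERATIONS: ReadonlySet<string> = new Set([
  "findMany",
  "updateMany",
  "deleteMany",
  "count",
]);

type Where = Record<string, unknown>;

interface QueryArgs {
  where?: Where;
}

// Hypothetical stand-in for the extension's query hook: add the tenant filter
// to scoped operations and leave all other operations untouched.
function scopeArgs(operation: string, args: QueryArgs, clientId: string): QueryArgs {
  if (!SCOPED_OPERATIONS.has(operation)) {
    return args; // findUnique / findFirst arrive here unmodified
  }
  return { ...args, where: { ...(args.where ?? {}), clientId } };
}

// Hypothetical route-level check, standing in for assertClientTenant.
// A missing row is allowed through; a row owned by another tenant is not.
function assertRowBelongsTo(row: { clientId: string } | null, clientId: string): void {
  if (row !== null && row.clientId !== clientId) {
    throw new Error("tenant mismatch");
  }
}

const scoped = scopeArgs("findMany", { where: { status: "open" } }, "client-a");
const unscoped = scopeArgs("findUnique", { where: { id: "row-1" } }, "client-a");
console.log(scoped.where); // now carries the tenant filter
console.log(unscoped.where); // unchanged: the handler must check the row itself
```

In this model, any findUnique or findFirst call that is not followed by a check like assertRowBelongsTo is a candidate for the leak.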
Search for any repo method that fetches a row by id without a subsequent scope check:
grep -rn "findUnique\|findFirst" apps/api/src/routes | grep -v assertClientTenant
Containment
Once the root cause is identified:
A. Patch + deploy
- Add the missing scope check
- Write a regression test in tenant-isolation.test.ts
- Fast-track through CI (priority merge)
B. If patch will take > 1 hour
- Temporarily take the affected route offline by returning 503 for client-scoped requests until the patch lands
- Announce degradation in #spade-ops
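One way to implement the temporary 503 is a small decision function consulted before the normal handler runs. This is a framework-agnostic sketch under assumptions: IncomingRequest, Decision, containmentDecision, and the example path are hypothetical and would need adapting to however the API registers its hooks.

```typescript
// Hypothetical request summary; the real request object will carry more fields.
interface IncomingRequest {
  path: string;
  scope: "staff-only" | "client-scoped";
}

interface Decision {
  status: number;
  body: { error: string };
}

// Routes taken offline for the duration of the incident (example path only).
const disabledRoutes: ReadonlySet<string> = new Set(["/cycles/:id/files"]);

// Return a 503 decision for client-scoped requests to a disabled route,
// or null to let the request continue to the normal handler.
function containmentDecision(
  req: IncomingRequest,
  disabled: ReadonlySet<string>,
): Decision | null {
  if (req.scope === "client-scoped" && disabled.has(req.path)) {
    return { status: 503, body: { error: "temporarily unavailable" } };
  }
  return null;
}

const blocked = containmentDecision(
  { path: "/cycles/:id/files", scope: "client-scoped" },
  disabledRoutes,
);
console.log(blocked?.status);
```

Staff-only requests are deliberately let through in this sketch so the investigation can continue against the affected route. Remove the route from the disabled set once the patch is deployed.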
Disclosure
If confirmed leak:
- Engineering lead decides on the notification scope (usually the affected clients, plus all clients if the issue was systemic)
- Draft notification template: docs/templates/tenant-leak-disclosure.md (TODO: create during first incident)
- Regulatory: check PDPA obligations (Singapore Personal Data Protection Act) for breach notification timelines
- Post-mortem: blameless, published within 72 hours of resolution
Related
- apps/api/src/plugins/require-client-scope.ts
- apps/api/src/plugins/scope-enforcer.ts
- packages/db/src/extensions/tenant-scope.ts
- apps/api/src/__tests__/tenant-isolation.test.ts: add regression cases here
- apps/api/src/__tests__/route-coverage.test.ts: CI gate