Runbook: Suspected Tenant Leak

Trigger: Any signal that client A may have accessed client B's data
Severity: CRITICAL. Treat every report as real until proven otherwise.

Tenant isolation is the product's load-bearing security property. Phase 3B put two enforcement layers in place (route-level assertClientTenant + Prisma withTenantScope extension) and a route-coverage test gates CI. Any reported leak invalidates that assurance until this runbook confirms otherwise.

Immediate actions (first 15 min)

1. Preserve evidence

Screenshot / save the reporter's evidence (URL, request body, response body)
Capture the request ID from the response or x-request-id header
Grab the Sentry event if there is one

2. Revoke sessions for any potentially affected staff users

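If the leak was via a staff session, revoke sessions. The statement below is deliberately broad: it revokes every active staff session, so all staff users will have to log in again.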

```sql
-- Revoke every active staff session; all staff users will have to log in again
UPDATE staff_sessions
SET revoked_at = NOW(), revoked_reason = 'tenant_leak_investigation'
WHERE revoked_at IS NULL;
```

If the leak was via a client magic-link, expire the specific request:

```sql
UPDATE cycle_requests
SET expires_at = NOW()
WHERE id = '<suspected-request-id>';
```

3. Open a critical incident channel

Create #incident-tenant-leak-<YYYYMMDD> in Slack and loop in the Engineering lead.

Investigation

1. Query audit events

```sql
-- Reads and downloads by the suspect actor in the last 24h.
-- Compare accessed_client against the clients the actor is allowed to see.
SELECT
  ae.occurred_at,
  ae.event_type,
  ae.actor_id,
  ae.client_id AS accessed_client,
  ae.entity_type,
  ae.entity_id,
  ae.event_data_json
FROM audit_events ae
WHERE ae.actor_id = '<staff_user_or_contact_id>'
  AND ae.occurred_at > NOW() - INTERVAL '24 hours'
  AND ae.event_type IN ('cycle.read', 'file.downloaded', 'cross_tenant_read')
ORDER BY ae.occurred_at DESC;
```
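The query returns every matching read by the actor, not only the cross-tenant ones, so each row still has to be compared with the clients that actor is entitled to see. A minimal TypeScript sketch of that triage step follows, assuming the results were exported to JSON using the query's column aliases and that the investigator supplies the set of allowed client IDs. `AuditRow` and `flagForeignAccess` are hypothetical names, and the IDs in the example are illustrative.

```typescript
// A subset of one audit-query row, keyed by the query's column aliases.
// Assumption: rows were exported to JSON; client_id may be null for some events.
interface AuditRow {
  occurred_at: string;
  event_type: string;
  actor_id: string;
  accessed_client: string | null;
  entity_type: string;
  entity_id: string;
}

// Keep rows whose accessed_client lies outside the clients the actor may see.
// Rows with no client at all are kept as well, since they cannot be ruled out.
function flagForeignAccess(rows: AuditRow[], allowedClients: ReadonlySet<string>): AuditRow[] {
  return rows.filter(
    (row) => row.accessed_client === null || !allowedClients.has(row.accessed_client),
  );
}

// Illustrative data: a contact of client "c_a" who also downloaded a file owned by "c_b".
const rows: AuditRow[] = [
  {
    occurred_at: "2024-05-01T10:00:00Z", event_type: "cycle.read", actor_id: "u1",
    accessed_client: "c_a", entity_type: "cycle", entity_id: "cy1",
  },
  {
    occurred_at: "2024-05-01T10:05:00Z", event_type: "file.downloaded", actor_id: "u1",
    accessed_client: "c_b", entity_type: "file", entity_id: "f9",
  },
];

const suspicious = flagForeignAccess(rows, new Set(["c_a"]));
console.log(suspicious.map((r) => r.entity_id).join(",")); // f9
```

Treating a null `accessed_client` as suspicious is a deliberate bias toward over-reporting; drop that clause if every audited event is guaranteed to carry a client.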

2. Check the route

Every non-public route declares a routeScope, either 'staff-only' or client-scoped, and route-coverage.test.ts enforces this. Verify that the suspected leaky route declares the correct scope:

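```bash
# SUSPECT_ROUTE: the suspected route's file or directory, relative to apps/api/src/routes
grep -rn "routeScope" "apps/api/src/routes/$SUSPECT_ROUTE"
```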
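The rule that route-coverage.test.ts gates CI on can be pictured as a filter over the registered routes: anything that is not public and has no declared scope fails. The sketch below assumes a flat list of route descriptors; `RouteDef`, `isPublic`, the `'client'` scope value, and `findUndeclaredRoutes` are hypothetical and do not describe the actual test file.

```typescript
// Hypothetical route descriptor; the real registry and its field names may differ.
// 'staff-only' comes from this runbook; 'client' is an assumed name for the
// client-scoped case.
interface RouteDef {
  method: string;
  url: string;
  isPublic?: boolean;
  routeScope?: "staff-only" | "client";
}

// Every non-public route must declare a scope; return offenders as "METHOD url".
function findUndeclaredRoutes(routes: RouteDef[]): string[] {
  return routes
    .filter((r) => !r.isPublic && r.routeScope === undefined)
    .map((r) => `${r.method} ${r.url}`);
}

const routes: RouteDef[] = [
  { method: "GET", url: "/health", isPublic: true },
  { method: "GET", url: "/clients/:clientId/cycles", routeScope: "client" },
  { method: "GET", url: "/admin/staff", routeScope: "staff-only" },
  { method: "GET", url: "/clients/:clientId/files/:fileId" }, // no scope declared
];

console.log(findUndeclaredRoutes(routes).join("\n")); // GET /clients/:clientId/files/:fileId
```

A check like this only proves that a scope is declared, not that it is the right one, which is why the suspected route still needs a manual look.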

3. Re-run the route-coverage test locally against the committed code

```bash
pnpm --filter @breezycorp/api vitest run src/__tests__/route-coverage.test.ts
```

The test must pass. If it fails, a route is missing its scope declaration, and that route is the bug.

4. Check for a Prisma extension bypass

The tenant-scope extension in packages/db/src/extensions/tenant-scope.ts only scopes findMany, updateMany, deleteMany, and count. findUnique and findFirst bypass it by design; for those calls, the route handler's assertClientTenant is the only tenant check.

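Search for route files that fetch a row by id but never reference assertClientTenant. A line-level filter cannot see a check on a later line, so this search works per file; files that do reference assertClientTenant still need a manual read to confirm every fetch is followed by a check:

```bash
# Route files that call findUnique/findFirst but never mention assertClientTenant
grep -rlE 'findUnique|findFirst' apps/api/src/routes | xargs grep -L assertClientTenant
```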
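The split between the two enforcement layers can be sketched without Prisma. In the hypothetical code below, `SCOPED_OPERATIONS` mirrors the operation list above, while `assertClientTenant` is a stand-in whose signature, `clientId` column, and "not found" error are assumptions rather than the real helper. What it illustrates: a by-id read is only safe if the tenant check runs on the returned row.

```typescript
// Operations the tenant-scope extension rewrites, as listed in this runbook.
// findUnique and findFirst are intentionally absent.
const SCOPED_OPERATIONS: ReadonlySet<string> = new Set(["findMany", "updateMany", "deleteMany", "count"]);

function isScopedByExtension(operation: string): boolean {
  return SCOPED_OPERATIONS.has(operation);
}

// Hypothetical stand-in for assertClientTenant; the real signature may differ.
// Assumes rows carry a clientId column. Missing rows and rows owned by another
// client raise the same error, so a caller cannot probe for ids in other tenants.
function assertClientTenant<T extends { clientId: string }>(
  row: T | null,
  callerClientId: string,
): asserts row is T {
  if (row === null || row.clientId !== callerClientId) {
    throw new Error("not found");
  }
}

// Toy table standing in for Prisma: cycle "cy1" belongs to client "c_b".
type Cycle = { id: string; clientId: string };
const cycles = new Map<string, Cycle>([["cy1", { id: "cy1", clientId: "c_b" }]]);

// A by-id read. In the real code this would be a findUnique call, which the
// extension does not scope, so the handler must check the tenant itself.
function getCycle(id: string, callerClientId: string): Cycle {
  const row = cycles.get(id) ?? null;
  assertClientTenant(row, callerClientId);
  return row; // narrowed to Cycle by the assertion signature
}

console.log(isScopedByExtension("findUnique")); // false
console.log(getCycle("cy1", "c_b").id); // cy1
```

Calling `getCycle("cy1", "c_a")` throws, and so does a lookup of an id that does not exist; returning the same error for both keeps the response from confirming that an id exists in another tenant.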

Containment

Once the root cause is identified:

A. Patch + deploy

Add the missing scope check
Write a regression test in tenant-isolation.test.ts
Fast-track through CI (priority merge)
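A useful regression test pins both sides of the fix: the owning client still gets its data, and another client is refused. The table-driven sketch below is plain TypeScript so it runs on its own; `Handler`, `IsolationCase`, `failingCases`, and the choice of 404 for a refused read are assumptions, not the conventions of tenant-isolation.test.ts.

```typescript
// Hypothetical handler shape: given the caller's client and a resource id,
// return the HTTP status the route would send.
type Handler = (callerClientId: string, resourceId: string) => number;

interface IsolationCase {
  name: string;
  callerClientId: string;
  resourceId: string;
  expectedStatus: number;
}

// Run every case and return the names of those whose status did not match.
function failingCases(handler: Handler, cases: IsolationCase[]): string[] {
  return cases
    .filter((c) => handler(c.callerClientId, c.resourceId) !== c.expectedStatus)
    .map((c) => c.name);
}

// Toy handlers: resource "f9" belongs to client "c_b".
const owners = new Map<string, string>([["f9", "c_b"]]);
const patched: Handler = (caller, id) => (owners.get(id) === caller ? 200 : 404);
const leaky: Handler = () => 200; // pre-fix behaviour: no tenant check at all

const cases: IsolationCase[] = [
  { name: "owner reads own file", callerClientId: "c_b", resourceId: "f9", expectedStatus: 200 },
  { name: "other client is refused", callerClientId: "c_a", resourceId: "f9", expectedStatus: 404 },
];

console.log(failingCases(patched, cases).length); // 0
console.log(failingCases(leaky, cases).join(",")); // other client is refused
```

Running the same table against the pre-fix behaviour (`leaky`) should produce a failure, which is a quick way to confirm the new case actually exercises the bug. In a vitest file, the `cases` array could feed `it.each` instead of `failingCases`.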

B. If the patch will take more than 1 hour

Temporarily take the affected route offline by returning 503 for client-scoped requests until the patch lands
Announce degradation in #spade-ops
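One way to implement the temporary 503 is a guard that runs before the route handler and short-circuits client-scoped requests to the affected route. The sketch below is framework-agnostic and hypothetical: `killSwitchGuard`, `DISABLED_ROUTES`, the `'client'` scope value, and the error body are illustrative names, and wiring it into the API server's request hooks depends on the framework in use.

```typescript
// Assumed scope values: 'staff-only' is from this runbook, 'client' is hypothetical.
type RouteScope = "staff-only" | "client";

interface GuardInput {
  routePattern: string; // the registered pattern, not the concrete request path
  routeScope: RouteScope;
}

interface GuardResponse {
  status: number;
  body: { error: string };
}

// Route patterns taken offline for client-scoped callers until the patch lands.
const DISABLED_ROUTES: ReadonlySet<string> = new Set(["/clients/:clientId/files/:fileId"]);

// Return a 503 response to short-circuit the request, or null to let it through.
function killSwitchGuard(req: GuardInput): GuardResponse | null {
  if (req.routeScope === "client" && DISABLED_ROUTES.has(req.routePattern)) {
    return { status: 503, body: { error: "temporarily_unavailable" } };
  }
  return null;
}

console.log(killSwitchGuard({ routePattern: "/clients/:clientId/files/:fileId", routeScope: "client" })?.status); // 503
console.log(killSwitchGuard({ routePattern: "/clients/:clientId/cycles", routeScope: "client" })); // null
```

Matching on the registered route pattern rather than the concrete URL blocks the route for every client at once, and leaving staff-only routes untouched keeps internal tooling available during the investigation.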

Disclosure

If confirmed leak:

Engineering lead decides on the notification scope (usually the affected clients, or all clients if the issue was systemic)
Draft notification template: docs/templates/tenant-leak-disclosure.md (TODO: create during the first incident)
Regulatory: check PDPA obligations (Singapore Personal Data Protection Act) for breach notification timelines
Post-mortem: blameless, published within 72 hours of resolution

Related

apps/api/src/plugins/require-client-scope.ts
apps/api/src/plugins/scope-enforcer.ts
packages/db/src/extensions/tenant-scope.ts
apps/api/src/__tests__/tenant-isolation.test.ts — add regression cases here
apps/api/src/__tests__/route-coverage.test.ts — CI gate
