PII Inventory
Every column in the BreezyCorp database that holds personally identifiable information, classified per the Singapore PDPA. This is the source of truth for:
- Pino log redaction config (`apps/api/src/app.ts`)
- Sentry `beforeSend` scrub rules (Phase 3G, post-MVP)
- Access audit queries during a tenant-leak investigation
- DPIA / data subject access requests
Classification levels (increasingly sensitive):
- PUBLIC — no access control needed
- INTERNAL — Spade employees only
- CONFIDENTIAL — client + Spade ops only; logged reads
- RESTRICTED — encrypted at rest; logged reads + writes
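
The levels above can be modeled as an ordered type so code can compare sensitivity. This is a minimal sketch, not the project's actual model: the `Handling` field names and the helper functions are hypothetical, and the flags only restate the handling rules listed above.

```typescript
// Classification levels in increasing order of sensitivity, as listed above.
const LEVELS = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"] as const;
type Classification = (typeof LEVELS)[number];

// Handling requirements stated in the inventory. Field names are hypothetical.
interface Handling {
  logReads: boolean;
  logWrites: boolean;
  encryptedAtRest: boolean;
}

const HANDLING: Record<Classification, Handling> = {
  PUBLIC: { logReads: false, logWrites: false, encryptedAtRest: false },
  INTERNAL: { logReads: false, logWrites: false, encryptedAtRest: false },
  CONFIDENTIAL: { logReads: true, logWrites: false, encryptedAtRest: false },
  RESTRICTED: { logReads: true, logWrites: true, encryptedAtRest: true },
};

// True when `a` is at least as sensitive as `b`.
function isAtLeast(a: Classification, b: Classification): boolean {
  return LEVELS.indexOf(a) >= LEVELS.indexOf(b);
}

// The most sensitive level in a set; useful for blobs that mix fields.
function mostSensitive(levels: Classification[]): Classification {
  return levels.reduce<Classification>(
    (max, level) => (isAtLeast(level, max) ? level : max),
    "PUBLIC",
  );
}
```

A JSON blob that mixes fields would then take the result of `mostSensitive` over its fields.
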
Staff users
| Column | Classification | At rest | Redacted in logs | Notes |
|---|---|---|---|---|
| `staff_users.email` | CONFIDENTIAL | plain | yes (regex in breadcrumbs) | Used for login |
| `staff_users.name` | CONFIDENTIAL | plain | no | Displayed in audit trail |
| `staff_users.password_hash` | RESTRICTED | bcrypt (cost 12) | yes (`*.password*`) | Not reversible |
| `staff_users.mfa_secret` | RESTRICTED | AES-256-GCM | yes (`*.mfaSecret`, `*.mfa_secret`) | Encrypted with `MFA_SECRET_KEY` |
| `staff_sessions.token_hash` | RESTRICTED | SHA-256 | yes (`*.token*`) | Raw token never persisted |
| `staff_sessions.csrf_token` | CONFIDENTIAL | plain | yes (`req.headers["x-csrf-token"]`) | Double-submit pattern |
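
For reference, AES-256-GCM encryption of a value such as `mfa_secret` can be sketched with Node's built-in `crypto` module. This is an illustration only: the function names, the `iv.tag.ciphertext` storage layout, and the way the 32-byte key is derived from `MFA_SECRET_KEY` are assumptions, not the project's actual implementation.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: `key` is a 32-byte buffer derived from MFA_SECRET_KEY.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag(); // authentication tag, checked on decrypt
  // Hypothetical storage layout: iv.tag.ciphertext, each part base64-encoded.
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

function decryptSecret(stored: string, key: Buffer): string {
  const [iv, tag, ciphertext] = stored
    .split(".")
    .map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the stored value was tampered with.
  return Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]).toString("utf8");
}
```

Because the nonce is random, encrypting the same secret twice produces different stored values, so the column cannot be used for equality lookups.
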
Client contacts (portal users)
| Column | Classification | At rest | Redacted in logs | Notes |
|---|---|---|---|---|
| `client_contacts.email` | CONFIDENTIAL | plain | yes (regex) | Used for magic-link delivery |
| `client_contacts.name` | CONFIDENTIAL | plain | no | Displayed in audit |
| `cycle_requests.token_hash` | RESTRICTED | SHA-256 | yes | Magic-link token hashed |
| `cycle_requests.recipient_email` | CONFIDENTIAL | plain | yes (regex) | - |
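
Email columns are marked as redacted by regex rather than by key path, presumably because addresses also appear inside free-text log messages and breadcrumbs. A regex scrub of that kind might look like the sketch below. The pattern, the function name, and the replacement string are illustrative assumptions, not the rule actually configured.

```typescript
// Deliberately loose pattern: over-redaction is preferable to a missed address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replaces anything that looks like an email address in a free-text string.
function scrubEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[email redacted]");
}
```

For example, `scrubEmails("magic link sent to jane.tan@example.com")` returns `"magic link sent to [email redacted]"`.
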
Employee data (the high-sensitivity tier)
| Column | Classification | At rest | Redacted in logs | Notes |
|---|---|---|---|---|
| `employee_shadow_snapshots.external_employee_id` | CONFIDENTIAL | plain | no | Client-assigned ref; not globally unique |
| `employee_shadow_snapshots.employee_name` | CONFIDENTIAL | plain | no | |
| `employee_shadow_snapshots.gross_pay` | RESTRICTED | plain (DB-level encryption at rest via provider) | yes (`*.grossPay`, `*.gross_pay`) | Salary data |
| `employee_shadow_snapshots.net_pay` | RESTRICTED | plain | yes (`*.netPay`, `*.net_pay`) | |
| `employee_shadow_snapshots.cpf_employee` | RESTRICTED | plain | yes (`*.cpfEmployee`, `*.cpf_employee`) | CPF filing data |
| `employee_shadow_snapshots.cpf_employer` | RESTRICTED | plain | yes | |
| `employee_shadow_snapshots.ytd_ordinary_wage` | RESTRICTED | plain | yes | |
| `employee_shadow_snapshots.ytd_additional_wage` | RESTRICTED | plain | yes | CPF AW ceiling calc |
| `employee_shadow_snapshots.is_foreign` | CONFIDENTIAL | plain | no | IR21 trigger |
| `employee_shadow_snapshots.snapshot_json` | RESTRICTED | plain JSONB | yes (whole blob) | Contains all of the above plus provider-specific fields |
Submission + payload data
| Column | Classification | At rest | Redacted in logs | Notes |
|---|---|---|---|---|
| `submissions.attestation_text` | INTERNAL | plain | no | Free-text client attestation |
| `submissions.monthly_declaration` | CONFIDENTIAL | plain JSONB | partial (`*.declarantName`) | Declarant name is PII |
| `submission_items.employee_ref` | CONFIDENTIAL | plain | no | - |
| `submission_items.payload_json` | RESTRICTED | plain JSONB | yes (whole blob) | Contains salary/NRIC/bank depending on change type |
Payload JSON fields commonly contain:
- `previousSalary`, `newSalary`: RESTRICTED
- `bankAccount`, `bankCode`: RESTRICTED (redacted as `*.bank_account`, `*.bankAccount`)
- `nric`, `fin`: RESTRICTED (redacted as `*.nric`, `*.fin`)
- `reason`, `position`: CONFIDENTIAL
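
Because the contents of `payload_json` depend on the change type, the effective sensitivity of a given payload can be derived from its keys. The sketch below copies the field classifications from the list above; the function name and the fail-closed default for unlisted fields are assumptions.

```typescript
type Classification = "PUBLIC" | "INTERNAL" | "CONFIDENTIAL" | "RESTRICTED";

const ORDER: Classification[] = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"];

// Field classifications copied from the inventory list.
const PAYLOAD_FIELD_LEVELS = new Map<string, Classification>([
  ["previousSalary", "RESTRICTED"],
  ["newSalary", "RESTRICTED"],
  ["bankAccount", "RESTRICTED"],
  ["bankCode", "RESTRICTED"],
  ["nric", "RESTRICTED"],
  ["fin", "RESTRICTED"],
  ["reason", "CONFIDENTIAL"],
  ["position", "CONFIDENTIAL"],
]);

// Returns the highest classification among the payload's top-level keys.
// Assumption: keys missing from the inventory are treated as RESTRICTED.
function classifyPayload(payload: Record<string, unknown>): Classification {
  let highest: Classification = "PUBLIC";
  for (const key of Object.keys(payload)) {
    const level = PAYLOAD_FIELD_LEVELS.get(key) ?? "RESTRICTED";
    if (ORDER.indexOf(level) > ORDER.indexOf(highest)) {
      highest = level;
    }
  }
  return highest;
}
```

Treating unknown keys as RESTRICTED means a new payload field stays protected until someone classifies it here.
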
Files (uploaded documents)
| Item | Classification | Notes |
|---|---|---|
| File bytes in S3 | RESTRICTED | Contains offer letters, payslips, IR21 forms — highly sensitive. Encrypted at rest via S3 SSE-KMS. |
| `files.original_name` | CONFIDENTIAL | May leak identity via filename |
| `files.sha256` | INTERNAL | Content hash only |
| `document_classifications.provider_payload_json` | RESTRICTED | OCR output may contain extracted PII |
| `extracted_fields.raw_value` | RESTRICTED | OCR extractions; redacted via `*.rawValue` |
Audit events
| Column | Classification | At rest | Redacted in logs | Notes |
|---|---|---|---|---|
| `audit_events.event_data_json` | Varies | plain | yes (selective: the outer blob is logged; sensitive fields inside are redacted via Pino's deep-path config) | Contains context for each audit event |
| `audit_events.actor_id` | INTERNAL | plain | no | Staff user id or 'portal' |
Audit events are retained for 7 years and are never purged by the retention job (see `docs/retention-policy.md`).
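
The selective redaction of `event_data_json` (keep the blob, mask sensitive keys inside it) can be illustrated with a small recursive function. This is not Pino's implementation and not the project's config. It is a stand-alone sketch, and the key list and function name are assumptions.

```typescript
// Hypothetical subset of sensitive keys, taken from paths listed in this inventory.
const SENSITIVE_KEYS = new Set([
  "nric",
  "fin",
  "bankAccount",
  "bank_account",
  "password",
  "token",
]);

// Returns a copy of `value` with sensitive keys masked at any nesting depth.
// Intended for plain JSON data (objects, arrays, primitives).
function redactDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redactDeep);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key) ? "[Redacted]" : redactDeep(inner);
    }
    return out;
  }
  return value;
}
```

The input is left untouched and the non-sensitive context around each masked field survives, which is what keeps the logged audit event useful.
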
Cross-reference: redaction configs
Pino redact paths (in apps/api/src/app.ts)
Every RESTRICTED column above must have a corresponding redact path. Current config covers:
req.headers.authorization
req.headers.cookie
req.headers["x-csrf-token"]
req.headers["x-ocr-signature"]
res.headers["set-cookie"]
*.password *.passwordHash *.password_hash
*.oldPassword *.newPassword *.totpCode
*.mfaSecret *.mfa_secret *.sessionToken *.session_token
*.token *.tokenHash *.token_hash
*.csrfToken *.csrf_token *.secret
*.apiKey *.api_key
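*.nric *.fin *.bankAccount *.bank_account

Missing from current redact config (TODO: wire these before post-MVP Sentry):
- `*.grossPay`, `*.gross_pay`, `*.netPay`, `*.net_pay`
- `*.cpfEmployee`, `*.cpf_employee`, `*.cpfEmployer`, `*.cpf_employer`
- `*.ytdOrdinaryWage`, `*.ytd_ordinary_wage`, `*.ytdAdditionalWage`, `*.ytd_additional_wage`
- `*.previousSalary`, `*.newSalary`, `*.salary`
- `*.snapshot_json`, `*.snapshotJson`
- `*.payload_json`, `*.payloadJson`
- `*.raw_value`, `*.rawValue`

The tables above mark several of these paths as redacted; until the paths are wired, those entries are not enforced by the current config.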
These should land when we do 3G (Sentry integration). Until then, dev log output is verbose, but production runs with `LOG_LEVEL=info`, which keeps request bodies out of the logs at that level.
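
One way to close the gap is to merge the pending paths into the existing list and add a check that flags RESTRICTED fields without a path. The sketch below is hedged. The function names are hypothetical, the returned object follows the `{ paths, censor }` shape of Pino's `redact` option, and how `apps/api/src/app.ts` actually builds its logger is not shown in this document.

```typescript
// Paths copied from the "missing from current redact config" list.
const PENDING_REDACT_PATHS: string[] = [
  "*.grossPay", "*.gross_pay", "*.netPay", "*.net_pay",
  "*.cpfEmployee", "*.cpf_employee", "*.cpfEmployer", "*.cpf_employer",
  "*.ytdOrdinaryWage", "*.ytd_ordinary_wage",
  "*.ytdAdditionalWage", "*.ytd_additional_wage",
  "*.previousSalary", "*.newSalary", "*.salary",
  "*.snapshot_json", "*.snapshotJson",
  "*.payload_json", "*.payloadJson",
  "*.raw_value", "*.rawValue",
];

// Merges the current paths with the pending ones, dropping duplicates.
// The censor string is an arbitrary choice for this sketch.
function buildRedactOptions(currentPaths: string[]): { paths: string[]; censor: string } {
  const merged = Array.from(new Set([...currentPaths, ...PENDING_REDACT_PATHS]));
  return { paths: merged, censor: "[REDACTED]" };
}

// Returns the field names that have no `*.<name>` entry in `paths`.
function findUncovered(fieldNames: string[], paths: string[]): string[] {
  const covered = new Set(paths);
  return fieldNames.filter((name) => !covered.has(`*.${name}`));
}
```

The result of `buildRedactOptions` would be passed as the `redact` option when the logger is created, and `findUncovered` could run in a unit test so that a new RESTRICTED field without a path fails CI.
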
Data subject access / deletion requests (PDPA)
When a data subject (employee) requests access / deletion:
- Locate: query `employee_shadow_snapshots`, `submission_items`, `extracted_fields` for the employee ref
- Scope: verify the employee is associated with the requesting client (tenant scope)
- Decide: for deletion requests under PDPA, check whether the record is still within the 5-year tax retention window; tax records cannot be deleted on request and must age out
- Act: via `scripts/gdpr-export.ts` / `scripts/gdpr-delete.ts` (TODO: build when first request lands)
- Log: every DSAR access creates an `audit_events` row with `event_type = 'dsar.access'` or `dsar.delete`
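
The Decide and Log steps can be sketched as pure functions. This is an illustration under stated assumptions. The retention window is measured here from the record's creation date, which the inventory does not specify, and the function names and the contents of `event_data_json` are hypothetical.

```typescript
const TAX_RETENTION_YEARS = 5;

// Assumption: the retention window starts at the record's creation date.
function retentionEndsAt(createdAt: Date): Date {
  const end = new Date(createdAt.getTime());
  end.setUTCFullYear(end.getUTCFullYear() + TAX_RETENTION_YEARS);
  return end;
}

type DeletionDecision =
  | { action: "delete" }
  | { action: "defer"; until: Date };

// Records still inside the window are deferred until they age out.
function decideDeletion(createdAt: Date, now: Date): DeletionDecision {
  const end = retentionEndsAt(createdAt);
  return now.getTime() >= end.getTime()
    ? { action: "delete" }
    : { action: "defer", until: end };
}

// Builds the audit row for a DSAR action; column names follow the inventory.
function dsarAuditEvent(kind: "access" | "delete", actorId: string, employeeRef: string) {
  return {
    event_type: `dsar.${kind}`,
    actor_id: actorId,
    event_data_json: { employeeRef },
  };
}
```

Returning the `until` date with a deferral gives the reply to the data subject a concrete date for when the record can be removed.
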
Review cadence
- Quarterly: engineering reviews this doc against the schema for drift
- Annually: legal + DPO review for regulatory change
- On schema change: any PR adding a column to `employee_shadow_snapshots`, `submission_items`, or `files` must update this inventory in the same commit
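
The quarterly drift review could be partly automated by comparing the schema's columns with the columns listed in this inventory. The sketch below assumes both lists are already available as `table.column` strings. How they are obtained (for example from the database catalog or by parsing this file) is left open, and the function names are hypothetical.

```typescript
interface DriftReport {
  uninventoried: string[]; // in the schema, missing from the inventory
  stale: string[]; // in the inventory, no longer in the schema
}

// Compares two lists of "table.column" names and reports differences both ways.
function findDrift(schemaColumns: string[], inventoryColumns: string[]): DriftReport {
  const inSchema = new Set(schemaColumns);
  const inInventory = new Set(inventoryColumns);
  return {
    uninventoried: schemaColumns.filter((c) => !inInventory.has(c)).sort(),
    stale: inventoryColumns.filter((c) => !inSchema.has(c)).sort(),
  };
}
```

An empty report in both directions means the inventory and the schema agree on which columns exist. It says nothing about whether the classifications are right, so the human review is still needed.
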