Post-MVP Backlog
Audit of work that was listed in the old README "Roadmap / Remaining Work" section, cross-checked against the current codebase as of commit b9d73e9. Each item below records:
- Current state — what exists in code today
- Missing — what's genuinely not implemented yet
- Where it lives / would live — file paths
- Dependencies — blockers or prerequisites
- Effort — rough implementation estimate
The original roadmap lived inline in README.md and referenced phase numbers (3I / 3G / 3H-Ext / 3J) that mapped to a planning doc no longer in the repo. This file drops the phase terminology and organises the remaining work by the capability gap it addresses.
1. Observability — the biggest gap
None of the observability items landed in MVP. `docs/on-call.md` defines alerting thresholds that assume metrics are being emitted, so those alerts have nothing to evaluate today. This is the highest-priority block before running BreezyCorp in production.
1.1 Sentry error reporting
- Current state: No `@sentry/*` package installed in the monorepo (verified: `package.json`, `apps/api/package.json`, `apps/worker/package.json`, `apps/web/package.json`).
- Missing: `@sentry/node` init in `apps/api/src/app.ts` and `apps/worker/src/index.ts`, `@sentry/nextjs` in `apps/web`, `beforeSend` PII scrubbing (cross-referenced with `docs/pii-inventory.md`), filtering of expected business errors (`TenantScopeViolationError`, validation failures).
- Where: new plugin `apps/api/src/plugins/sentry.ts`, wrapper in `apps/worker/src/lib/`, `sentry.client.config.ts` + `sentry.server.config.ts` in `apps/web`.
- Dependencies: `SENTRY_DSN` production secret provisioning.
- Effort: ~1 day.
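The scrubbing and filtering described above could be prototyped as a pure function before the SDK is wired in. This is a minimal sketch, not the project's implementation: `SentryLikeEvent`, `scrubEvent`, `redactPii`, the `ValidationError` name, and the PII key list are all hypothetical, and the real key list should come from `docs/pii-inventory.md`. It relies on the fact that Sentry's `beforeSend` hook drops an event when the callback returns `null`.

```typescript
// Hypothetical, simplified event shape; the real Sentry event type is richer.
interface SentryLikeEvent {
  exception?: { values?: { type?: string }[] };
  extra?: Record<string, unknown>;
}

// Assumed lists: derive the real ones from docs/pii-inventory.md and the
// domain error classes. "ValidationError" and "nric" are placeholders.
const EXPECTED_ERROR_TYPES = new Set(["TenantScopeViolationError", "ValidationError"]);
const PII_KEYS = new Set(["mfa_secret", "email", "nric"]);

// Recursively replace values stored under PII keys with a placeholder.
function redactPii(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactPii);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = PII_KEYS.has(key) ? "[redacted]" : redactPii(inner);
    }
    return out;
  }
  return value;
}

// Returns null to drop expected business errors, otherwise a scrubbed copy.
function scrubEvent(event: SentryLikeEvent): SentryLikeEvent | null {
  const types = event.exception?.values?.map((v) => v.type) ?? [];
  if (types.some((t) => t !== undefined && EXPECTED_ERROR_TYPES.has(t))) return null;
  return { ...event, extra: redactPii(event.extra ?? {}) as Record<string, unknown> };
}
```

Keeping the logic free of SDK imports means the same function can be unit-tested directly and then passed to the API, worker, and web initialisers.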
1.2 Prometheus metrics
- Current state: `docs/on-call.md` references metrics like `spade_cycles_by_status`, `spade_outbox_pending`, `spade_outbox_dead_letter_count`, `spade_ocr_failures`, none of which are emitted anywhere in code (grep returns zero matches across `apps/`, `packages/`).
- Missing:
  - `@fastify/metrics` plugin for HTTP request/response histograms.
  - Custom `prom-client` gauges for the four referenced business metrics, updated on cycle transitions (`CycleService`), outbox poll (worker), and OCR worker callbacks.
  - `/metrics` endpoint exposed via a staff-only route (not public).
- Where: new `apps/api/src/plugins/metrics.ts`, `apps/worker/src/lib/metrics.ts`, and gauge update points in `packages/domain/src/services/cycle.service.ts` + `apps/worker/src/handlers/outbox-poll.ts`.
- Dependencies: none beyond the dependency add.
- Effort: ~1 day.
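As an illustration of a gauge update point, the sketch below recomputes `spade_cycles_by_status` from a list of cycles. It is written against a small structural interface rather than importing a metrics library; `prom-client`'s `Gauge` exposes a compatible `set(labels, value)` method. The `CycleStatus` values and the `publishCyclesByStatus` name are assumptions, not taken from the codebase.

```typescript
// Structural stand-in for a labelled gauge. Nothing here imports prom-client.
interface LabelledGauge {
  set(labels: Record<string, string>, value: number): void;
}

// Hypothetical status values; the real enum lives in the Prisma schema.
type CycleStatus = "OPEN" | "SUBMITTED" | "APPROVED" | "CLOSED";
const ALL_STATUSES: CycleStatus[] = ["OPEN", "SUBMITTED", "APPROVED", "CLOSED"];

// Recompute the gauge from the current cycle rows. Every status is written,
// including zero counts, so a status that empties out does not keep
// reporting its previous value.
function publishCyclesByStatus(gauge: LabelledGauge, cycles: { status: CycleStatus }[]): void {
  const counts: Record<CycleStatus, number> = { OPEN: 0, SUBMITTED: 0, APPROVED: 0, CLOSED: 0 };
  for (const cycle of cycles) counts[cycle.status] += 1;
  for (const s of ALL_STATUSES) gauge.set({ status: s }, counts[s]);
}
```

Writing explicit zeros is a deliberate choice here: a labelled gauge otherwise retains the last value set for a label combination.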
1.3 Request ID + actor context on logs
- Current state: Pino is configured (`apps/api/src/app.ts`) with PII redaction patterns (confirmed via `'*.mfa_secret'` match at line 66), and Fastify emits a per-request `reqId`, but there's no `logger.child({ actorId, clientId, cycleId })` wiring on the auth hook. Routes log without actor context today.
- Missing: a `preHandler` hook in `apps/api/src/plugins/auth.ts` (or in the `request-context.ts` plugin, which already exists with basic context but is not logger-bound) that creates a child logger with `actorId`, `role`, `clientId`, and attaches it to `request.log`.
- Where: extend `apps/api/src/plugins/request-context.ts` (already registered; see `apps/api/src/__tests__/portal.test.ts` imports).
- Dependencies: none.
- Effort: ~0.5 day.
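The hook body might look roughly like the following. This is a sketch under assumptions: `RequestLike`, the `actor` property, and `bindActorContext` are hypothetical names, and it assumes the auth hook has already populated the actor and that reassigning `request.log` is acceptable in this codebase. It relies only on the fact that a Pino logger exposes `child(bindings)`.

```typescript
// Minimal structural types so the sketch has no framework imports.
interface ChildCapableLogger {
  child(bindings: Record<string, unknown>): ChildCapableLogger;
}
interface RequestLike {
  log: ChildCapableLogger;
  // Assumed to be set by the auth hook; the real property name may differ.
  actor?: { id: string; role: string; clientId?: string };
}

// Intended to run in a preHandler hook, after authentication.
function bindActorContext(request: RequestLike): void {
  if (!request.actor) return; // unauthenticated routes keep the plain request logger
  const { id, role, clientId } = request.actor;
  request.log = request.log.child({ actorId: id, role, clientId });
}
```

Because the child is derived from the per-request logger, the existing `reqId` binding should carry through to every line logged after the hook runs.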
1.4 Grafana dashboard JSON
- Current state: no Grafana templates committed.
- Missing: a committed dashboard JSON (cycle backlog, issue backlog, OCR failure rate, outbox lag, approval turnaround) under `infrastructure/grafana/` or similar.
- Where: new `infrastructure/grafana/spade-overview.json`.
- Dependencies: 1.2 Prometheus metrics must land first; a dashboard without the metrics is inert.
- Effort: ~0.5 day.
2. Validation engine — depth vs breadth
The MVP validation engine works but only compares payroll-period-to-period variance on grossPay. The data model already supports more.
2.1 Multi-metric variance engine
- Current state: `EmployeeShadowSnapshot` (`schema.prisma`) already stores `netPay`, `cpfEmployee`, `cpfEmployer`, `ytdOrdinaryWage`, `ytdAdditionalWage`; the baseline data is captured. But `SnapshotService.computeVariance()` at `packages/domain/src/services/snapshot.service.ts:92` only accepts `{ employeeRef, grossPay }` in its `currentTotals` parameter and produces a single `aggregateVariance` number. `ValidationRules` has one rule (`PAYROLL_VARIANCE_THRESHOLD` at `validation-rules.ts:151`) that gates on that aggregate.
- Missing:
  - Widen the `currentTotals` shape to include `netPay`, `cpfEmployee`, variable-pay fields.
  - Emit per-metric `EmployeeVariance` rows (one per `{ employeeRef, metric }` pair).
  - Add per-metric POL-gated rules: `NET_PAY_VARIANCE_THRESHOLD`, `CPF_VARIANCE_THRESHOLD`, `VARIABLE_PAY_VARIANCE_THRESHOLD`.
  - Wire the `validation-run` worker handler (`apps/worker/src/handlers/validation-run.ts:94`) to collect the extended totals from parsed output rows.
- Where: `packages/domain/src/services/snapshot.service.ts`, `packages/domain/src/rules/validation-rules.ts`, `apps/worker/src/handlers/validation-run.ts`, `apps/api/src/routes/ops/index.ts:1291` (existing caller).
- Dependencies: 2.2 per-client config is how thresholds become per-client tunable; without that, thresholds stay as module constants.
- Effort: 1–1.5 days.
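To make the "one row per `{ employeeRef, metric }` pair" idea concrete, here is a minimal sketch. It assumes variance is a percentage change relative to the snapshot baseline, which may not match how `computeVariance()` defines it today; `MetricTotals`, `EmployeeVarianceRow`, and `computePerMetricVariance` are illustrative names, and only three metrics are shown.

```typescript
type VarianceMetric = "grossPay" | "netPay" | "cpfEmployee";
type MetricTotals = { employeeRef: string } & Record<VarianceMetric, number>;

interface EmployeeVarianceRow {
  employeeRef: string;
  metric: VarianceMetric;
  baseline: number;
  current: number;
  variancePct: number | null; // null when the baseline is 0 and a percentage is undefined
}

const VARIANCE_METRICS: VarianceMetric[] = ["grossPay", "netPay", "cpfEmployee"];

// Emits one row per (employee, metric) pair for employees present in both sets.
function computePerMetricVariance(baseline: MetricTotals[], current: MetricTotals[]): EmployeeVarianceRow[] {
  const baselineByRef = new Map(baseline.map((b): [string, MetricTotals] => [b.employeeRef, b]));
  const rows: EmployeeVarianceRow[] = [];
  for (const cur of current) {
    const base = baselineByRef.get(cur.employeeRef);
    if (!base) continue; // no snapshot to compare against (for example, a new joiner)
    for (const metric of VARIANCE_METRICS) {
      const variancePct = base[metric] === 0 ? null : ((cur[metric] - base[metric]) / base[metric]) * 100;
      rows.push({ employeeRef: cur.employeeRef, metric, baseline: base[metric], current: cur[metric], variancePct });
    }
  }
  return rows;
}
```

Per-metric rules such as `NET_PAY_VARIANCE_THRESHOLD` could then filter these rows by `metric` and compare the absolute value of `variancePct` against their threshold.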
2.2 Per-client validation config
- Current state: no `ClientValidationConfig` model in `packages/db/prisma/schema.prisma`. Thresholds are hardcoded in `validation-rules.ts`.
- Missing:
  - New Prisma model `ClientValidationConfig` keyed on `clientId` with columns for each threshold (gross variance %, net variance %, CPF variance %, etc.) + a JSON bag for future metrics.
  - Migration under `packages/db/prisma/migrations/`.
  - `PUT /admin/clients/:id/validation-config` endpoint at `apps/api/src/routes/admin/index.ts`.
  - Rule evaluator injected with the per-client config instead of module constants.
  - Fallback behaviour: clients without a config row use the embedded defaults.
- Where: `packages/db/prisma/schema.prisma` + new migration; `packages/domain/src/services/validation.service.ts`; `apps/api/src/routes/admin/index.ts`.
- Dependencies: unblocks 2.1 (proper thresholds) and 3.1 (UI to edit them).
- Effort: ~1 day.
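The fallback behaviour can be sketched as a small resolver that the rule evaluator calls once per client. The field names, the default values, and the idea that individual columns may be null are assumptions for illustration; the backlog only specifies that a missing row falls back to the embedded defaults.

```typescript
interface ValidationThresholds {
  grossVariancePct: number;
  netVariancePct: number;
  cpfVariancePct: number;
}

// Placeholder defaults; the real values are the constants in validation-rules.ts.
const DEFAULT_THRESHOLDS: ValidationThresholds = {
  grossVariancePct: 10,
  netVariancePct: 10,
  cpfVariancePct: 5,
};

// Assumed row shape: any column may be absent or null.
type ClientValidationConfigRow = Partial<Record<keyof ValidationThresholds, number | null>>;

// A missing row, or a missing/null column, falls back to the default.
// `??` is used rather than `||` so an explicit threshold of 0 is respected.
function resolveThresholds(row: ClientValidationConfigRow | null): ValidationThresholds {
  return {
    grossVariancePct: row?.grossVariancePct ?? DEFAULT_THRESHOLDS.grossVariancePct,
    netVariancePct: row?.netVariancePct ?? DEFAULT_THRESHOLDS.netVariancePct,
    cpfVariancePct: row?.cpfVariancePct ?? DEFAULT_THRESHOLDS.cpfVariancePct,
  };
}
```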
2.3 POL reason-code taxonomy
- Current state: `ResolveIssueBody.reasonCode` at `apps/api/src/routes/ops/index.ts:149–153` is `Type.String()`, i.e. free-form. Only `packages/domain/src/services/resolved-issue-overlay.ts:20–25` knows about the closed set `{ 'DOCS_PROVIDED_OOB', 'CLIENT_CONFIRMED', 'NOT_APPLICABLE' }`. Any other string is accepted by the API and classified as a WARNING downgrade rather than a PASS.
- Missing:
  - Replace the free-form schema with `Type.Union([Type.Literal(...)])` enumerating the accepted codes; pull the set from a shared `packages/contracts/src/enums/issue-resolution-reason.ts`.
  - Add `OTHER` as a closed value and require `reasonText` (min length 20) for it so ops can't rubber-stamp issues with vague notes.
  - Backfill migration for any existing `WorkflowIssue.reasonCode` rows with non-taxonomy values.
- Where: `apps/api/src/routes/ops/index.ts`, `packages/contracts/src/enums/`, `packages/db/prisma/migrations/`.
- Dependencies: none.
- Effort: ~0.5 day.
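The closed set and the conditional `reasonText` rule can be expressed without any schema library, which is useful for the shared contracts package and for the backfill check. The sketch below is dependency-free TypeScript rather than the TypeBox schema the item proposes; `validateResolveIssue` and `isIssueResolutionReason` are hypothetical names, and trimming whitespace before measuring length is an assumption.

```typescript
const ISSUE_RESOLUTION_REASONS = ["DOCS_PROVIDED_OOB", "CLIENT_CONFIRMED", "NOT_APPLICABLE", "OTHER"] as const;
type IssueResolutionReason = (typeof ISSUE_RESOLUTION_REASONS)[number];

const MIN_OTHER_REASON_TEXT = 20;

// Type guard over the closed set; also usable to find non-taxonomy rows during backfill.
function isIssueResolutionReason(value: string): value is IssueResolutionReason {
  return (ISSUE_RESOLUTION_REASONS as readonly string[]).includes(value);
}

// Returns a list of validation errors; an empty list means the body is acceptable.
function validateResolveIssue(body: { reasonCode: string; reasonText?: string }): string[] {
  const errors: string[] = [];
  if (!isIssueResolutionReason(body.reasonCode)) {
    errors.push(`unknown reasonCode: ${body.reasonCode}`);
  } else if (body.reasonCode === "OTHER" && (body.reasonText ?? "").trim().length < MIN_OTHER_REASON_TEXT) {
    errors.push(`reasonText must be at least ${MIN_OTHER_REASON_TEXT} characters when reasonCode is OTHER`);
  }
  return errors;
}
```

A cross-field rule like "`OTHER` requires `reasonText`" is usually easier to enforce in a handler-level check like this than in the union schema alone.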
3. Admin UI
3.1 Per-client validation config editor
- Current state: No `apps/web/src/app/admin/clients/[id]/validation*` page exists (verified).
- Missing: a form at `/admin/clients/[id]/validation-config` with a numeric input per threshold and a "reset to defaults" button.
- Where: new `apps/web/src/app/admin/clients/[id]/validation-config/page.tsx` + a `features/validation-config/` feature folder.
- Dependencies: 2.2 per-client config endpoint must ship first.
- Effort: ~1 day.
4. E2E tests
- Current state: `apps/web/e2e/` has two specs, `navigation.spec.ts` (95 lines) and `pages.spec.ts` (108 lines), with `@playwright/test` at `^1.59.1` installed. These cover landing-page navigation and basic page loads. There is no full-cycle journey test.
- Missing: a Playwright scenario that walks: staff login → MFA (if enabled) → create client → issue cycle request → portal magic-link open → upload files → declare → submit → ops validate → export → approve → close → archive.
- Where: new `apps/web/e2e/full-cycle.spec.ts`, seeded via `packages/db/prisma/seed.ts` (which already produces ACME + GLOBEX fixtures).
- Dependencies: none; the old "broken Playwright version mismatch" story no longer applies (verified clean install).
- Effort: 2–3 days (realistic, given OCR + email assertions via Mailpit).
5. Pre-launch operational work
None of the items below are code-only; each requires a human owner on the Platform / Engineering / Legal side.
5.1 Scripts that don't yet exist
| Script | Purpose | Effort |
|---|---|---|
| `scripts/reencrypt-mfa-secrets.ts` | Re-wrap `staff_users.mfa_secret` ciphertext with a new `MFA_SECRET_KEY` during key rotation. The key is already used (`apps/api/src/routes/staff-auth/index.ts:29`, `apps/api/src/__tests__/staff-auth.test.ts`) via `encryptAtRest`. | ~0.5 day |
| `scripts/gdpr-export.ts` | PDPA DSAR subject-access export: gather all rows for a given contact/employee into a zip. PII inventory at `docs/pii-inventory.md` lists the surface. | ~1 day |
| `scripts/gdpr-delete.ts` | PDPA DSAR erasure: hard-delete contact + submission history where retention permits. Must interact with the existing retention handler at `apps/worker/src/handlers/retention-purge.ts`. | ~1 day |
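The core of the re-wrap script is a decrypt-with-old-key, encrypt-with-new-key step per row. The sketch below shows that step using Node's built-in `node:crypto`. It assumes AES-256-GCM with a base64 layout of `iv | authTag | ciphertext`; the actual format produced by `encryptAtRest` is not documented here and must be checked before this is adapted. All function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// ASSUMPTION: AES-256-GCM, stored as base64(iv[12] | authTag[16] | ciphertext).
// The real encryptAtRest format may differ.
function encryptWithKey(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

function decryptWithKey(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  // final() throws if the key is wrong or the ciphertext was tampered with.
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}

// Re-wrap one stored value: decrypt under the old key, encrypt under the new one.
function rewrapSecret(encoded: string, oldKey: Buffer, newKey: Buffer): string {
  return encryptWithKey(decryptWithKey(encoded, oldKey), newKey);
}
```

A real script would wrap this in a transaction over `staff_users` and should be safe to re-run, for example by skipping rows that already decrypt under the new key.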
5.2 Templates / docs missing
- `docs/templates/postmortem.md`: empty today; confirmed missing. Create on first real incident so the format is grounded in actual context.
5.3 Non-code owners
| Item | Owner | Blocking for production? |
|---|---|---|
| Legal review of Singapore retention values in `docs/retention-policy.md` | Legal / DPO | Yes. PDPA compliance. |
| First quarterly restore drill per `docs/runbooks/restore-drill.md` | Platform on-call | Soft. Should happen before launch but not a release blocker. |
| First load test run in staging to calibrate `docs/capacity.md` | Platform | Yes. Capacity planning unverified otherwise. |
| Production secret provisioning (`MFA_SECRET_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_VISION_CREDENTIALS_JSON`, `JWT_SECRET`, `MAGIC_LINK_SECRET`, `CORS_ORIGINS`, `SENTRY_DSN`) | Platform | Yes. |
6. Explicitly deferred (not in this backlog)
Listed here so nobody re-derives them from first principles and opens a ticket. Each of these was a conscious MVP-scope decision:
- Real Infotech payroll engine API integration — stays manual workbook upload/download.
- i18n / multi-language support — English only.
- WCAG 2.1 accessibility audit.
- API versioning (`/v1/`): flat namespace for launch.
- GraphQL / BFF layer: TypeBox-backed REST is the contract.
- Usage metering / billing.
- Soft-delete semantics — retention handles hard deletes on archived cycles.
- PII column-level encryption at rest: relies on provider-level encryption + Pino redaction (the `mfa_secret` column uses app-level encryption as a special case).
Rough order-of-operations for a production push
- Observability (§1.1–§1.3) — without these, any production incident is debugged blind. ~2.5 days.
- POL reason-code taxonomy (§2.3) — cheap, removes an audit-trail ambiguity. ~0.5 day.
- Pre-launch scripts (§5.1) — MFA rotation + GDPR scripts are launch blockers once the first real user enrols. ~2.5 days.
- E2E smoke (§4) — one happy-path Playwright test buys a lot of regression safety. ~2 days.
- Per-client validation config + editor (§2.2 + §3.1) — can ship together once a client asks. ~2 days.
- Multi-metric variance (§2.1) — gated by §2.2 if thresholds should be per-client. ~1.5 days.
- Grafana dashboard (§1.4) — once metrics are live. ~0.5 day.
Rough total: ~11–12 engineering days for the code-only items, plus non-code owners for the §5.3 line items.