Post-MVP Backlog

Audit of work that was listed in the old README "Roadmap / Remaining Work" section, cross-checked against the current codebase as of commit b9d73e9. Each item below records:

  • Current state — what exists in code today
  • Missing — what's genuinely not implemented yet
  • Where it lives / would live — file paths
  • Dependencies — blockers or prerequisites
  • Effort — rough implementation estimate

The original roadmap lived inline in README.md and referenced phase numbers (3I / 3G / 3H-Ext / 3J) that mapped to a planning doc no longer in the repo. This file drops the phase terminology and organises the remaining work by the capability gap it addresses.


1. Observability — the biggest gap

None of the observability items landed in MVP. docs/on-call.md defines alerting thresholds that assume metrics are being emitted. Since nothing emits them, those alerts have no data to evaluate and stay silent today. This is the highest-priority block before running BreezyCorp in production.

1.1 Sentry error reporting

  • Current state: No @sentry/* package installed in the monorepo (verified: package.json, apps/api/package.json, apps/worker/package.json, apps/web/package.json).
  • Missing: @sentry/node init in apps/api/src/app.ts and apps/worker/src/index.ts, @sentry/nextjs in apps/web, beforeSend PII scrubbing (cross-referenced with docs/pii-inventory.md), filtering of expected business errors (TenantScopeViolationError, validation failures).
  • Where: new plugin apps/api/src/plugins/sentry.ts, wrapper in apps/worker/src/lib/, sentry.client.config.ts + sentry.server.config.ts in apps/web.
  • Dependencies: SENTRY_DSN production secret provisioning.
  • Effort: ~1 day.
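A minimal sketch of the beforeSend filtering described above. It is written against local stand-in types rather than Sentry's own, so it runs on its own. The PII key list, the sensitive-header list and the "ValidationError" name are placeholders (the authoritative key list belongs in docs/pii-inventory.md). Matching expected errors by Error#name assumes those classes set their name. In the real plugin the function would be passed as the beforeSend option to Sentry.init, where returning null drops the event.

```typescript
// Local stand-ins for the few Sentry event fields this sketch touches.
// In the real plugin these would be Sentry's own event/hint types.
interface SketchEvent {
  request?: { data?: unknown; headers?: Record<string, string> };
  extra?: Record<string, unknown>;
  user?: { id?: string; email?: string; ip_address?: string };
}
interface SketchHint {
  originalException?: unknown;
}

// Expected business errors, matched by Error#name (assumes the classes set it).
// "ValidationError" is a placeholder for however validation failures are named.
const EXPECTED_ERROR_NAMES = new Set(["TenantScopeViolationError", "ValidationError"]);

// Placeholder key list; the real one should be derived from docs/pii-inventory.md.
const PII_KEYS = new Set(["mfa_secret", "email", "phone", "nric", "bankAccount"]);
const SENSITIVE_HEADERS = new Set(["authorization", "cookie"]);

// Recursively replace values stored under PII keys; everything else is kept.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src)) {
      out[key] = PII_KEYS.has(key) ? "[REDACTED]" : redact(src[key]);
    }
    return out;
  }
  return value;
}

// Drop credential-bearing headers (case-insensitive match on the header name).
function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    if (!SENSITIVE_HEADERS.has(key.toLowerCase())) out[key] = headers[key];
  }
  return out;
}

// Shaped like Sentry's beforeSend: return null to drop, or the event to send.
function beforeSend(event: SketchEvent, hint: SketchHint): SketchEvent | null {
  const err = hint.originalException;
  if (err instanceof Error && EXPECTED_ERROR_NAMES.has(err.name)) return null;

  const scrubbed: SketchEvent = { ...event };
  if (event.request) {
    scrubbed.request = {
      ...event.request,
      data: redact(event.request.data),
      headers: event.request.headers && scrubHeaders(event.request.headers),
    };
  }
  if (event.extra) scrubbed.extra = redact(event.extra) as Record<string, unknown>;
  // Keep the opaque user id for issue grouping; drop email and IP address.
  if (event.user) scrubbed.user = { id: event.user.id };
  return scrubbed;
}
```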

1.2 Prometheus metrics

  • Current state: docs/on-call.md references metrics like spade_cycles_by_status, spade_outbox_pending, spade_outbox_dead_letter_count, spade_ocr_failures — none of which are emitted anywhere in code (grep returns zero matches across apps/, packages/).
  • Missing:
    • @fastify/metrics plugin for HTTP request/response histograms.
    • Custom prom-client metrics for the four referenced business metrics (gauges for the cycle-status and outbox sizes; spade_ocr_failures likely fits a counter), updated on cycle transitions (CycleService), outbox poll (worker), and OCR worker callbacks.
    • /metrics endpoint exposed via a staff-only route (not public).
  • Where: new apps/api/src/plugins/metrics.ts, apps/worker/src/lib/metrics.ts, and gauge update points in packages/domain/src/services/cycle.service.ts + apps/worker/src/handlers/outbox-poll.ts.
  • Dependencies: none beyond adding the new packages.
  • Effort: ~1 day.
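One design question for the gauges: incrementing and decrementing a per-status gauge on each CycleService transition drifts whenever the API restarts or runs more than one replica, because each process only sees its own transitions. A possible alternative, sketched here assuming a grouped count of cycles by status is available (for example from a periodic query on the worker), is to recompute the whole label set and overwrite it. The function name is hypothetical and status names are supplied by the caller.

```typescript
// Build the full label set for a per-status gauge from a fresh grouped count.
// Every known status gets a value (0 included), and duplicate rows are summed.
function cyclesByStatus(
  knownStatuses: readonly string[],
  counts: ReadonlyArray<{ status: string; count: number }>,
): Map<string, number> {
  const snapshot = new Map<string, number>();
  for (const status of knownStatuses) snapshot.set(status, 0);
  for (const row of counts) {
    snapshot.set(row.status, (snapshot.get(row.status) ?? 0) + row.count);
  }
  return snapshot;
}
```

Each entry would then be written with prom-client's gauge.set({ status }, value). Zero-filling known statuses matters because a gauge keeps the last value set for a label: without it, a status that drains to zero would keep reporting its old count.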

1.3 Request ID + actor context on logs

  • Current state: Pino is configured in apps/api/src/app.ts with PII redaction patterns (confirmed via the '*.mfa_secret' match at line 66), and Fastify emits a per-request reqId, but the auth hook does no logger.child({ actorId, clientId, cycleId }) wiring, so routes log without actor context today.
  • Missing: a preHandler hook, either in apps/api/src/plugins/auth.ts or in the existing apps/api/src/plugins/request-context.ts plugin (which already builds basic context but does not bind it to the logger), that creates a child logger with actorId, role, clientId and, where the route has one, cycleId, and attaches it to request.log.
  • Where: extend apps/api/src/plugins/request-context.ts (already registered — see apps/api/src/__tests__/portal.test.ts imports).
  • Dependencies: none.
  • Effort: ~0.5 day.
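A sketch of the binding logic, assuming a request-context object carrying the fields named above (the ActorContext shape and both function names are hypothetical). It relies only on a child() method, which pino loggers provide. Fastify's request.log is already a pino child carrying reqId, so deriving from it keeps the request ID alongside the actor fields.

```typescript
// Minimal logger shape: pino loggers (and Fastify's request.log) expose child().
interface ChildCapableLogger {
  child(bindings: Record<string, unknown>): ChildCapableLogger;
}

// Hypothetical shape of what the auth / request-context plugin knows per request.
interface ActorContext {
  actorId?: string;
  role?: string;
  clientId?: string;
  cycleId?: string;
}

// Collect only the fields that are present on this request.
function actorBindings(ctx: ActorContext): Record<string, string> {
  const bindings: Record<string, string> = {};
  if (ctx.actorId) bindings.actorId = ctx.actorId;
  if (ctx.role) bindings.role = ctx.role;
  if (ctx.clientId) bindings.clientId = ctx.clientId;
  if (ctx.cycleId) bindings.cycleId = ctx.cycleId;
  return bindings;
}

// Derive a child logger with actor context; reuse the parent if nothing is known.
function withActorContext(log: ChildCapableLogger, ctx: ActorContext): ChildCapableLogger {
  const bindings = actorBindings(ctx);
  return Object.keys(bindings).length > 0 ? log.child(bindings) : log;
}
```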

1.4 Grafana dashboard JSON

  • Current state: no Grafana templates committed.
  • Missing: a committed dashboard JSON (cycle backlog, issue backlog, OCR failure rate, outbox lag, approval turnaround) under infrastructure/grafana/ or similar.
  • Where: new infrastructure/grafana/spade-overview.json.
  • Dependencies: 1.2 Prometheus metrics must land first — a dashboard without the metrics is inert.
  • Effort: ~0.5 day.

2. Validation engine — depth vs breadth

The MVP validation engine works, but it only compares period-to-period payroll variance on grossPay. The data model already supports more.

2.1 Multi-metric variance engine

  • Current state: EmployeeShadowSnapshot (schema.prisma) already stores netPay, cpfEmployee, cpfEmployer, ytdOrdinaryWage, ytdAdditionalWage — the baseline data is captured. But SnapshotService.computeVariance() at packages/domain/src/services/snapshot.service.ts:92 only accepts { employeeRef, grossPay } in its currentTotals parameter and produces a single aggregateVariance number. ValidationRules has one rule (PAYROLL_VARIANCE_THRESHOLD at validation-rules.ts:151) that gates on that aggregate.
  • Missing:
    • Widen the currentTotals shape to include netPay, cpfEmployee, variable-pay fields.
    • Emit per-metric EmployeeVariance rows (one per { employeeRef, metric } pair).
    • Add per-metric POL-gated rules: NET_PAY_VARIANCE_THRESHOLD, CPF_VARIANCE_THRESHOLD, VARIABLE_PAY_VARIANCE_THRESHOLD.
    • Wire the validation-run worker handler (apps/worker/src/handlers/validation-run.ts:94) to collect the extended totals from parsed output rows.
  • Where: packages/domain/src/services/snapshot.service.ts, packages/domain/src/rules/validation-rules.ts, apps/worker/src/handlers/validation-run.ts, apps/api/src/routes/ops/index.ts:1291 (existing caller).
  • Dependencies: soft dependency on 2.2. Per-client config is what makes thresholds tunable per client; without it, the new thresholds stay as module constants.
  • Effort: 1–1.5 days.
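A minimal sketch of the per-metric comparison, assuming baseline and current totals keyed by employeeRef. The Totals and EmployeeVarianceRow shapes and the function name are hypothetical, variable-pay fields are left out, and the percentage formula is an illustration rather than a reproduction of how aggregateVariance is computed today.

```typescript
// Metrics the widened currentTotals shape might carry (names mirror the
// EmployeeShadowSnapshot columns; variable pay is omitted in this sketch).
type Metric = "grossPay" | "netPay" | "cpfEmployee" | "cpfEmployer";
const METRICS: readonly Metric[] = ["grossPay", "netPay", "cpfEmployee", "cpfEmployer"];

type Totals = { employeeRef: string } & Partial<Record<Metric, number>>;

// Hypothetical row shape: one row per { employeeRef, metric } pair.
interface EmployeeVarianceRow {
  employeeRef: string;
  metric: Metric;
  baseline: number;
  current: number;
  variancePct: number | null; // null when the baseline is 0 (percentage undefined)
}

function computePerMetricVariance(
  baseline: readonly Totals[],
  current: readonly Totals[],
): EmployeeVarianceRow[] {
  const byRef = new Map<string, Totals>();
  for (const row of baseline) byRef.set(row.employeeRef, row);

  const rows: EmployeeVarianceRow[] = [];
  for (const cur of current) {
    const base = byRef.get(cur.employeeRef);
    if (!base) continue; // new joiner: nothing to compare against
    for (const metric of METRICS) {
      const b = base[metric];
      const c = cur[metric];
      if (b === undefined || c === undefined) continue; // not captured on one side
      rows.push({
        employeeRef: cur.employeeRef,
        metric,
        baseline: b,
        current: c,
        variancePct: b === 0 ? null : ((c - b) / Math.abs(b)) * 100,
      });
    }
  }
  return rows;
}
```

Each rule (for example a net-pay threshold) can then filter rows by metric instead of reading one aggregate number.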

2.2 Per-client validation config

  • Current state: no ClientValidationConfig model in packages/db/prisma/schema.prisma. Thresholds are hardcoded in validation-rules.ts.
  • Missing:
    • New Prisma model ClientValidationConfig keyed on clientId with columns for each threshold (gross variance %, net variance %, CPF variance %, etc.) + a JSON bag for future metrics.
    • Migration under packages/db/prisma/migrations/.
    • PUT /admin/clients/:id/validation-config endpoint at apps/api/src/routes/admin/index.ts.
    • Rule evaluator injected with the per-client config instead of module constants.
    • Fallback behaviour — clients without a config row use the embedded defaults.
  • Where: packages/db/prisma/schema.prisma + new migration; packages/domain/src/services/validation.service.ts; apps/api/src/routes/admin/index.ts.
  • Dependencies: none. This item unblocks 2.1 (per-client thresholds) and 3.1 (the UI to edit them).
  • Effort: ~1 day.
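A sketch of the fallback rule, assuming nullable per-threshold columns on ClientValidationConfig. The threshold names, the row type and the default numbers are placeholders, not the constants currently in validation-rules.ts.

```typescript
// Hypothetical threshold names; the real columns are still to be designed.
interface ValidationThresholds {
  grossVariancePct: number;
  netVariancePct: number;
  cpfVariancePct: number;
}

// Placeholder defaults standing in for today's module constants.
const DEFAULT_THRESHOLDS: ValidationThresholds = {
  grossVariancePct: 10,
  netVariancePct: 10,
  cpfVariancePct: 5,
};

// A config row as it might come back from the database: null means
// "not overridden for this client".
type ClientValidationConfigRow = { [K in keyof ValidationThresholds]: number | null };

function resolveThresholds(
  row: ClientValidationConfigRow | null,
  defaults: ValidationThresholds = DEFAULT_THRESHOLDS,
): ValidationThresholds {
  if (!row) return { ...defaults }; // no config row: embedded defaults
  return {
    // ?? (not ||) so an explicit 0 override is kept rather than defaulted.
    grossVariancePct: row.grossVariancePct ?? defaults.grossVariancePct,
    netVariancePct: row.netVariancePct ?? defaults.netVariancePct,
    cpfVariancePct: row.cpfVariancePct ?? defaults.cpfVariancePct,
  };
}
```

Falling back per column rather than per row lets a client override a single threshold without copying the others.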

2.3 POL reason-code taxonomy

  • Current state: ResolveIssueBody.reasonCode at apps/api/src/routes/ops/index.ts:149–153 is Type.String() — free-form. Only packages/domain/src/services/resolved-issue-overlay.ts:20–25 knows about the closed set { 'DOCS_PROVIDED_OOB', 'CLIENT_CONFIRMED', 'NOT_APPLICABLE' }. Any other string is accepted by the API and classified as a WARNING downgrade rather than a PASS.
  • Missing:
    • Replace the free-form schema with Type.Union([Type.Literal(...)]) enumerating the accepted codes — pull the set from a shared packages/contracts/src/enums/issue-resolution-reason.ts.
    • Add OTHER as an additional value in the closed set and require reasonText (minimum length 20) with it, so ops can't rubber-stamp issues with vague notes.
    • Backfill migration for any existing WorkflowIssue.reasonCode rows with non-taxonomy values.
  • Where: apps/api/src/routes/ops/index.ts, packages/contracts/src/enums/, packages/db/prisma/migrations/.
  • Dependencies: none.
  • Effort: ~0.5 day.
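A sketch of the closed taxonomy as a shared constant plus the OTHER cross-field rule, assuming it lives in the packages/contracts module named above. The checkResolveBody helper and ResolveCheck type are hypothetical. In the route, the same tuple could feed Type.Union of Type.Literal members; the OTHER rule could alternatively be expressed in TypeBox as a union of two Type.Object branches, one pairing Type.Literal('OTHER') with Type.String({ minLength: 20 }).

```typescript
// The three codes resolved-issue-overlay.ts already recognises, plus OTHER.
const ISSUE_RESOLUTION_REASONS = [
  "DOCS_PROVIDED_OOB",
  "CLIENT_CONFIRMED",
  "NOT_APPLICABLE",
  "OTHER",
] as const;
type IssueResolutionReason = (typeof ISSUE_RESOLUTION_REASONS)[number];

const OTHER_MIN_TEXT_LENGTH = 20;

function isIssueResolutionReason(value: string): value is IssueResolutionReason {
  return (ISSUE_RESOLUTION_REASONS as readonly string[]).indexOf(value) !== -1;
}

type ResolveCheck =
  | { ok: true; reasonCode: IssueResolutionReason }
  | { ok: false; error: string };

// Reject unknown codes outright, and require a real explanation for OTHER.
function checkResolveBody(body: { reasonCode: string; reasonText?: string }): ResolveCheck {
  const code = body.reasonCode;
  if (!isIssueResolutionReason(code)) {
    return { ok: false, error: `unknown reasonCode: ${code}` };
  }
  if (code === "OTHER" && (body.reasonText ?? "").length < OTHER_MIN_TEXT_LENGTH) {
    return { ok: false, error: `reasonText must be at least ${OTHER_MIN_TEXT_LENGTH} characters for OTHER` };
  }
  return { ok: true, reasonCode: code };
}
```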

3. Admin UI

3.1 Per-client validation config editor

  • Current state: No apps/web/src/app/admin/clients/[id]/validation* page exists (verified).
  • Missing: a form at /admin/clients/[id]/validation-config with a numeric input per threshold and a "reset to defaults" button.
  • Where: new apps/web/src/app/admin/clients/[id]/validation-config/page.tsx + a features/validation-config/ feature folder.
  • Dependencies: 2.2 per-client config endpoint must ship first.
  • Effort: ~1 day.

4. E2E tests

  • Current state: apps/web/e2e/ has two specs — navigation.spec.ts (95 lines) and pages.spec.ts (108 lines) — with @playwright/test at ^1.59.1 installed. These cover landing-page navigation and basic page loads. There is no full-cycle journey test.
  • Missing: a Playwright scenario that walks: staff login → MFA (if enabled) → create client → issue cycle request → portal magic-link open → upload files → declare → submit → ops validate → export → approve → close → archive.
  • Where: new apps/web/e2e/full-cycle.spec.ts, seeded via packages/db/prisma/seed.ts (which already produces ACME + GLOBEX fixtures).
  • Dependencies: none — the old "broken Playwright version mismatch" story no longer applies (verified clean install).
  • Effort: 2–3 days (realistic, given OCR + email assertions via Mailpit).

5. Pre-launch operational work

None of the items below are code-only; each requires a human owner on the Platform / Engineering / Legal side.

5.1 Scripts that don't yet exist

| Script | Purpose | Effort |
|---|---|---|
| scripts/reencrypt-mfa-secrets.ts | Re-wrap staff_users.mfa_secret ciphertext with a new MFA_SECRET_KEY during key rotation. The key is already used (apps/api/src/routes/staff-auth/index.ts:29, apps/api/src/__tests__/staff-auth.test.ts) via encryptAtRest. | ~0.5 day |
| scripts/gdpr-export.ts | PDPA DSAR subject-access export: gather all rows for a given contact/employee into a zip. PII inventory at docs/pii-inventory.md lists the surface. | ~1 day |
| scripts/gdpr-delete.ts | PDPA DSAR erasure: hard-delete contact + submission history where retention permits. Must interact with the existing retention handler at apps/worker/src/handlers/retention-purge.ts. | ~1 day |
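A sketch of the core of scripts/reencrypt-mfa-secrets.ts. The envelope format is an assumption (AES-256-GCM with a 32-byte key, stored as base64 IV, auth tag and ciphertext joined by colons); encryptAtRest's actual format must be confirmed before anything like this touches staff_users. The encrypt, decrypt and rewrap names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// ASSUMED envelope: base64(iv) ":" base64(authTag) ":" base64(ciphertext),
// AES-256-GCM with a 32-byte key. Not confirmed to match encryptAtRest.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

function decrypt(envelope: string, key: Buffer): string {
  const parts = envelope.split(":");
  if (parts.length !== 3) throw new Error("malformed envelope");
  const [iv, tag, ciphertext] = parts.map((p) => Buffer.from(p, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (wrong key or tampered data).
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Re-wrap one stored value: decrypt under the old key, encrypt under the new one.
function rewrap(envelope: string, oldKey: Buffer, newKey: Buffer): string {
  return encrypt(decrypt(envelope, oldKey), newKey);
}
```

Because GCM authentication fails under the wrong key, a re-runnable script could first try decrypting each row with the new key and skip rows that already succeed, so an interrupted rotation can be resumed.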

5.2 Templates / docs missing

  • docs/templates/postmortem.md does not exist yet (confirmed). Create it on the first real incident so the format is grounded in actual context.

5.3 Non-code owners

| Item | Owner | Blocking for production? |
|---|---|---|
| Legal review of Singapore retention values in docs/retention-policy.md | Legal / DPO | Yes, for PDPA compliance. |
| First quarterly restore drill per docs/runbooks/restore-drill.md | Platform on-call | Soft: should happen before launch, but not a release blocker. |
| First load test run in staging to calibrate docs/capacity.md | Platform | Yes: capacity planning is unverified otherwise. |
| Production secret provisioning (MFA_SECRET_KEY, ANTHROPIC_API_KEY, GOOGLE_VISION_CREDENTIALS_JSON, JWT_SECRET, MAGIC_LINK_SECRET, CORS_ORIGINS, SENTRY_DSN) | Platform | Yes. |

6. Explicitly deferred (not in this backlog)

Listed here so nobody re-derives them from first principles and opens a ticket. Each of these was a conscious MVP-scope decision:

  • Real Infotech payroll engine API integration — stays manual workbook upload/download.
  • i18n / multi-language support — English only.
  • WCAG 2.1 accessibility audit.
  • API versioning (/v1/) — flat namespace for launch.
  • GraphQL / BFF layer — TypeBox-backed REST is the contract.
  • Usage metering / billing.
  • Soft-delete semantics — retention handles hard deletes on archived cycles.
  • PII column-level encryption at rest — relies on provider-level encryption + Pino redaction (the mfa_secret column uses app-level encryption as a special case).

Rough order-of-operations for a production push

  1. Observability (§1.1–§1.3) — without these, any production incident is debugged blind. ~2.5 days.
  2. POL reason-code taxonomy (§2.3) — cheap, removes an audit-trail ambiguity. ~0.5 day.
  3. Pre-launch scripts (§5.1) — MFA rotation + GDPR scripts are launch blockers once the first real user enrols. ~2.5 days.
  4. E2E smoke (§4) — one happy-path Playwright test buys a lot of regression safety. ~2 days.
  5. Per-client validation config + editor (§2.2 + §3.1) — can ship together once a client asks. ~2 days.
  6. Multi-metric variance (§2.1) — gated by §2.2 if thresholds should be per-client. ~1.5 days.
  7. Grafana dashboard (§1.4) — once metrics are live. ~0.5 day.

Rough total: ~11–12 engineering days for the code-only items, plus non-code owners for the §5.3 line items.

Internal use only — BreezyCorp