Capacity Planning

Initial sizing derived from the Phase 3K.4 load test (tests/load/month-end-burst.js). Update this doc after each run of the load test in staging.

Reference workload

  • 50 clients with payroll cycles
  • Month-end burst: 80% of monthly submissions arrive within a 2-hour window on the 1st of the month
  • Files per cycle: ~20 documents averaging 2 MB each
  • Peak API request rate: ~50 RPS during the burst, ~5 RPS off-peak
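
The reference figures above can be turned into a rough burst estimate. The sketch below is illustrative only: it assumes one payroll cycle per client per month (not stated in this doc), and the names `workload` and `burstEstimate` are hypothetical.

```javascript
// Reference workload figures taken from the list above.
const workload = {
  clients: 50,
  docsPerCycle: 20,      // ~20 documents per payroll cycle
  avgDocSizeMB: 2,       // ~2 MB per document
  burstShare: 0.8,       // 80% of monthly submissions land in the burst
  burstWindowHours: 2,
};

// ASSUMPTION: one payroll cycle per client per month unless overridden.
function burstEstimate(w, cyclesPerClientPerMonth = 1) {
  const monthlyDocs = w.clients * w.docsPerCycle * cyclesPerClientPerMonth;
  const burstDocs = monthlyDocs * w.burstShare;
  const burstSeconds = w.burstWindowHours * 3600;
  return {
    monthlyDocs,
    burstDocs,
    burstUploadMB: burstDocs * w.avgDocSizeMB,
    avgUploadsPerSecond: burstDocs / burstSeconds,
  };
}

console.log(burstEstimate(workload));
```

Under that assumption the burst is about 800 documents (roughly 1.6 GB) at an average of about 0.11 uploads per second. If the assumption holds, the ~50 RPS peak is probably dominated by API calls other than file uploads.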

API sizing (initial)

| Setting | Value | Rationale |
| --- | --- | --- |
| Replicas | 2 | Redundancy + rolling deploys |
| CPU request / limit | 250m / 1000m | Fastify is cheap; burst headroom for OCR coordination |
| Memory request / limit | 256Mi / 768Mi | Prisma client + ExcelJS render footprint |
| Readiness probe | Every 10s | Drives load balancer rotation during restart |

Worker sizing (initial)

| Setting | Value | Rationale |
| --- | --- | --- |
| Replicas | 2 | pg-boss supports multi-worker out of the box |
| CPU request / limit | 500m / 2000m | OCR + xlsx parsing are CPU-bound |
| Memory request / limit | 512Mi / 1.5Gi | Excel workbooks can be large in memory |
| pg-boss teamSize.ocr-process | 4 | Per worker; 8 total concurrent OCR jobs |
| pg-boss teamSize.reminder-email | 2 | Lower priority |
| pg-boss teamSize.outbox-poll | 1 | Singleton; pg-boss dedupes |
| pg-boss teamSize.validation-run | 2 | Per worker; 4 total; validation is fast |
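
The "total" figures in the rationale column follow from replicas × per-worker teamSize. The sketch below reproduces that arithmetic and adds a rough backlog drain estimate. The function names, the backlog size, and the average job duration are hypothetical placeholders, not measured values. `outbox-poll` is left out because it is a singleton, and `reminder-email` because the table gives no cluster total for it.

```javascript
const workerReplicas = 2;

// Per-worker teamSize values from the table, for queues that scale per worker.
const teamSizePerWorker = {
  'ocr-process': 4,
  'validation-run': 2,
};

// Cluster-wide concurrency = per-worker teamSize × number of worker replicas.
function clusterConcurrency(replicas, teamSizes) {
  return Object.fromEntries(
    Object.entries(teamSizes).map(([queue, size]) => [queue, size * replicas]),
  );
}

// Rough time to drain a backlog, assuming all slots stay busy.
// ASSUMPTION: avgJobSeconds is a placeholder; measure it in the load test.
function drainMinutes(queuedJobs, concurrency, avgJobSeconds) {
  return ((queuedJobs / concurrency) * avgJobSeconds) / 60;
}

const totals = clusterConcurrency(workerReplicas, teamSizePerWorker);
console.log(totals);
console.log(drainMinutes(800, totals['ocr-process'], 30));
```

With a hypothetical backlog of 800 OCR jobs at 30 seconds each, 8 concurrent slots would need about 50 minutes to drain. Substituting the observed job duration from the load test gives a more useful number.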

Database (Postgres)

| Setting | Value | Rationale |
| --- | --- | --- |
| Instance class | db.t3.medium (or equivalent) initially | 2 vCPU / 4 GB; upgrade based on CPU/connections metrics |
| max_connections | 100 | API 2 × 10 + worker 2 × 20 + buffer |
| PgBouncer pool (transaction mode) | 50 per app pod | Keeps the raw connection count low |
| Storage | 50 GB with auto-grow | 20% buffer over current |
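
The max_connections rationale can be checked with a small budget calculation. This is a sketch under the assumption that the 10 and 20 in the rationale are the connections each API and worker replica holds against Postgres. The names `pools` and `connectionBudget` are hypothetical.

```javascript
const maxConnections = 100;

// Per-replica connection counts as read from the max_connections rationale.
// ASSUMPTION: 10 per API replica and 20 per worker replica.
const pools = [
  { name: 'api', replicas: 2, connectionsPerReplica: 10 },
  { name: 'worker', replicas: 2, connectionsPerReplica: 20 },
];

// Sum committed connections and report what is left as buffer.
function connectionBudget(max, poolList) {
  const committed = poolList.reduce(
    (sum, p) => sum + p.replicas * p.connectionsPerReplica,
    0,
  );
  return { committed, buffer: max - committed, utilization: committed / max };
}

console.log(connectionBudget(maxConnections, pools));
```

With the current sizing this gives 60 committed connections and a buffer of 40. Under the same per-replica figures, a third worker replica would bring the total to 80, exactly the 80% alert threshold listed under the known bottlenecks.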

S3 (file uploads)

No sizing needed; S3 is pay-per-use. Monitor the monthly transfer cost metric for unexpected spikes, which may indicate a leak or penetration-test traffic worth noting.

Known bottlenecks (to monitor)

  1. pg-boss job queue depth — if it builds up during the burst, raise teamSize.ocr-process
  2. Postgres connection saturation — alert at 80% of max_connections
  3. ExcelJS memory while parsing large output workbooks: 100+ MB workbooks can OOM the worker; watch container memory usage during month-end
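
The three checks above could be expressed as a single evaluation over a metrics sample. This is a minimal sketch, not the monitoring setup actually in use. The 80% connection ratio and the 1.5Gi (1536 MiB) worker memory limit come from this doc. The queue depth limit, the memory alert ratio, and all field names are hypothetical.

```javascript
const limits = {
  maxConnections: 100,
  connectionAlertRatio: 0.8,   // from item 2 above
  workerMemoryLimitMiB: 1536,  // 1.5Gi worker memory limit
  memoryAlertRatio: 0.9,       // ASSUMPTION: placeholder ratio
  maxOcrQueueDepth: 100,       // ASSUMPTION: placeholder depth
};

// Returns the names of the bottlenecks that the sample has crossed.
function checkBottlenecks(sample, l) {
  const alerts = [];
  if (sample.ocrQueueDepth > l.maxOcrQueueDepth) {
    alerts.push('ocr-queue-depth');
  }
  if (sample.activeConnections >= l.maxConnections * l.connectionAlertRatio) {
    alerts.push('postgres-connections');
  }
  if (sample.workerMemoryMiB >= l.workerMemoryLimitMiB * l.memoryAlertRatio) {
    alerts.push('worker-memory');
  }
  return alerts;
}

// Example sample: only the connection threshold is crossed (82 >= 80).
console.log(
  checkBottlenecks(
    { ocrQueueDepth: 40, activeConnections: 82, workerMemoryMiB: 900 },
    limits,
  ),
);
```

In practice these thresholds would more likely live in the alerting system's own rules. The function form just makes the intended conditions explicit.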

How to re-run the load test

bash
# 1. Spin up a staging environment with production-like data
# (terraform apply has no -workspace flag; select the workspace first)
terraform workspace select staging
terraform apply

# 2. Point k6 at it
BASE_URL=https://staging-api.spade \
STAFF_EMAIL=admin@spade.local \
STAFF_PASSWORD="$(get-secret staging-admin-pass)" \
  k6 run tests/load/month-end-burst.js

# 3. Compare the results against the SLO thresholds in the test file
# 4. If any threshold fails, update the sizing above BEFORE merging to main

When to update this doc

  • After every load test run (update the observed numbers)
  • When adding significant client load (each new client ≈ 50 more submissions/month)
  • After any major schema change (ExcelJS / Prisma footprint shifts)
  • When an incident reveals an unknown bottleneck

Internal use only — BreezyCorp