B-03 · Ingest a document — manual upload path

SOP: SOP_AI_Bookkeeping_Automation.md §3 + §4.1 (OCR & data extraction)

Actors: Bookkeeper (BOOKKEEPER role) or PLATFORM_ADMIN.

Pre-state: Bookkeeping-enabled client with at least one CoA entry and one vendor master.

Post-state: A BookkeepingDocument row, an OCR DocumentExtraction row, and a JournalEntry in the current month's JournalBatch (auto-created if missing).

This scenario exercises the simplest end-to-end ingestion flow for testing: a manual file upload from the ops dashboard.

0. Prerequisites

  • Bookkeeping-enabled client (use seeded ZENITH for the warmest start, or the client created in B-01).
  • Logged in as BOOKKEEPER / SENIOR_ACCOUNTANT / PLATFORM_ADMIN.

1. Steps

1.1 Open the documents page

Web: /dashboard/bookkeeping/documents (or the Documents tab on a specific client). Click Upload document.

1.2 Upload via presign / PUT / finalize

The dialog reuses the standard 3-call upload ceremony:

http
POST /ops/files/presign
{ "originalName": "uber-receipt.pdf", "fileKind": "BOOKKEEPING_DOCUMENT", "mimeType": "application/pdf", "sizeBytes": 12345 }

PUT <presigned MinIO URL> ...

POST /ops/files/<fileId>/finalize
{ "sha256": "<hex>", "sizeBytes": 12345 }

After finalize, the route enqueues bookkeeping.ingest-document with the payload { clientId, fileId, sourceChannel }, where sourceChannel is 'CLIENT_PORTAL' or 'MANUAL_UPLOAD'.
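
For scripted testing, the request bodies of the ceremony above can be sketched as follows. This is a minimal sketch, not the dashboard's actual client code: the JSON field names come from the requests shown above, the helper name `build_upload_bodies` is hypothetical, and it assumes `sha256` is the lowercase hex digest of the exact bytes that were PUT. The HTTP calls themselves are left out.

```python
import hashlib
import json
from typing import Dict, Tuple


def build_upload_bodies(original_name: str, mime_type: str, data: bytes) -> Tuple[Dict, Dict]:
    """Return (presign_body, finalize_body) for one upload.

    Hypothetical helper: only the JSON field names are taken from the runbook.
    """
    size = len(data)
    presign_body = {
        "originalName": original_name,
        "fileKind": "BOOKKEEPING_DOCUMENT",
        "mimeType": mime_type,
        "sizeBytes": size,
    }
    # Assumption: finalize expects the lowercase hex SHA-256 of the uploaded bytes.
    finalize_body = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sizeBytes": size,
    }
    return presign_body, finalize_body


if __name__ == "__main__":
    presign, finalize = build_upload_bodies(
        "uber-receipt.pdf", "application/pdf", b"%PDF-1.7 sample bytes"
    )
    print(json.dumps(presign))
    print(json.dumps(finalize))
```

Deriving `sizeBytes` and `sha256` from the same byte string keeps the presign and finalize bodies consistent with each other.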

1.3 Worker pipeline

  1. bookkeeping.ingest-document creates a BookkeepingDocument row referencing the file. Enqueues ocr-process.
  2. ocr-process (shared with payroll) runs the OCR adapter:
    • Mock adapter (default in dev) returns hand-picked extraction fields.
    • Production adapter runs Google Vision documentTextDetection then Claude tool-use against the per-doc-type JSON schema (invoice / receipt / bill / bank statement).
    • Persists DocumentExtraction + ExtractedField rows with confidence scores.
  3. bookkeeping.classify-and-draft waits for the extraction to complete, then calls ClassificationService (priority order per SOP §4.2):
    1. Vendor master exact / alias match → use defaultAccountCode. Confidence HIGH. Status DRAFT (or auto-APPROVED if reviewer bypass is configured per client — default off).
    2. Line-item / description text match against ChartOfAccountsEntry.keywords → infer account. Confidence MEDIUM. Flag KEYWORD_MATCH_ONLY.
    3. Document type → category fallback (e.g. restaurant → Entertainment). Confidence LOW.
    4. None of the above → UNCLASSIFIED, flag NO_VENDOR_MATCH + REVIEWER_ATTENTION_REQUIRED. The entry is created in FLAGGED so the reviewer can't miss it.
  4. Batch resolution. The handler resolves the current JournalBatch for (clientId, period = current month, accountingPlatform = client.bookkeepingConfig.platform). If none exists, it creates one in DRAFT.
  5. Draft. A JournalEntry is persisted with debit/credit accounts, FX rate (from bookkeepingConfig.fxRateSource or "MANUAL" / null), classification confidence, and any flags.
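
The classification priority order in step 3 can be sketched as a simple fall-through. This is an illustration under stated assumptions, not the ClassificationService implementation: the function, the `Classification` dataclass, and the dictionary shapes are hypothetical; only the route name VENDOR_MASTER appears in the source (the other route names are invented for the sketch); the DRAFT status for the MEDIUM and LOW cases is assumed; and the per-client reviewer bypass is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Classification:
    account_code: Optional[str]
    confidence: str  # HIGH / MEDIUM / LOW / UNCLASSIFIED
    status: str      # DRAFT or FLAGGED
    route: str       # only VENDOR_MASTER is named in the runbook
    flags: List[str] = field(default_factory=list)


def classify(
    vendor: str,
    description: str,
    category: str,
    vendor_master: Dict[str, str],       # lowercased name or alias -> defaultAccountCode
    coa_keywords: Dict[str, List[str]],  # account code -> keywords
    category_fallback: Dict[str, str],   # document category -> account
) -> Classification:
    # 1. Vendor master exact / alias match.
    key = vendor.strip().lower()
    if key in vendor_master:
        return Classification(vendor_master[key], "HIGH", "DRAFT", "VENDOR_MASTER")

    # 2. Keyword match against the description text.
    text = description.lower()
    for account_code, keywords in coa_keywords.items():
        if any(keyword.lower() in text for keyword in keywords):
            return Classification(
                account_code, "MEDIUM", "DRAFT", "KEYWORD", ["KEYWORD_MATCH_ONLY"]
            )

    # 3. Document type -> category fallback.
    if category in category_fallback:
        return Classification(category_fallback[category], "LOW", "DRAFT", "CATEGORY_FALLBACK")

    # 4. Nothing matched: flag for the reviewer.
    return Classification(
        None,
        "UNCLASSIFIED",
        "FLAGGED",
        "NONE",
        ["NO_VENDOR_MATCH", "REVIEWER_ATTENTION_REQUIRED"],
    )
```

Returning at the first hit makes the priority order explicit: a vendor master match always wins over a keyword hit, and the category fallback is consulted only when both fail.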

1.4 Inspect the result

Open /dashboard/bookkeeping/batches for the client. The current month's batch should now have one more entry. Click into the batch to see the row.

2. Verification

Database

sql
SELECT id, source_channel, document_type, batch_id, ingested_at
  FROM bookkeeping_documents WHERE client_id = '<clientId>'
  ORDER BY ingested_at DESC LIMIT 1;

SELECT status, classification_confidence, debit_account, credit_account, flags
  FROM journal_entries WHERE document_id = '<bookkeepingDocumentId>';

S3 / MinIO

<clientCode>/bookkeeping/<period>/<originalName> carries the raw upload (immutable).

Audit log

file.finalized                 fileKind=BOOKKEEPING_DOCUMENT
bookkeeping.document.ingested
ocr.completed
bookkeeping.journal_entry.drafted   batchId=<id> entryId=<id> confidence=HIGH
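
When checking the audit log by script, the four events above can be verified as an ordered subsequence of whatever the log returns. The sketch below assumes the events are emitted in the order listed and that each log line starts with the event name followed by whitespace-separated attributes. Both function names are hypothetical.

```python
from typing import Iterable, List, Sequence

EXPECTED_EVENTS = (
    "file.finalized",
    "bookkeeping.document.ingested",
    "ocr.completed",
    "bookkeeping.journal_entry.drafted",
)


def event_names(lines: Iterable[str]) -> List[str]:
    """Take the first whitespace-separated token of each non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]


def events_in_order(observed: Iterable[str], expected: Sequence[str] = EXPECTED_EVENTS) -> bool:
    """True if `expected` occurs within `observed` in order; other events may be interleaved."""
    stream = iter(observed)
    # Each any() call resumes the shared iterator where the previous match stopped.
    return all(any(event == name for event in stream) for name in expected)
```

Unrelated events between the expected ones are tolerated, so the check still passes on a busy log.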

Worker logs

bookkeeping.ingest-document    clientId=<id> fileId=<id>
bookkeeping.classify-and-draft batchId=<id> entryId=<id> route=VENDOR_MASTER classification=HIGH

3. Negative & edge cases

  • No vendor master match, no keyword hit, and no document-type fallback → entry created UNCLASSIFIED, flags NO_VENDOR_MATCH + REVIEWER_ATTENTION_REQUIRED. Status FLAGGED. (Mirrors the seeded IRAS GST entry.)
  • OCR fails entirely (e.g. encrypted PDF) → DocumentExtraction.status = FAILED. The classify-and-draft handler refuses to run; a placeholder BookkeepingDocument row stays without a paired JournalEntry. A bookkeeping.document.ocr_failed audit event is written.
  • Duplicate document hash → handler dedupes by SHA-256 hash. The second upload returns the existing BookkeepingDocument id and writes a bookkeeping.document.duplicate_rejected audit event.
  • Multi-currency — if currency != bookkeepingConfig.baseCurrency, the entry is persisted with currency, exchangeRate, exchangeRateSource, and totalAmount calculated in base currency. Flag NON_BASE_CURRENCY is added. (Mirrors the seeded Figma entry: USD 75 @ 1.35 → SGD 101.25.)
  • OCR confidence below threshold for ≥ 3 fields (per SOP §7) → entry flagged LOW_CONFIDENCE_OCR. Pipeline does not auto-classify; falls back to UNCLASSIFIED.
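
Two of the cases above reduce to small calculations, sketched here for use in test assertions. Both helper names are hypothetical. The sketch assumes totalAmount is rounded half-up to two decimal places, which the source does not state, and it takes the OCR threshold as a parameter because the actual value lives in SOP §7. Only the "at least 3 fields" rule comes from this document.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


def to_base_currency(amount: str, exchange_rate: str) -> Decimal:
    """Convert a foreign-currency amount to base currency.

    Assumption: round half-up to two decimal places.
    """
    product = Decimal(amount) * Decimal(exchange_rate)
    return product.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def is_low_confidence_ocr(
    field_confidences: Dict[str, float], threshold: float, min_fields: int = 3
) -> bool:
    """True when at least `min_fields` fields score below `threshold`."""
    low = sum(1 for score in field_confidences.values() if score < threshold)
    return low >= min_fields


if __name__ == "__main__":
    # The seeded Figma entry: USD 75 at 1.35.
    print(to_base_currency("75", "1.35"))  # 101.25
```

Passing the amount and rate as strings into `Decimal` avoids binary floating-point error in the converted total.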

Next

Proceed to B-05 · Review a journal batch. To exercise the WhatsApp or folder paths, see B-04.

Internal use only — BreezyCorp