B-03 · Ingest a document — manual upload path
SOP: SOP_AI_Bookkeeping_Automation.md §3 + §4.1 (OCR & data extraction)
Actors: Bookkeeper (BOOKKEEPER role) or PLATFORM_ADMIN.
Pre-state: Bookkeeping-enabled client with at least one CoA entry and one vendor master.
Post-state: A BookkeepingDocument row, an OCR DocumentExtraction row, and a JournalEntry in the current month's JournalBatch (auto-created if missing).
This exercises the simplest end-to-end ingestion flow for testing — manual file upload from the ops dashboard.
0. Prerequisites
- Bookkeeping-enabled client (use seeded ZENITH for the warmest start, or the client created in B-01).
- Logged in as BOOKKEEPER / SENIOR_ACCOUNTANT / PLATFORM_ADMIN.
1. Steps
1.1 Open the documents page
Web: /dashboard/bookkeeping/documents (or the Documents tab on a specific client). Click Upload document.
1.2 Upload via presign / PUT / finalize
The dialog reuses the standard 3-call upload ceremony:
POST /ops/files/presign
{ "originalName": "uber-receipt.pdf", "fileKind": "BOOKKEEPING_DOCUMENT", "mimeType": "application/pdf", "sizeBytes": 12345 }
PUT <presigned MinIO URL> ...
POST /ops/files/<fileId>/finalize
{ "sha256": "<hex>", "sizeBytes": 12345 }
After finalize, the route enqueues bookkeeping.ingest-document with { clientId, fileId, sourceChannel: 'CLIENT_PORTAL' or 'MANUAL_UPLOAD' }.
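The three calls above can be sketched as a single client-side helper. This is a minimal illustration, not the dashboard's actual code: the `HttpCall` abstraction, the `uploadBookkeepingDocument` name, and the `fileId` / `uploadUrl` fields on the presign response are assumptions, since the response shape is not documented here. The SHA-256 digest is passed in by the caller (in Node it could be computed with `createHash("sha256")` from `node:crypto`).

```typescript
// Minimal HTTP abstraction so the sketch does not depend on any client library.
// In practice this would wrap fetch() or the dashboard's own API client.
type HttpCall = (method: "POST" | "PUT", url: string, body?: unknown) => Promise<unknown>;

// ASSUMPTION: field names of the presign response are hypothetical.
interface PresignResponse {
  fileId: string;
  uploadUrl: string;
}

interface UploadInput {
  originalName: string;
  mimeType: string;
  bytes: Uint8Array;
  sha256Hex: string; // hex digest of `bytes`, computed by the caller
}

async function uploadBookkeepingDocument(http: HttpCall, input: UploadInput): Promise<string> {
  const sizeBytes = input.bytes.byteLength;

  // Call 1: ask the API for a presigned upload URL.
  const presign = (await http("POST", "/ops/files/presign", {
    originalName: input.originalName,
    fileKind: "BOOKKEEPING_DOCUMENT",
    mimeType: input.mimeType,
    sizeBytes,
  })) as PresignResponse;

  // Call 2: PUT the raw bytes directly to object storage (MinIO).
  await http("PUT", presign.uploadUrl, input.bytes);

  // Call 3: finalize with the hash and size so the server can verify the upload.
  // Per the text above, this is the call that enqueues bookkeeping.ingest-document.
  await http("POST", `/ops/files/${presign.fileId}/finalize`, {
    sha256: input.sha256Hex,
    sizeBytes,
  });

  return presign.fileId;
}
```

Passing the HTTP function in as a parameter keeps the helper easy to exercise against a fake in tests, which suits a runbook whose purpose is verification.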
1.3 Worker pipeline
- bookkeeping.ingest-document creates a BookkeepingDocument row referencing the file. Enqueues ocr-process.
- ocr-process (shared with payroll) runs the OCR adapter:
  - Mock adapter (default in dev) returns hand-picked extraction fields.
  - Production adapter runs Google Vision documentTextDetection then Claude tool-use against the per-doc-type JSON schema (invoice / receipt / bill / bank statement).
  - Persists DocumentExtraction + ExtractedField rows with confidence scores.
- bookkeeping.classify-and-draft waits for the extraction to complete, then calls ClassificationService (priority order per SOP §4.2):
  - Vendor master exact / alias match → use defaultAccountCode. Confidence HIGH. Status DRAFT (or auto-APPROVED if reviewer bypass is configured per client; default off).
  - Line-item / description text match against ChartOfAccountsEntry.keywords → infer account. Confidence MEDIUM. Flag KEYWORD_MATCH_ONLY.
  - Document type → category fallback (e.g. restaurant → Entertainment). Confidence LOW.
  - None of the above → UNCLASSIFIED, flag NO_VENDOR_MATCH + REVIEWER_ATTENTION_REQUIRED. The entry is created in FLAGGED so the reviewer can't miss it.
- Batch resolution. The handler resolves the current JournalBatch for (clientId, period = current month, accountingPlatform = client.bookkeepingConfig.platform). If none exists, it creates one in DRAFT.
- Draft. A JournalEntry is persisted with debit/credit accounts, FX rate (from bookkeepingConfig.fxRateSource or "MANUAL" / null), classification confidence, and any flags.
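The classification priority order described above can be sketched as a pure function. This is an illustrative sketch and not the real ClassificationService: the type names, the `classify` signature, the route names other than VENDOR_MASTER, and the choice of DRAFT status for MEDIUM and LOW results are assumptions. Treating UNCLASSIFIED as a confidence value is also an assumption, since the text does not say which field carries it.

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW" | "UNCLASSIFIED";
// ASSUMPTION: only VENDOR_MASTER appears in the worker logs; the other route names are hypothetical.
type Route = "VENDOR_MASTER" | "KEYWORD" | "DOC_TYPE_FALLBACK" | "NONE";

interface Vendor { name: string; aliases: string[]; defaultAccountCode: string; }
interface CoaEntry { accountCode: string; keywords: string[]; }
interface ExtractedDoc { vendorName: string; description: string; documentType: string; }

interface Classification {
  route: Route;
  confidence: Confidence;
  accountCode: string | null;
  status: "DRAFT" | "FLAGGED";
  flags: string[];
}

const norm = (s: string): string => s.trim().toLowerCase();

function classify(
  doc: ExtractedDoc,
  vendors: Vendor[],
  coa: CoaEntry[],
  typeFallback: Map<string, string>, // document type -> account code
): Classification {
  // 1. Vendor master: exact name or alias match (case-insensitive here, by assumption).
  const wanted = norm(doc.vendorName);
  const vendor = vendors.find((v) => [v.name, ...v.aliases].some((n) => norm(n) === wanted));
  if (vendor) {
    return { route: "VENDOR_MASTER", confidence: "HIGH", accountCode: vendor.defaultAccountCode, status: "DRAFT", flags: [] };
  }

  // 2. Keyword match of the description against chart-of-accounts keywords.
  const text = norm(doc.description);
  const hit = coa.find((e) => e.keywords.some((k) => k.trim() !== "" && text.includes(norm(k))));
  if (hit) {
    return { route: "KEYWORD", confidence: "MEDIUM", accountCode: hit.accountCode, status: "DRAFT", flags: ["KEYWORD_MATCH_ONLY"] };
  }

  // 3. Document type -> category fallback.
  const fallback = typeFallback.get(norm(doc.documentType));
  if (fallback !== undefined) {
    return { route: "DOC_TYPE_FALLBACK", confidence: "LOW", accountCode: fallback, status: "DRAFT", flags: [] };
  }

  // 4. Nothing matched: flag for the reviewer.
  return {
    route: "NONE",
    confidence: "UNCLASSIFIED",
    accountCode: null,
    status: "FLAGGED",
    flags: ["NO_VENDOR_MATCH", "REVIEWER_ATTENTION_REQUIRED"],
  };
}
```

Each rule returns as soon as it matches, so a vendor-master hit always wins over a keyword hit. That ordering is the property to check when comparing a drafted entry's confidence against expectations.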
1.4 Inspect the result
Open /dashboard/bookkeeping/batches for the client. The current month's batch should now have one more entry. Click into the batch to see the row.
2. Verification
Database
SELECT id, source_channel, document_type, batch_id, ingested_at
FROM bookkeeping_documents WHERE client_id = '<clientId>'
ORDER BY ingested_at DESC LIMIT 1;
SELECT status, classification_confidence, debit_account, credit_account, flags
FROM journal_entries WHERE document_id = '<bookkeepingDocumentId>';
S3 / MinIO
<clientCode>/bookkeeping/<period>/<originalName> carries the raw upload (immutable).
Audit log
file.finalized fileKind=BOOKKEEPING_DOCUMENT
bookkeeping.document.ingested
ocr.completed
bookkeeping.journal_entry.drafted batchId=<id> entryId=<id> confidence=HIGH
Worker logs
bookkeeping.ingest-document clientId=<id> fileId=<id>
bookkeeping.classify-and-draft batchId=<id> entryId=<id> route=VENDOR_MASTER classification=HIGH
3. Negative & edge cases
- No vendor master + no keyword hit → entry created UNCLASSIFIED, flags NO_VENDOR_MATCH + REVIEWER_ATTENTION_REQUIRED. Status FLAGGED. (Mirrors the seeded IRAS GST entry.)
- OCR fails entirely (e.g. encrypted PDF) → DocumentExtraction.status = FAILED. The classify-and-draft handler refuses to run; a placeholder BookkeepingDocument row stays without a paired JournalEntry. A bookkeeping.document.ocr_failed audit event is written.
- Duplicate document hash → handler dedupes by SHA-256 hash. The second upload returns the existing BookkeepingDocument id and writes a bookkeeping.document.duplicate_rejected audit event.
- Multi-currency: if currency != bookkeepingConfig.baseCurrency, the entry is persisted with currency, exchangeRate, exchangeRateSource, and totalAmount calculated in base currency. Flag NON_BASE_CURRENCY is added. (Mirrors the seeded Figma entry: USD 75 @ 1.35 → SGD 101.25.)
- OCR confidence below threshold for ≥ 3 fields (per SOP §7) → entry flagged LOW_CONFIDENCE_OCR. Pipeline does not auto-classify; falls back to UNCLASSIFIED.
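Two of the edge cases above reduce to small calculations that are easy to check by hand. The sketch below is illustrative only: the function names are hypothetical, rounding to two decimal places is an assumption (the SOP may specify a different rule), and the 0.8 threshold in the usage lines is a placeholder because the actual value lives in SOP §7.

```typescript
// Convert a document amount to base currency, rounded to 2 decimals.
// ASSUMPTION: rounding to cents with Math.round; the real pipeline may round differently.
function toBaseAmount(amount: number, exchangeRate: number): number {
  return Math.round(amount * exchangeRate * 100) / 100;
}

// Flag added when the document currency differs from the client's base currency.
function currencyFlags(currency: string, baseCurrency: string): string[] {
  return currency !== baseCurrency ? ["NON_BASE_CURRENCY"] : [];
}

// Flag added when at least `minLowFields` extracted fields fall below the threshold.
// The text above states the rule as "below threshold for >= 3 fields".
function ocrFlags(fieldConfidences: number[], threshold: number, minLowFields = 3): string[] {
  const lowCount = fieldConfidences.filter((c) => c < threshold).length;
  return lowCount >= minLowFields ? ["LOW_CONFIDENCE_OCR"] : [];
}

// Usage, mirroring the seeded Figma entry (USD 75 @ 1.35, base currency SGD):
const sgdTotal = toBaseAmount(75, 1.35);          // 101.25
const fxFlags = currencyFlags("USD", "SGD");       // ["NON_BASE_CURRENCY"]
// Three of four fields are below the placeholder threshold of 0.8:
const lowOcr = ocrFlags([0.5, 0.6, 0.7, 0.95], 0.8); // ["LOW_CONFIDENCE_OCR"]
```

When verifying a multi-currency entry in the database, the stored totalAmount can be compared against `toBaseAmount(amount, exchangeRate)` computed from the same row, keeping in mind that the rounding rule here is assumed.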
Next
Proceed to B-05 · Review a journal batch. To exercise the WhatsApp or folder paths, see B-04.