B-04 · Ingest a document — WhatsApp / folder channels
SOP: SOP_AI_Bookkeeping_Automation.md §3.2 (WhatsApp) + §3.3 (Folder)
Actors: Client (sender); the system handles the rest.
Pre-state: Channel configured per B-02 and enabled = true. Bookkeeping pipeline configured per B-01 / B-03.
Post-state: Same as B-03: BookkeepingDocument + draft JournalEntry. The difference is sourceChannel = WHATSAPP | FOLDER (instead of CLIENT_PORTAL).
0. Prerequisites
- Client has at least one enabled non-portal channel.
- For WhatsApp: the channel's phoneNumberId matches the inbound webhook payload.
- For folder: the worker's folder-sync job is running and credentials reach the cloud folder.
1. WhatsApp path
1.1 Webhook receiver
Inbound media webhooks land at POST /hooks/whatsapp. The handler:
- Validates the inbound signature (Meta X-Hub-Signature).
- Resolves the ClientIngestionChannel by phoneNumberId.
- If the channel is disabled or unknown → 200 (so Meta does not retry) but writes a whatsapp.unmatched_channel audit event.
- If the message is text-only, surfaces it to the reviewer's inbox (no doc ingestion).
- If the message has media:
  - Downloads the media from the Meta CDN.
  - Stores in S3 at <clientCode>/bookkeeping/inbox/whatsapp/<messageId>.<ext>.
  - Creates a File row with fileKind = BOOKKEEPING_DOCUMENT, sha256 computed.
  - Enqueues bookkeeping.ingest-document with sourceChannel = 'WHATSAPP'.
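The handler steps above can be sketched as two pure functions: one that checks an HMAC signature header of the form `<algo>=<hex>`, and one that turns a resolved channel plus message into a routing decision. This is a minimal sketch, not the code in apps/api: the `Channel` shape, the `Decision` type, and the function names `verifySignature` and `decide` are hypothetical, and the app secret used as the HMAC key is an assumption about how the signature is configured.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shape: only the fields this sketch needs.
interface Channel {
  phoneNumberId: string;
  clientCode: string;
  enabled: boolean;
}

// Hypothetical decision type mirroring the bullets above.
type Decision =
  | { action: "audit"; status: 200; event: "whatsapp.unmatched_channel" }
  | { action: "inbox" }
  | { action: "ingest"; s3Key: string; sourceChannel: "WHATSAPP" };

// Recomputes the HMAC of the raw request body and compares it, in constant
// time, with the hex digest carried in a "<algo>=<hex>" header value.
function verifySignature(rawBody: string, header: string | undefined, appSecret: string): boolean {
  if (!header) return false;
  const eq = header.indexOf("=");
  if (eq < 0) return false;
  const algo = header.slice(0, eq);
  if (algo !== "sha1" && algo !== "sha256") return false;
  const expected = createHmac(algo, appSecret).update(rawBody, "utf8").digest();
  const given = Buffer.from(header.slice(eq + 1), "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// `ext` is undefined for text-only messages (assumption: the caller derives
// the extension from the media MIME type).
function decide(channels: Channel[], phoneNumberId: string, msg: { id: string; ext?: string }): Decision {
  const channel = channels.find((c) => c.phoneNumberId === phoneNumberId);
  if (!channel || !channel.enabled) {
    // Respond 200 so Meta does not retry, but leave an audit trail.
    return { action: "audit", status: 200, event: "whatsapp.unmatched_channel" };
  }
  if (msg.ext === undefined) return { action: "inbox" };
  return {
    action: "ingest",
    s3Key: `${channel.clientCode}/bookkeeping/inbox/whatsapp/${msg.id}.${msg.ext}`,
    sourceChannel: "WHATSAPP",
  };
}
```

Keeping the decision separate from the I/O (media download, S3 write, File row, enqueue) makes the routing rules testable without a live Meta webhook.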
1.2 Trigger via curl (local test)
curl -X POST http://localhost:3001/hooks/whatsapp \
  -H 'Content-Type: application/json' \
  -d '{
    "entry": [{
      "changes": [{
        "field": "messages",
        "value": {
          "metadata": { "phone_number_id": "1234567890" },
          "messages": [{
            "id": "wamid.HBgMNjU5MTIzNDU2NwIYEDA1RkE5N0NBNzZBNkJC",
            "from": "6591234567",
            "type": "image",
            "image": {
              "id": "<media-id>",
              "mime_type": "image/jpeg",
              "sha256": "<sha256>"
            }
          }]
        }
      }]
    }]
  }'

The mock-Meta-media adapter returns a placeholder PDF for any media-id. The pipeline then proceeds exactly like B-03.
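To make the payload shape concrete, here is a small sketch of how a handler might pull the routing key (phone_number_id) and the media reference out of a body like the one posted above. It models only the fields visible in that sample; the `WebhookPayload` and `InboundMedia` types and the `extractMedia` name are hypothetical, and the rule "media lives under a key named after the message type" is an assumption generalized from the image example.

```typescript
// Only the fields visible in the sample payload are modelled.
interface WebhookPayload {
  entry?: Array<{
    changes?: Array<{
      field?: string;
      value?: {
        metadata?: { phone_number_id?: string };
        messages?: Array<Record<string, unknown>>;
      };
    }>;
  }>;
}

// Hypothetical flattened view of one inbound media message.
interface InboundMedia {
  phoneNumberId: string;
  messageId: string;
  from: string;
  mediaId: string;
  mimeType: string;
}

function extractMedia(payload: WebhookPayload): InboundMedia[] {
  const out: InboundMedia[] = [];
  for (const entry of payload.entry ?? []) {
    for (const change of entry.changes ?? []) {
      if (change.field !== "messages") continue;
      const phoneNumberId = change.value?.metadata?.phone_number_id;
      if (!phoneNumberId) continue;
      for (const msg of change.value?.messages ?? []) {
        const id = msg["id"];
        const from = msg["from"];
        const type = msg["type"];
        if (typeof id !== "string" || typeof from !== "string" || typeof type !== "string") continue;
        // Assumption: a media message nests its media object under msg[type].
        const media = msg[type] as { id?: unknown; mime_type?: unknown } | undefined;
        if (!media || typeof media !== "object") continue;
        // Messages without a media id and MIME type (e.g. text-only) are skipped.
        if (typeof media.id !== "string" || typeof media.mime_type !== "string") continue;
        out.push({ phoneNumberId, messageId: id, from, mediaId: media.id, mimeType: media.mime_type });
      }
    }
  }
  return out;
}
```

Text-only messages yield no entries here, which matches the handler's behaviour of sending them to the reviewer's inbox instead of ingesting a document.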
1.3 Verify
SELECT source_channel, ingested_at FROM bookkeeping_documents
WHERE client_id = '<zenithId>' AND source_channel = 'WHATSAPP'
ORDER BY ingested_at DESC LIMIT 1;

2. Folder path (Google Drive / SharePoint / Dropbox)
2.1 Trigger
The bookkeeping.folder-sync worker job runs on a schedule (default: every 5 min). For an ad-hoc trigger:
pnpm --filter @breezycorp/worker exec ts-node ./src/dev/run-handler.ts bookkeeping.folder-sync

The handler iterates each enabled cloud-folder channel and:
- Resolves the credential from credentialsRef (mock or real secret manager).
- Fetches new files since lastCursor via the platform API:
  - Google Drive: files.list(q="'<folderId>' in parents and modifiedTime > '<lastCursor>'").
  - SharePoint: /drives/<id>/root/delta?token=<lastCursor>.
  - Dropbox: files/list_folder/continue with the cursor.
- For each new file, downloads it, computes SHA-256, stores at <clientCode>/bookkeeping/inbox/<channel>/<filename>, creates the File row, and enqueues bookkeeping.ingest-document with sourceChannel = 'FOLDER'.
- Updates lastCursor and lastPolledAt.
- On error, sets lastError and re-throws so pg-boss retries with backoff.
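The per-channel loop above can be sketched as a single function that lists files since the cursor, builds one ingest job per file, and then advances the cursor. This is an illustrative, synchronous sketch rather than the worker's code: the `ChannelState`, `Page`, and `IngestJob` types, the `pollChannel` name, and the injected `listSince` callback (standing in for the Drive, SharePoint, or Dropbox call) are all hypothetical, and clearing lastError on success is an assumption.

```typescript
import { createHash } from "node:crypto";

interface RemoteFile { name: string; bytes: Uint8Array }
interface Page { files: RemoteFile[]; nextCursor: string }

// Hypothetical in-memory view of a client_ingestion_channels row.
interface ChannelState {
  clientCode: string;
  channel: string; // path segment used for <channel>
  lastCursor: string | null;
  lastPolledAt: Date | null;
  lastError: string | null;
}

interface IngestJob { s3Key: string; sha256: string; sourceChannel: "FOLDER" }

function pollChannel(
  state: ChannelState,
  listSince: (cursor: string | null) => Page, // stand-in for the platform API
  now: Date,
): IngestJob[] {
  try {
    const page = listSince(state.lastCursor);
    const jobs = page.files.map((f): IngestJob => ({
      s3Key: `${state.clientCode}/bookkeeping/inbox/${state.channel}/${f.name}`,
      sha256: createHash("sha256").update(f.bytes).digest("hex"),
      sourceChannel: "FOLDER",
    }));
    // Advance the cursor only after every file has been processed.
    state.lastCursor = page.nextCursor;
    state.lastPolledAt = now;
    state.lastError = null; // assumption: a clean poll clears the previous error
    return jobs;
  } catch (err) {
    // Record the failure, then re-throw so the job queue can retry with backoff.
    state.lastError = err instanceof Error ? err.message : String(err);
    throw err;
  }
}
```

Because the cursor moves only after the whole page succeeds, a failed poll is retried from the same position; files seen twice as a result are caught by the SHA-256 dedupe described in the edge cases.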
2.2 Verify
SELECT label, last_polled_at, last_cursor, last_error
FROM client_ingestion_channels WHERE id = '<channelId>';

Worker logs:
bookkeeping.folder-sync.poll channelId=<id> filesFound=3
bookkeeping.ingest-document sourceChannel=FOLDER fileId=<id>

3. Negative & edge cases
- Inbound from unauthorized phone (WhatsApp): the current implementation routes by phoneNumberId, not by sender phone. To enforce a sender allow-list, implement the check in apps/api/src/routes/hooks/whatsapp.ts. Today the system writes a whatsapp.unauthorized_sender audit event when the channel's optional configJson.allowedSenders excludes the from number.
- Folder credential expired → the handler logs the OAuth error, stamps lastError, and retries on the next tick (no infinite immediate retry).
- Folder file size > 25 MB: the worker honours the current limit; oversized files are skipped with a bookkeeping.folder.file_too_large audit event.
- Same file ingested through multiple channels: dedupe is by SHA-256. The second BookkeepingDocument row is not created; the audit event records the duplicate channel.
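A rough sketch of how these guards could be expressed, under stated assumptions: `senderAllowed`, `classify`, and the `Outcome` type are hypothetical names, 25 MB is taken as 25 × 1024 × 1024 bytes, and the in-memory `Map` stands in for whatever store the pipeline actually uses to look up existing SHA-256 hashes.

```typescript
type SourceChannel = "CLIENT_PORTAL" | "WHATSAPP" | "FOLDER";

// Assumption: "25 MB" means 25 * 2^20 bytes.
const MAX_FOLDER_FILE_BYTES = 25 * 1024 * 1024;

// allowedSenders is optional; when absent, every sender passes.
function senderAllowed(allowedSenders: string[] | undefined, from: string): boolean {
  return allowedSenders === undefined || allowedSenders.includes(from);
}

type Outcome =
  | { kind: "ingest" }
  | { kind: "skip"; auditEvent: "bookkeeping.folder.file_too_large" }
  | { kind: "duplicate"; firstChannel: SourceChannel; duplicateChannel: SourceChannel };

// `seen` maps a file's SHA-256 to the channel that first delivered it.
function classify(
  seen: Map<string, SourceChannel>,
  file: { sha256: string; sizeBytes: number; channel: SourceChannel },
): Outcome {
  // The size limit is described for the folder path only.
  if (file.channel === "FOLDER" && file.sizeBytes > MAX_FOLDER_FILE_BYTES) {
    return { kind: "skip", auditEvent: "bookkeeping.folder.file_too_large" };
  }
  const first = seen.get(file.sha256);
  if (first !== undefined) {
    // No second BookkeepingDocument; the caller records the duplicate channel.
    return { kind: "duplicate", firstChannel: first, duplicateChannel: file.channel };
  }
  seen.set(file.sha256, file.channel);
  return { kind: "ingest" };
}
```

Checking size before the hash lookup means an oversized file never enters the dedupe index, so a later, smaller file with different content is unaffected.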
Next
After ingestion, every channel converges on the same review surface — proceed to B-05 · Review a journal batch.