B-04 · Ingest a document — WhatsApp / folder channels

SOP: SOP_AI_Bookkeeping_Automation.md §3.2 (WhatsApp) + §3.3 (Folder)

Actors: Client (sender); the system handles the rest.

Pre-state: Channel configured per B-02 and enabled = true. Bookkeeping pipeline configured per B-01 / B-03.

Post-state: Same as B-03: BookkeepingDocument + draft JournalEntry. The difference is sourceChannel = WHATSAPP | FOLDER (instead of CLIENT_PORTAL).

0. Prerequisites

  • Client has at least one enabled non-portal channel.
  • For WhatsApp: the channel's phoneNumberId matches the inbound webhook payload.
  • For folder: the worker's folder-sync job is running and credentials reach the cloud folder.

1. WhatsApp path

1.1 Webhook receiver

Inbound media webhooks land at POST /hooks/whatsapp. The handler:

  1. Validates the inbound signature (Meta X-Hub-Signature).
  2. Resolves the ClientIngestionChannel by phoneNumberId.
  3. If the channel is disabled or unknown, responds with 200 (so Meta does not retry) and writes a whatsapp.unmatched_channel audit event.
  4. If the message is text-only, surfaces it to the reviewer's inbox (no document ingestion).
  5. If the message has media:
    • Downloads the media from the Meta CDN.
    • Stores it in S3 at <clientCode>/bookkeeping/inbox/whatsapp/<messageId>.<ext>.
    • Creates a File row with fileKind = BOOKKEEPING_DOCUMENT and the computed sha256.
    • Enqueues bookkeeping.ingest-document with sourceChannel = 'WHATSAPP'.
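
Step 1 of the handler can be sketched as follows. This is a minimal illustration, not the code in the repository: it assumes the header value has the form `<algo>=<hex digest>` and that the digest is an HMAC of the raw request body keyed by the Meta app secret. The function name `isValidSignature` and its parameters are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/**
 * Checks a signature header of the assumed form "<algo>=<hex digest>".
 * rawBody must be the exact bytes received, before any JSON parsing.
 */
export function isValidSignature(
  rawBody: Buffer,
  header: string | undefined,
  appSecret: string,
): boolean {
  if (!header) return false;

  const sep = header.indexOf("=");
  if (sep < 0) return false;

  // Only accept the two HMAC algorithms this sketch knows about.
  const algo = header.slice(0, sep);
  if (algo !== "sha1" && algo !== "sha256") return false;

  const received = Buffer.from(header.slice(sep + 1), "hex");
  const expected = createHmac(algo, appSecret).update(rawBody).digest();

  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged digest were correct. A handler using this sketch would need access to the unparsed body, since re-serialising parsed JSON can change the bytes and break the digest.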

1.2 Trigger via curl (local test)

bash
curl -X POST http://localhost:3001/hooks/whatsapp \
  -H 'Content-Type: application/json' \
  -d '{
        "entry": [{
          "changes": [{
            "field": "messages",
            "value": {
              "metadata": { "phone_number_id": "1234567890" },
              "messages": [{
                "id": "wamid.HBgMNjU5MTIzNDU2NwIYEDA1RkE5N0NBNzZBNkJC",
                "from": "6591234567",
                "type": "image",
                "image": {
                  "id": "<media-id>",
                  "mime_type": "image/jpeg",
                  "sha256": "<sha256>"
                }
              }]
            }
          }]
        }]
      }'

The mock-Meta-media adapter returns a placeholder PDF for any media-id. The pipeline then proceeds exactly like B-03.
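
To make the mapping from the test payload to the stored object concrete, the sketch below pulls media messages out of a payload shaped like the curl body and builds the S3 key pattern from section 1.1. The type names, the `extractMedia` and `inboxKey` helpers, the MIME-to-extension table, and the handling of a `document` message type are all assumptions for illustration; the real handler may differ.

```typescript
// Minimal shapes covering only the fields used in the curl payload.
type MediaObject = { id: string; mime_type: string; sha256?: string };
type WaMessage = {
  id: string;
  from: string;
  type: string;
  image?: MediaObject;
  document?: MediaObject; // assumed: not shown in the sample payload
};
type WaPayload = {
  entry?: {
    changes?: {
      field?: string;
      value?: { metadata?: { phone_number_id?: string }; messages?: WaMessage[] };
    }[];
  }[];
};

export interface InboundMedia {
  phoneNumberId: string;
  messageId: string;
  from: string;
  mediaId: string;
  mimeType: string;
}

// Hypothetical mapping; unknown types fall back to "bin".
const EXT_BY_MIME: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "application/pdf": "pdf",
};

/** Returns one entry per media message; text-only messages are skipped. */
export function extractMedia(payload: WaPayload): InboundMedia[] {
  const found: InboundMedia[] = [];
  for (const entry of payload.entry ?? []) {
    for (const change of entry.changes ?? []) {
      if (change.field !== "messages") continue;
      const phoneNumberId = change.value?.metadata?.phone_number_id;
      if (!phoneNumberId) continue;
      for (const msg of change.value?.messages ?? []) {
        const media =
          msg.type === "image" ? msg.image : msg.type === "document" ? msg.document : undefined;
        if (!media) continue;
        found.push({
          phoneNumberId,
          messageId: msg.id,
          from: msg.from,
          mediaId: media.id,
          mimeType: media.mime_type,
        });
      }
    }
  }
  return found;
}

/** Builds <clientCode>/bookkeeping/inbox/whatsapp/<messageId>.<ext>. */
export function inboxKey(clientCode: string, media: InboundMedia): string {
  const ext = EXT_BY_MIME[media.mimeType] ?? "bin";
  return `${clientCode}/bookkeeping/inbox/whatsapp/${media.messageId}.${ext}`;
}
```

Keying the object by messageId means a webhook redelivery of the same message would map to the same key under this sketch.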

1.3 Verify

sql
SELECT source_channel, ingested_at FROM bookkeeping_documents
  WHERE client_id = '<zenithId>' AND source_channel = 'WHATSAPP'
  ORDER BY ingested_at DESC LIMIT 1;

2. Folder path (Google Drive / SharePoint / Dropbox)

2.1 Trigger

The bookkeeping.folder-sync worker job runs on a schedule (every 5 minutes by default). To trigger it ad hoc:

bash
pnpm --filter @breezycorp/worker exec ts-node ./src/dev/run-handler.ts bookkeeping.folder-sync

The handler iterates each enabled cloud-folder channel and:

  1. Resolves the credential from credentialsRef (mock or real secret manager).
  2. Fetches new files since lastCursor via the platform API:
    • Google Drive: files.list(q="'<folderId>' in parents and modifiedTime > '<lastCursor>'").
    • SharePoint: /drives/<id>/root/delta?token=<lastCursor>.
    • Dropbox: files/list_folder/continue with the cursor.
  3. For each new file, downloads it, computes SHA-256, stores at <clientCode>/bookkeeping/inbox/<channel>/<filename>, creates the File row, and enqueues bookkeeping.ingest-document with sourceChannel = 'FOLDER'.
  4. Updates lastCursor and lastPolledAt.
  5. On error, sets lastError and re-throws so pg-boss retries with backoff.
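
The five steps above can be sketched as a single poll cycle for one channel. This is a simplified illustration under stated assumptions: the platform API, storage, and queue are injected as a hypothetical `SyncDeps` interface, the calls are synchronous for brevity (a real handler would await network I/O), File row creation is omitted, `slug` stands in for the `<channel>` path segment, and clearing lastError on success is a guess rather than documented behaviour.

```typescript
import { createHash } from "node:crypto";

interface RemoteFile { name: string; content: Buffer }
interface FolderPage { files: RemoteFile[]; nextCursor: string }

interface FolderChannel {
  clientCode: string;
  slug: string; // hypothetical: the <channel> segment of the storage path
  lastCursor: string | null;
  lastPolledAt: Date | null;
  lastError: string | null;
}

interface IngestJob { sourceChannel: "FOLDER"; storageKey: string; sha256: string }

// Hypothetical seams for the platform API, object storage, and job queue.
interface SyncDeps {
  listSince(cursor: string | null): FolderPage;
  store(key: string, content: Buffer): void;
  enqueue(job: IngestJob): void;
}

/** Runs one poll cycle and returns the number of files handed off. */
export function pollChannel(channel: FolderChannel, deps: SyncDeps, now: Date = new Date()): number {
  try {
    const page = deps.listSince(channel.lastCursor);
    for (const file of page.files) {
      const sha256 = createHash("sha256").update(file.content).digest("hex");
      const storageKey = `${channel.clientCode}/bookkeeping/inbox/${channel.slug}/${file.name}`;
      deps.store(storageKey, file.content);
      deps.enqueue({ sourceChannel: "FOLDER", storageKey, sha256 });
    }
    // Advance the cursor only after every file in the page was handed off,
    // so a failure mid-page is retried from the old cursor.
    channel.lastCursor = page.nextCursor;
    channel.lastPolledAt = now;
    channel.lastError = null; // assumption: success clears the previous error
    return page.files.length;
  } catch (err) {
    channel.lastError = err instanceof Error ? err.message : String(err);
    throw err; // re-throw so the queue can retry with backoff
  }
}
```

Because the cursor moves only on full success, a retried cycle may see files it already stored; in this sketch that is tolerable because downstream deduplication is by SHA-256.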

2.2 Verify

sql
SELECT label, last_polled_at, last_cursor, last_error
  FROM client_ingestion_channels WHERE id = '<channelId>';

Worker logs:

bookkeeping.folder-sync.poll  channelId=<id> filesFound=3
bookkeeping.ingest-document   sourceChannel=FOLDER fileId=<id>

3. Negative & edge cases

  • Inbound from unauthorized phone (WhatsApp): the current implementation routes by phoneNumberId, not by sender phone. To enforce a sender allow-list, implement the check in apps/api/src/routes/hooks/whatsapp.ts. Today the system writes a whatsapp.unauthorized_sender audit event when the channel's configJson.allowedSenders (optional) excludes the from number.
  • Folder credential expired → handler logs the OAuth error, stamps lastError, retries on the next tick (no infinite immediate retry).
  • Folder file size > 25 MB: the worker enforces the current 25 MB limit; oversized files are skipped with a bookkeeping.folder.file_too_large audit event.
  • Same file ingested through multiple channels: deduplication is by SHA-256. No second BookkeepingDocument row is created; the audit event records the duplicate channel.
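
Three of the cases above reduce to small guard checks, sketched below. The helper names and the `DedupeIndex` class are hypothetical, and the sketch assumes that 25 MB means 25 × 1024 × 1024 bytes and that an absent allowedSenders list means no sender restriction. Only the two audit event names come from this page.

```typescript
// Assumption: "25 MB" is binary megabytes; the source does not say which.
const MAX_FOLDER_FILE_BYTES = 25 * 1024 * 1024;

/** Audit event to write for a sender, or null when none is needed. */
export function senderAuditEvent(
  allowedSenders: string[] | undefined,
  from: string,
): string | null {
  if (allowedSenders === undefined) return null; // allow-list is optional
  return allowedSenders.includes(from) ? null : "whatsapp.unauthorized_sender";
}

/** Audit event for an oversized folder file, or null when the file fits. */
export function folderFileAuditEvent(sizeBytes: number): string | null {
  return sizeBytes > MAX_FOLDER_FILE_BYTES ? "bookkeeping.folder.file_too_large" : null;
}

/** In-memory stand-in for a uniqueness check on the document hash. */
export class DedupeIndex {
  private readonly channelsByHash = new Map<string, string[]>();

  /**
   * Records that a file with this hash arrived on a channel.
   * Returns true when a new document should be created, false for a duplicate.
   */
  register(sha256: string, sourceChannel: string): boolean {
    const channels = this.channelsByHash.get(sha256);
    if (channels) {
      channels.push(sourceChannel); // remember the duplicate channel for auditing
      return false;
    }
    this.channelsByHash.set(sha256, [sourceChannel]);
    return true;
  }

  /** Channels that delivered this hash, in arrival order. */
  channelsFor(sha256: string): readonly string[] {
    return this.channelsByHash.get(sha256) ?? [];
  }
}
```

In a real deployment the duplicate check would more likely be a unique constraint or lookup in the database than an in-memory map; the map only shows the decision being made.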

Next

After ingestion, every channel converges on the same review surface — proceed to B-05 · Review a journal batch.

Internal use only — BreezyCorp