STANDARD OPERATING PROCEDURE
AI-Assisted Bookkeeping Automation System
Monthly and Annual Accounting Services
| Document Title | AI-Assisted Bookkeeping Automation SOP |
|---|---|
| Version | 1.0 (Draft for Automation Build) |
| Applicable Services | Monthly Bookkeeping / Annual Bookkeeping |
| Accounting Platforms | QuickBooks / Xero / Zoho Books / Tally |
| Intended Audience | Automation Engineers / System Architects / QA Team / Senior Accountants |
| Classification | Confidential |
1. Purpose and Scope
This SOP defines the end-to-end automated workflow for processing client accounting documents. It is written to serve as a functional specification for the automation build team and covers two phases:
Phase 1 -- Document Ingestion, OCR Extraction, and Journal Entry File Generation
Phase 2 -- Bank Statement Reconciliation and Unreconciled Items Client Portal
The SOP applies to two service engagement types:
Monthly Bookkeeping -- ongoing, document-by-document ingestion throughout each month
Annual Bookkeeping -- batch ingestion of a full year of documents, invoices, and bank statements
All processes must be designed with human-in-the-loop review gates prior to any data being posted to or uploaded into a live accounting system.
2. Client Master Configuration
Each client onboarded to the system must have a master configuration record established before any automated processing can commence. The system must read this configuration at the start of every processing cycle.
2.1 Required Configuration Fields
| Field | Data Type | Description |
|---|---|---|
| client_id | String (UUID) | Unique system identifier for the client |
| client_name | String | Legal entity name of the client |
| accounting_platform | Enum | QuickBooks / Xero / Zoho / Tally |
| service_type | Enum | monthly / annual |
| input_channel | Enum (multi) | whatsapp / email / folder (one or more) |
| input_email | String | Designated inbound email address (if applicable) |
| whatsapp_number | String | Registered WhatsApp number (if applicable) |
| folder_path | String | Cloud folder path or URL (if applicable) |
| base_currency | ISO 4217 | Functional currency of the client (e.g., SGD, USD, INR) |
| chart_of_accounts | JSON Reference | Client-specific COA mapping for classification |
| tax_codes | JSON Reference | Applicable tax codes and rates (GST, VAT, etc.) |
| reviewer_email | String | Email of assigned human reviewer/accountant |
| client_portal_email | String | Email for magic link dispatch to client |
| bank_account_details | JSON Array | List of bank accounts with account number and bank name |
**Note:** The chart_of_accounts and tax_codes fields must be populated during client onboarding and validated by the assigned accountant before the system goes live for that client.
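The configuration record and its start-of-cycle check can be sketched as follows. This is a minimal illustration, not a prescribed schema: the class name ClientConfig, the validation_errors method, and the rule that each active channel must have its matching address field populated are assumptions drawn from the "(if applicable)" wording in the table. The JSON reference fields (chart_of_accounts, tax_codes, bank_account_details) are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

PLATFORMS = {"QuickBooks", "Xero", "Zoho", "Tally"}
SERVICE_TYPES = {"monthly", "annual"}
# Assumption: each active channel requires its own address field.
CHANNEL_FIELD = {
    "email": "input_email",
    "whatsapp": "whatsapp_number",
    "folder": "folder_path",
}


@dataclass
class ClientConfig:
    client_id: str
    client_name: str
    accounting_platform: str
    service_type: str
    input_channel: List[str]
    base_currency: str
    reviewer_email: str
    client_portal_email: str
    input_email: Optional[str] = None
    whatsapp_number: Optional[str] = None
    folder_path: Optional[str] = None

    def validation_errors(self) -> List[str]:
        """Return a list of problems; an empty list means the record is usable."""
        errors = []
        if self.accounting_platform not in PLATFORMS:
            errors.append(f"unknown accounting_platform: {self.accounting_platform}")
        if self.service_type not in SERVICE_TYPES:
            errors.append(f"unknown service_type: {self.service_type}")
        if not self.input_channel:
            errors.append("at least one input_channel is required")
        for channel in self.input_channel:
            field_name = CHANNEL_FIELD.get(channel)
            if field_name is None:
                errors.append(f"unknown input_channel: {channel}")
            elif not getattr(self, field_name):
                errors.append(f"{field_name} is required when '{channel}' is active")
        return errors
```

A processing cycle would refuse to start for any client whose record returns a non-empty error list.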
3. Document Ingestion -- Input Channels
The system must monitor three possible input channels per client. Each channel is defined in the client master configuration. Multiple channels may be active simultaneously for a single client.
3.1 Channel A -- Designated Email Inbox
The system monitors the designated inbound email address on a continuous polling basis (recommended interval: every 5 minutes).
Upon receiving an email, the system checks: (a) the sender's email address against the client's authorised sender list, and (b) that at least one attachment is present.
Accepted attachment formats: PDF, JPG, JPEG, PNG, HEIC, TIFF.
All attachments are extracted, assigned a unique document_id, and placed in the client's processing queue with metadata: source = email, sender, timestamp, subject line, and original filename.
Emails with no attachments, or from unrecognised senders, are flagged in a separate exception queue and a notification is sent to the reviewer.
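The email checks above can be sketched as a single triage function. The function name triage_email, the result keys, and the reason strings are hypothetical; the accepted extensions are the ones listed for Channel A, and routing an email whose attachments are all unsupported to the exception queue is an assumption based on the exception table in Section 7.

```python
from pathlib import PurePath

# Formats accepted on Channel A, as listed above.
ACCEPTED_EMAIL_FORMATS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".tiff"}


def triage_email(sender, attachment_names, authorised_senders):
    """Decide whether an inbound email goes to processing or to the exception queue."""
    result = {"route": "exception", "reason": None, "accepted": [], "rejected": []}

    known = {address.strip().lower() for address in authorised_senders}
    if sender.strip().lower() not in known:
        result["reason"] = "unrecognised_sender"
        return result
    if not attachment_names:
        result["reason"] = "no_attachment"
        return result

    for name in attachment_names:
        suffix = PurePath(name).suffix.lower()
        bucket = "accepted" if suffix in ACCEPTED_EMAIL_FORMATS else "rejected"
        result[bucket].append(name)

    if result["accepted"]:
        result["route"] = "processing"
    else:
        result["reason"] = "unsupported_format"
    return result
```

Accepted files would then be assigned a document_id and queued; rejected files and exception-routed emails trigger the reviewer notification.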
3.2 Channel B -- WhatsApp
The system monitors the designated WhatsApp Business number using the WhatsApp Business API.
Upon receiving a message from a registered client number, the system checks whether the message contains a media attachment (image, PDF, or document).
Accepted media formats: PDF, JPG, JPEG, PNG, HEIC.
The media file is extracted, assigned a document_id, and placed in the processing queue with metadata: source = whatsapp, sender_number, timestamp, and message_id.
Text-only messages are logged but do not trigger document processing. If a text message appears to contain a query or instruction, it is forwarded to the reviewer as a notification.
Messages from unregistered numbers are rejected, and an auto-reply is sent indicating that the number is not authorised.
3.3 Channel C -- Predefined Cloud Folder
The system monitors the designated folder path (e.g., Google Drive, SharePoint, Dropbox) using the respective API or webhook.
On detection of a new file, the system validates the file format against accepted types: PDF, JPG, JPEG, PNG, HEIC, TIFF, XLSX, CSV.
The file is retrieved, assigned a document_id, and placed in the processing queue with metadata: source = folder, file_path, timestamp, and uploader identity (if available from folder API).
Files already processed are tracked via a processed_files log to prevent duplicate processing. Deduplication is based on a hash of the file content, which is recorded in the log against the document_id.
**Note:** For all channels, each ingested document must be archived in an immutable raw storage location before any processing begins. This serves as the audit trail.
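A minimal sketch of content-hash deduplication is shown below. The choice of SHA-256 and the in-memory ProcessedFilesLog class are assumptions for illustration; the SOP specifies only that a hash of the file content is used, and a real build would back the log with persistent storage.

```python
import hashlib
from typing import Dict, Optional


def content_hash(data: bytes) -> str:
    """Hash the raw bytes so renamed copies of the same file still collide."""
    return hashlib.sha256(data).hexdigest()


class ProcessedFilesLog:
    """In-memory stand-in for the processed_files log (hypothetical class)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # content hash -> first document_id

    def register(self, document_id: str, data: bytes) -> Optional[str]:
        """Record a file; return the earlier document_id if it is a duplicate."""
        digest = content_hash(data)
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = document_id
        return None
```

A non-None return value identifies the original document, which can be quoted in the duplicate event sent to the reviewer.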
4. Phase 1 -- Document Processing and Journal Entry Generation
Phase 1 covers the complete pipeline from raw document ingestion through to the generation of an upload-ready journal entry file for human review and posting into the target accounting platform.
4.1 Step 1 -- OCR and Data Extraction
Upon a document entering the processing queue, the following sequence is executed:
Retrieve the document from the raw archive using document_id.
If the file is an image (JPG, PNG, HEIC, TIFF), convert to PDF for standardised processing.
Run OCR on the document to extract raw text. The OCR engine must support multi-language output, at minimum: English, Chinese, Malay, and Tamil for Singapore-based clients.
Pass the OCR output to an extraction model to identify and populate the following fields:
| Extracted Field | Format | Notes |
|---|---|---|
| document_type | Enum | receipt / invoice / bill / credit_note / bank_statement / other |
| vendor_name | String | Name of supplier or merchant |
| vendor_address | String | Vendor address if present |
| vendor_tax_id | String | GST/VAT registration number of vendor if present |
| document_date | YYYY-MM-DD | Date on the document |
| document_number | String | Invoice, receipt, or reference number |
| currency | ISO 4217 | Currency of the transaction |
| amount_subtotal | Decimal | Amount before tax |
| tax_amount | Decimal | Tax charged (GST, VAT, etc.) |
| amount_total | Decimal | Total amount payable including tax |
| payment_method | String | Cash, card, bank transfer, etc. if identifiable |
| line_items | JSON Array | Description, quantity, unit price per line item if present |
| confidence_score | Float (0-1) | Overall extraction confidence from the model |
Any individual field whose extraction confidence is below 0.80 must be flagged for human review. The document is not held; it continues through the pipeline with the flagged fields highlighted in the review interface.
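The flagging rule can be sketched as below, assuming the extraction model returns a per-field confidence map (the dictionary shape and function names are hypothetical). The second function reflects the "Low OCR quality" row of the exception table in Section 7, where three or more low-confidence fields suppress auto-classification.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 0.80      # field-level threshold stated above
LOW_QUALITY_FIELD_COUNT = 3  # from the "Low OCR quality" exception row


def low_confidence_fields(
    field_confidence: Dict[str, float], threshold: float = REVIEW_THRESHOLD
) -> List[str]:
    """Return the names of fields scoring strictly below the threshold."""
    return sorted(
        name for name, score in field_confidence.items() if score < threshold
    )


def skip_auto_classification(flagged: List[str]) -> bool:
    """True when enough fields are doubtful that classification should wait."""
    return len(flagged) >= LOW_QUALITY_FIELD_COUNT
```

A score of exactly 0.80 is treated as passing, since the rule refers to scores below the threshold.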
4.2 Step 2 -- Classification and Chart of Accounts Mapping
Using the extracted fields (particularly vendor_name, document_type, and line_items descriptions), the system queries the client's chart_of_accounts reference to propose an account classification.
Classification logic applies in the following order of priority:
Priority 1: Exact match on vendor_name in the client's vendor master (if configured).
Priority 2: Keyword match from line_items descriptions against COA account descriptions.
Priority 3: Category inference from document_type and vendor category (e.g., 'restaurant' maps to Entertainment).
Priority 4: System default account flagged as 'unclassified -- pending review.'
The proposed account code and account name are recorded in the journal entry record alongside a classification_confidence flag (High / Medium / Low / Unclassified).
If the applicable tax code can be determined from the vendor's tax registration number or from line item descriptions, it is applied automatically. Otherwise, the tax treatment is flagged for reviewer assignment.
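The priority order can be sketched as a simple cascade. The lookup tables passed in (vendor_master, coa_keywords, category_defaults), the account codes in the usage example, and the mapping of each priority level to a High / Medium / Low flag are all assumptions; the SOP lists the flag values but does not say which priority produces which flag.

```python
from typing import Dict, List, Optional, Tuple

UNCLASSIFIED = "UNCLASSIFIED"  # stands in for 'unclassified -- pending review'


def classify(
    vendor_name: str,
    line_descriptions: List[str],
    vendor_category: Optional[str],
    vendor_master: Dict[str, str],
    coa_keywords: Dict[str, str],
    category_defaults: Dict[str, str],
) -> Tuple[str, str]:
    """Return (account_code, classification_confidence)."""
    # Priority 1: exact vendor match in the vendor master.
    key = vendor_name.strip().lower()
    if key in vendor_master:
        return vendor_master[key], "High"

    # Priority 2: keyword from the COA found in any line item description.
    text = " ".join(line_descriptions).lower()
    for keyword, account in coa_keywords.items():
        if keyword in text:
            return account, "Medium"

    # Priority 3: inference from the vendor category.
    if vendor_category in category_defaults:
        return category_defaults[vendor_category], "Low"

    # Priority 4: system default, always routed to the reviewer.
    return UNCLASSIFIED, "Unclassified"
```

For example, with category_defaults = {"restaurant": "6400"}, a document from an unknown restaurant with no matching keywords falls through to Priority 3 and is proposed as account 6400 with a Low flag.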
4.3 Step 3 -- Journal Entry Construction
For each successfully extracted and classified document, the system constructs a draft journal entry record containing the following:
| Journal Entry Field | Value / Logic |
|---|---|
| journal_id | System-generated unique ID |
| client_id | From client master |
| entry_date | document_date from extraction; if absent, use ingestion date |
| reference | document_number from extraction |
| description | vendor_name + document_type + document_date |
| debit_account | Expense/asset account from COA mapping |
| credit_account | Accounts Payable (if bill/invoice) or Bank/Cash (if receipt) |
| amount | amount_subtotal |
| tax_account | Tax payable/input tax account from tax_codes reference |
| tax_amount | tax_amount from extraction |
| total_amount | amount_total |
| currency | currency from extraction |
| exchange_rate | Fetched from FX API if currency differs from base_currency; else 1.0 |
| source_document_id | Linked document_id for audit trail |
| source_channel | whatsapp / email / folder |
| flags | List of any low-confidence or missing fields |
| status | draft / flagged / approved / posted |
**Note:** Multi-currency transactions must always carry both the foreign currency amount and the base currency equivalent calculated at the extraction-time FX rate. The FX rate source must be recorded.
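The date fallback, credit account selection, and FX handling can be sketched as follows. This is an illustrative subset of the journal entry fields only. The function name, flag strings, the base_total and fx_source keys, rounding to two decimal places, and the treatment of document types other than bill, invoice, and receipt (left unresolved and flagged) are assumptions. The exchange rate is taken to mean units of base currency per one unit of document currency.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CREDIT_ACCOUNT_BY_TYPE = {
    "bill": "Accounts Payable",
    "invoice": "Accounts Payable",
    "receipt": "Bank/Cash",
}


def build_draft_entry(
    extracted: dict,
    base_currency: str,
    ingestion_date: date,
    fx_rate: Optional[Decimal] = None,
    fx_source: Optional[str] = None,
) -> dict:
    flags = []

    # entry_date: document_date if extracted, otherwise the ingestion date.
    entry_date = extracted.get("document_date")
    if entry_date is None:
        entry_date = ingestion_date.isoformat()
        flags.append("document_date_missing")

    credit_account = CREDIT_ACCOUNT_BY_TYPE.get(extracted["document_type"])
    if credit_account is None:
        flags.append("credit_account_unresolved")  # assumption: reviewer decides

    # exchange_rate: 1.0 in base currency, else the supplied rate (may be None).
    if extracted["currency"] == base_currency:
        rate, fx_source = Decimal("1.0"), None
    else:
        rate = fx_rate
        if rate is None:
            flags.append("fx_rate_unavailable")

    total = Decimal(extracted["amount_total"])
    base_total = None
    if rate is not None:
        # Assumption: two decimal places; some currencies need a different scale.
        base_total = (total * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    return {
        "entry_date": entry_date,
        "credit_account": credit_account,
        "currency": extracted["currency"],
        "total_amount": total,
        "exchange_rate": rate,
        "fx_source": fx_source,
        "base_total": base_total,
        "flags": flags,
        "status": "flagged" if flags else "draft",
    }
```

When the FX lookup fails, the entry is still produced with exchange_rate left empty and a flag set, matching the "FX rate unavailable" row in Section 7.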
4.4 Step 4 -- Upload File Generation
Once all journal entries for a processing batch are constructed, the system generates a platform-specific upload file based on the client's accounting_platform configuration.
QuickBooks
Format: IIF file (.iif) or CSV via the QuickBooks Import Transactions template.
Required columns: Trans Type, Date, Account, Name, Amount, Memo, Class.
Tax amounts are mapped to the appropriate Tax Line field.
The file is structured so that each document produces one header row and one or more detail rows.
Xero
Format: CSV conforming to the Xero Manual Journal Import template.
Required columns: Date, Description, Reference, AccountCode, TaxType, NetAmount, TaxAmount, TrackingName1, TrackingOption1.
Each journal entry maps to one row per line in the Xero import file (debit and credit lines listed separately).
Zoho Books
Format: CSV conforming to Zoho Books Manual Journal template.
Required columns: Journal Date, Journal#, Reference#, Notes, Account, Debit, Credit, Tax Name, Tax Amount.
Zoho requires separate debit and credit rows per entry.
Tally
Format: XML (.xml) conforming to Tally XML import schema (LEDGER, VOUCHER, ALLLEDGERENTRIES structure).
Voucher type is determined by document_type: 'Purchase' for bills/invoices, 'Payment' for receipts, 'Journal' for adjustments.
Each entry produces a VOUCHER node with ALLLEDGERENTRIES child nodes for debit and credit legs.
Amount tags must use AMOUNT and DEBITAMOUNT / CREDITAMOUNT as required by Tally XML schema.
The generated file is placed in a designated output folder and a notification is dispatched to the reviewer_email with a link to the file and a summary of the batch (total entries, flagged items, total value).
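As one illustration of file generation, the sketch below renders journal entries into a CSV using the Zoho Books column names listed above, with separate debit and credit rows. It is a sketch under assumptions, not a verified template: placing tax on its own debit row and leaving Tax Name and Tax Amount blank is a simplification chosen so that debits equal credits, and the correct use of those two columns must be confirmed against the actual Zoho Books import template. The function names are hypothetical.

```python
import csv
import io
from decimal import Decimal

ZOHO_COLUMNS = [
    "Journal Date", "Journal#", "Reference#", "Notes",
    "Account", "Debit", "Credit", "Tax Name", "Tax Amount",
]


def zoho_rows(entry: dict) -> list:
    """Expand one journal entry into debit and credit rows."""
    common = {
        "Journal Date": entry["entry_date"],
        "Journal#": entry["journal_id"],
        "Reference#": entry["reference"],
        "Notes": entry["description"],
    }
    legs = [
        (entry["debit_account"], entry["amount"], None),
        (entry["tax_account"], entry["tax_amount"], None),
        (entry["credit_account"], None, entry["total_amount"]),
    ]
    rows = []
    for account, debit, credit in legs:
        if not debit and not credit:
            continue  # e.g. a zero-tax document has no tax leg
        rows.append({
            **common,
            "Account": account,
            "Debit": str(debit) if debit else "",
            "Credit": str(credit) if credit else "",
            "Tax Name": "",
            "Tax Amount": "",
        })
    return rows


def render_csv(entries: list) -> str:
    """Render entries to CSV text, refusing any entry that does not balance."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ZOHO_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        rows = zoho_rows(entry)
        debits = sum(Decimal(r["Debit"]) for r in rows if r["Debit"])
        credits = sum(Decimal(r["Credit"]) for r in rows if r["Credit"])
        if debits != credits:
            raise ValueError(f"entry {entry['journal_id']} does not balance")
        writer.writerows(rows)
    return buffer.getvalue()
```

The balance check before writing is a design choice: an unbalanced entry stops file generation rather than reaching the reviewer as a file the platform would reject.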
4.5 Step 5 -- Human Review Gate
No generated file is posted or uploaded to any accounting system without explicit human approval. The review workflow is as follows:
The reviewer receives a notification with: batch summary, link to the journal entry review interface, and the generated upload file for download.
The review interface presents each journal entry with all extracted fields, the source document image, proposed account code, tax treatment, and any flags.
The reviewer can take the following actions per entry:
Approve -- entry is marked approved and included in the final upload file.
Edit -- reviewer modifies any field; the modified entry is re-validated and marked approved.
Reject -- entry is removed from the upload file and logged with reason.
Escalate -- entry is flagged for senior review with a comment.
Once all entries in the batch are actioned, the reviewer generates the final upload file (this re-runs file generation using only approved entries).
The reviewer manually uploads the file to the accounting platform. The system records the upload timestamp and batch ID against the client record.
**Note:** The system must not provide any direct API connection to post entries into the accounting platform at this stage. The upload must remain a manual human action to maintain control.
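The per-entry actions and the "approved entries only" rule for the final file can be sketched as below. The function names are hypothetical, and the "rejected" and "escalated" status values are assumptions added for illustration, since the status field in Section 4.3 lists only draft, flagged, approved, and posted. Re-validation of edited entries is indicated by a comment rather than implemented.

```python
from typing import List, Optional


def apply_review_action(
    entry: dict,
    action: str,
    reviewer_id: str,
    changes: Optional[dict] = None,
    reason: Optional[str] = None,
) -> dict:
    """Apply one reviewer action to a journal entry and record who took it."""
    if action == "approve":
        entry["status"] = "approved"
    elif action == "edit":
        entry.update(changes or {})
        # A real build would re-validate the modified entry here.
        entry["status"] = "approved"
    elif action == "reject":
        if not reason:
            raise ValueError("a rejection must be logged with a reason")
        entry["status"] = "rejected"
        entry["reject_reason"] = reason
    elif action == "escalate":
        entry["status"] = "escalated"
        entry["escalation_comment"] = reason
    else:
        raise ValueError(f"unknown action: {action}")
    entry["actioned_by"] = reviewer_id
    return entry


def final_upload_entries(batch: List[dict]) -> List[dict]:
    """Return approved entries once nothing in the batch is awaiting action."""
    pending = [e["journal_id"] for e in batch if e["status"] in ("draft", "flagged")]
    if pending:
        raise ValueError(f"entries still awaiting review: {pending}")
    return [e for e in batch if e["status"] == "approved"]
```

The output of final_upload_entries is what file generation would be re-run against; the upload itself remains a manual step.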
5. Phase 2 -- Bank Statement Reconciliation and Client Portal
Phase 2 is triggered when a client uploads a bank statement to the designated input channel. This phase matches processed journal entries against bank statement lines and surfaces unreconciled items to the client through a secure web portal.
5.1 Step 1 -- Bank Statement Ingestion and Parsing
The bank statement file (PDF or CSV/XLSX) is detected through the same input channel monitoring as document ingestion.
The system identifies the file as a bank statement via document_type = bank_statement from the extraction step, or via filename pattern matching rules defined in the client configuration.
The bank statement parser extracts the following for each transaction line:
| Field | Format | Notes |
|---|---|---|
| bank_txn_id | String | System-generated ID; or bank reference if present |
| txn_date | YYYY-MM-DD | Transaction date on the statement |
| value_date | YYYY-MM-DD | Value date if shown; else same as txn_date |
| description | String | Bank-provided transaction narration |
| debit_amount | Decimal | Outflow amount; null if credit entry |
| credit_amount | Decimal | Inflow amount; null if debit entry |
| balance | Decimal | Running balance after transaction |
| currency | ISO 4217 | Currency of the account |
| bank_account_ref | String | Matched against bank_account_details in client master |
The system validates the statement arithmetic: the opening balance plus all credit amounts minus all debit amounts must equal the closing balance parsed from the statement. Any arithmetic discrepancy is flagged and the reviewer is notified before reconciliation proceeds.
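The balance check amounts to opening balance + total credits - total debits = closing balance. A minimal sketch, assuming parsed lines are dictionaries carrying the debit_amount and credit_amount fields from the table above (the function name is hypothetical):

```python
from decimal import Decimal
from typing import List


def balance_discrepancy(opening: Decimal, closing: Decimal, lines: List[dict]) -> Decimal:
    """Return closing minus the balance implied by the transactions (0 if consistent)."""
    expected = opening
    for line in lines:
        credit = line.get("credit_amount") or Decimal("0")
        debit = line.get("debit_amount") or Decimal("0")
        expected += credit - debit
    return closing - expected
```

Any non-zero result halts Phase 2 for that statement, in line with the "Bank statement parse error" row in Section 7. Using Decimal rather than float avoids spurious discrepancies from binary rounding.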
5.2 Step 2 -- Matching Engine
The matching engine attempts to pair each bank statement transaction against journal entries that have been approved and relate to the same client and bank account.
Matching is executed in the following priority sequence:
Exact Match: Date + Amount + Reference Number. If all three match, the entry is automatically reconciled.
Amount + Date Match (within 3-day tolerance): If amount matches exactly and dates are within a 3-calendar-day window, the entry is proposed as a match (requires reviewer confirmation).
Amount Match Only: Flagged as a possible match. Reviewer must confirm.
No Match: The bank transaction has no corresponding journal entry. Classified as unreconciled and escalated to the client portal.
**Note:** The 3-day date tolerance is configurable per client in the client master configuration and must be reviewed during onboarding.
Each matched pair is recorded in the reconciliation_log with:
bank_txn_id
journal_id
match_type (exact | proposed | manual)
matched_by (system | reviewer_id)
matched_at (timestamp)
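The priority sequence can be sketched for a single bank transaction as follows. The record shape is simplified for illustration: each side is assumed to carry a single signed-free amount, a date, and an optional reference, whereas a real build would derive these from debit_amount / credit_amount and the bank narration. The tier labels "possible" and "none" are hypothetical names for the third and fourth outcomes.

```python
from datetime import date
from decimal import Decimal
from typing import List, Optional, Tuple


def match_transaction(
    txn: dict, entries: List[dict], tolerance_days: int = 3
) -> Tuple[str, Optional[str]]:
    """Return (tier, journal_id) for the best available match."""
    same_amount = [e for e in entries if e["amount"] == txn["amount"]]

    # Tier 1: date + amount + reference all agree -> reconciled automatically.
    for e in same_amount:
        if (
            e["date"] == txn["date"]
            and e.get("reference")
            and e.get("reference") == txn.get("reference")
        ):
            return "exact", e["journal_id"]

    # Tier 2: amount agrees and dates fall within the tolerance window.
    for e in same_amount:
        if abs((e["date"] - txn["date"]).days) <= tolerance_days:
            return "proposed", e["journal_id"]

    # Tier 3: amount agrees only -> possible match, reviewer must confirm.
    if same_amount:
        return "possible", same_amount[0]["journal_id"]

    # Tier 4: nothing found -> unreconciled, goes to the client portal.
    return "none", None
```

Only the "exact" tier would be written to the reconciliation_log with matched_by = system; the other tiers wait for reviewer confirmation. A production engine would also need to stop one journal entry from being matched to two bank lines, which this sketch does not attempt.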
5.3 Step 3 -- Unreconciled Items Portal (Client-Facing)
All unreconciled bank transactions are surfaced to the client via a secure, session-based web portal accessed through a magic link.
5.3.1 Magic Link Generation and Dispatch
Upon completion of the matching engine run, the system generates a unique magic link for the client.
The magic link contains a signed, time-limited token (recommended expiry: 30 days, configurable per client).
The link is dispatched to the client_portal_email address stored in the client master.
The email includes: number of unreconciled items, total value of unreconciled items, and a direct link to the portal session.
The magic link is single-use for the initial login. Subsequent re-entries within the token validity window do not require a new link -- the session resumes automatically.
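A signed, time-limited token can be sketched with the Python standard library alone. The token layout (base64url payload, a dot, base64url HMAC-SHA256 signature), the function names, and the payload keys are assumptions for illustration; the SOP does not prescribe a token format, and a build team may prefer an established signed-token library. Single-use enforcement and session resumption require server-side state and are not shown.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def issue_token(client_id: str, secret: bytes, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Create a token that names the client and carries its own expiry."""
    now = time.time() if now is None else now
    payload = json.dumps({"client_id": client_id, "exp": int(now) + ttl_seconds})
    body = _b64(payload.encode())
    signature = _b64(hmac.new(secret, body, hashlib.sha256).digest())
    return (body + b"." + signature).decode()


def verify_token(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the client_id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, signature = token.encode().split(b".")
    except ValueError:
        return None
    expected = _b64(hmac.new(secret, body, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        return None
    # Restore the base64 padding that was stripped when the token was issued.
    payload = json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))
    if payload["exp"] < now:
        return None
    return payload["client_id"]
```

For the recommended 30-day expiry, ttl_seconds would be 30 * 24 * 3600. The signature is checked before the payload is decoded, so a tampered token is rejected without being parsed.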
5.3.2 Portal Interface Requirements
The portal must present the following for each unreconciled bank transaction:
Transaction date
Bank description / narration
Amount (debit or credit)
A response section with two options:
Option A: Upload Supporting -- file upload control for the client to attach the supporting document.
Option B: No Supporting Available -- checkbox with a mandatory text field for the client to describe the nature of the transaction (e.g., 'Petty cash reimbursement -- no receipt obtained').
A confirmation button per row to submit the response for that transaction.
5.3.3 Save and Resume Functionality
The portal must implement persistent session state as follows:
Every client response that is confirmed is immediately persisted to the backend database against the bank_txn_id.
When the client re-opens the portal via the same magic link, all previously submitted responses are pre-populated and displayed with a 'Submitted' status indicator. These rows are locked and cannot be re-edited without a reviewer override.
Rows with pending responses remain open for input.
The client can submit responses in any order and in any number of sessions. There is no requirement to complete all items in a single session.
A progress indicator is displayed (e.g., '7 of 23 items resolved') to guide the client.
5.3.4 Client Response Processing
Upon receipt of a client response:
If the client uploads a supporting document: the document is ingested into the processing queue as a new document with a source = client_portal tag and linked to the bank_txn_id. It then passes through Phase 1 processing (OCR, extraction, classification, journal entry generation) to produce a new draft entry for reviewer approval.
If the client declares no supporting: a draft journal entry is created and flagged as 'No Supporting -- Client Declared,' with the client-provided description as the memo. This entry is routed to the reviewer for account classification before inclusion in the upload file.
In both cases, the bank_txn_id is updated in the reconciliation_log with status = client_responded and timestamp.
The reviewer receives a notification whenever new client responses are received, with a summary of how many items were addressed.
5.4 Step 4 -- Reconciliation Completion and Reporting
Once all bank transactions are reconciled (whether by system match, reviewer match, or client response), the reconciliation for that bank statement is marked complete.
The system generates a Bank Reconciliation Statement in CSV or PDF format containing:
Opening balance per bank statement
Closing balance per bank statement
List of matched transactions with journal entry references
List of outstanding items (if any remain unresolved)
The reconciliation report is dispatched to the reviewer_email and archived in the client's document store.
The updated journal entries (from client-submitted documents and no-supporting declarations) are reviewed, approved, and merged into the next upload file cycle.
6. Annual Bookkeeping -- Process Variations
The annual bookkeeping engagement follows the same core Phase 1 and Phase 2 processes described above. The following variations apply:
6.1 Batch Ingestion
The client delivers all documents at once or in large batches (rather than incrementally throughout a month).
The folder channel is the primary and preferred delivery mechanism for annual engagements.
The system must support ingestion of bulk file uploads (100+ documents in a single drop) without timeout or queue overflow. Batch size limits and queue management are the responsibility of the engineering team.
The client will also provide multiple bank statements covering the full engagement period (e.g., 12 monthly statements). Each statement is processed independently through Phase 2, and reconciliation is performed month by month.
6.2 Processing Period Definition
The client master configuration for annual clients must include fields: engagement_start_date and engagement_end_date.
All document dates must fall within this window. Documents with dates outside the window are flagged as exceptions and are not automatically included in the journal entry batch.
The system must produce a separate upload file per month (or per quarter, depending on client preference as configured in the master record).
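The window check and per-month split can be sketched together. The function name and the "YYYY-MM" grouping key are hypothetical, and treating both engagement_start_date and engagement_end_date as inclusive is an assumption that should be confirmed during the build.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def split_by_month(
    entries: List[dict], start: date, end: date
) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Group in-window entries by month; return out-of-window entries separately."""
    by_month: Dict[str, List[dict]] = defaultdict(list)
    exceptions: List[dict] = []
    for entry in entries:
        entry_date = entry["entry_date"]
        if start <= entry_date <= end:  # assumption: both bounds inclusive
            by_month[entry_date.strftime("%Y-%m")].append(entry)
        else:
            exceptions.append(entry)
    return dict(by_month), exceptions
```

Each month's list would feed one upload file; a quarterly preference would only change the grouping key.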
6.3 Reconciliation Portal for Annual Clients
The unreconciled items portal operates identically to the monthly process.
Given the larger volume of transactions, the portal must support filtering by month and sorting by amount and date.
The magic link for annual clients has an extended expiry of 60 days (configurable).
**Note:** Annual engagements are more likely to involve missing supporting documents. The reviewer should pre-screen the no-supporting declarations for materiality and apply appropriate accounting treatment (e.g., write-off, suspense account) before approving entries.
7. Exception Handling
| Exception | Trigger Condition | System Action |
|---|---|---|
| Unsupported file format | File type not in accepted formats | Reject; notify reviewer; archive raw file |
| Low OCR quality | Confidence score below threshold for 3+ fields | Flag entry; pass to review queue; do not auto-classify |
| Duplicate document | Hash match against processed_files log | Reject; log duplicate event; notify reviewer |
| Unrecognised sender | Sender not on authorised list for client | Reject; notify reviewer; do not process |
| Missing mandatory field | amount_total or document_date absent post-extraction | Flag entry; pass to review queue with missing fields highlighted |
| COA match failure | No account can be mapped from extraction fields | Assign to 'Unclassified -- Pending Review'; notify reviewer |
| Bank statement parse error | Opening/closing balance arithmetic fails | Halt Phase 2 for that statement; notify reviewer immediately |
| Magic link expired | Client accesses portal after token expiry | Prompt to request a new link; system regenerates and re-dispatches |
| FX rate unavailable | FX API fails for foreign currency transaction | Record entry with exchange_rate = null; flag for reviewer to input manually |
8. Data Retention and Audit Trail
All raw documents received through any channel must be archived in immutable storage at point of ingestion, before any processing occurs.
Every state transition for a document or journal entry (ingested, extracted, classified, flagged, approved, rejected, posted) must be logged with a timestamp and the identity of the actor (system or reviewer_id).
All client portal responses (supporting uploads and no-supporting declarations) are permanently stored against the bank_txn_id and client_id.
Generated upload files are retained in their final approved form alongside the batch ID, reviewer ID, and approval timestamp.
Audit logs must be immutable and exportable in CSV format on demand.
Retention period: a minimum of 7 years for all accounting records, consistent with standard statutory requirements. This is configurable per client jurisdiction.
9. System Requirements Summary for Automation Build
| Core Processing |
|---|
| Multi-channel document listener (email IMAP/API, WhatsApp Business API, cloud folder API) |
| OCR engine with multi-language support and field-level confidence scoring |
| Document classification model trained on accounting document types |
| Chart of accounts mapping engine with vendor master lookup |
| Journal entry construction engine with platform-specific output formatting |
| Deduplication engine using content hash |
| FX rate API integration for multi-currency support |
| Human Review Interface |
|---|
| Web-based reviewer dashboard with per-entry approve / edit / reject / escalate controls |
| Source document image viewer linked to each journal entry |
| Flag visualisation for low-confidence and missing fields |
| Batch summary view and one-click final file generation for approved entries only |
| Bank Reconciliation |
|---|
| Bank statement parser supporting both PDF and structured CSV/XLSX input |
| Matching engine with configurable date tolerance and match priority logic |
| Reconciliation log with full match audit trail |
| Unreconciled items export for portal ingestion |
| Client Portal |
|---|
| Magic link generation with configurable token expiry |
| Secure, session-based portal with per-transaction response controls |
| Persistent save-and-resume state (server-side; not browser-dependent) |
| File upload for supporting documents with immediate Phase 1 pipeline trigger |
| No-supporting declaration with mandatory description field |
| Progress tracking per session |
| Platform Output Formats |
|---|
| QuickBooks: IIF or CSV (QuickBooks Import Transactions template) |
| Xero: CSV (Manual Journal Import template) |
| Zoho Books: CSV (Manual Journal template) |
| Tally: XML (VOUCHER / ALLLEDGERENTRIES schema) |
10. End-to-End Process Flow Summary
| # | Step | Description | Actor |
|---|---|---|---|
| 1 | Document Received | Client sends supporting via email, WhatsApp, or folder upload | Client / System (monitor) |
| 2 | Ingestion & Archive | Document retrieved, assigned document_id, archived to raw storage | System |
| 3 | OCR & Extraction | Text extracted; fields populated; confidence scored | System (AI) |
| 4 | Classification | COA mapping applied; tax code assigned; flags set | System (AI) |
| 5 | Journal Entry Draft | Draft entry constructed with all fields and linked to source document | System |
| 6 | Upload File Generation | Platform-specific file generated for batch; reviewer notified | System |
| 7 | Human Review | Reviewer approves, edits, or rejects each entry; final file generated | Reviewer (Human) |
| 8 | Manual Upload to Platform | Reviewer uploads approved file to QuickBooks / Xero / Zoho / Tally | Reviewer (Human) |
| 9 | Bank Statement Ingested | Client uploads bank statement; system parses and validates | Client / System |
| 10 | Matching Engine Run | Journal entries matched against bank lines; unreconciled items identified | System |
| 11 | Magic Link Dispatch | Client receives portal link with list of unreconciled transactions | System |
| 12 | Client Response (Portal) | Client uploads missing supportings or declares no-supporting per transaction | Client |
| 13 | Client Document Processing | Newly uploaded documents enter Phase 1 pipeline; new entries drafted | System |
| 14 | Reviewer Action on Responses | Reviewer classifies no-supporting entries; approves all new entries | Reviewer (Human) |
| 15 | Reconciliation Completed | All items reconciled; reconciliation report generated and archived | System |
End of Document