STANDARD OPERATING PROCEDURE

AI-Assisted Bookkeeping Automation System

Monthly and Annual Accounting Services

Document Title: AI-Assisted Bookkeeping Automation SOP
Version: 1.0 (Draft for Automation Build)
Applicable Services: Monthly Bookkeeping | Annual Bookkeeping
Accounting Platforms: QuickBooks | Xero | Zoho Books | Tally
Intended Audience: Automation Engineers | System Architects | QA Team | Senior Accountants
Classification: Confidential

1. Purpose and Scope

This SOP defines the end-to-end automated workflow for processing client accounting documents. It is written to serve as a functional specification for the automation build team and covers two phases:

  • Phase 1 -- Document Ingestion, OCR Extraction, and Journal Entry File Generation

  • Phase 2 -- Bank Statement Reconciliation and Unreconciled Items Client Portal

The SOP applies to two service engagement types:

  • Monthly Bookkeeping -- ongoing, document-by-document ingestion throughout each month

  • Annual Bookkeeping -- batch ingestion of a full year of documents, invoices, and bank statements

All processes must be designed with human-in-the-loop review gates prior to any data being posted to or uploaded into a live accounting system.

2. Client Master Configuration

Each client onboarded to the system must have a master configuration record established before any automated processing can commence. The system must read this configuration at the start of every processing cycle.

2.1 Required Configuration Fields

| Field | Data Type | Description |
| --- | --- | --- |
| client_id | String (UUID) | Unique system identifier for the client |
| client_name | String | Legal entity name of the client |
| accounting_platform | Enum | QuickBooks / Xero / Zoho / Tally |
| service_type | Enum | monthly / annual |
| input_channel | Enum (multi) | whatsapp / email / folder (one or more) |
| input_email | String | Designated inbound email address (if applicable) |
| whatsapp_number | String | Registered WhatsApp number (if applicable) |
| folder_path | String | Cloud folder path or URL (if applicable) |
| base_currency | ISO 4217 | Functional currency of the client (e.g., SGD, USD, INR) |
| chart_of_accounts | JSON Reference | Client-specific COA mapping for classification |
| tax_codes | JSON Reference | Applicable tax codes and rates (GST, VAT, etc.) |
| reviewer_email | String | Email of assigned human reviewer/accountant |
| client_portal_email | String | Email for magic link dispatch to client |
| bank_account_details | JSON Array | List of bank accounts with account number and bank name |

**Note:** The chart_of_accounts and tax_codes fields must be populated during client onboarding and validated by the assigned accountant before the system goes live for that client.
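As an illustration of how the configuration record might be checked before a processing cycle starts, the following is a minimal sketch only. The class names, the `accountant_validated` flag, and the `validation_errors` method are hypothetical and are not part of this specification; only the field names come from the table above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Platform(Enum):
    QUICKBOOKS = "QuickBooks"
    XERO = "Xero"
    ZOHO = "Zoho"
    TALLY = "Tally"


class ServiceType(Enum):
    MONTHLY = "monthly"
    ANNUAL = "annual"


# Each active channel needs its companion field to be populated.
CHANNEL_REQUIRED_FIELD = {
    "email": "input_email",
    "whatsapp": "whatsapp_number",
    "folder": "folder_path",
}


@dataclass
class ClientConfig:
    client_id: str
    client_name: str
    accounting_platform: Platform
    service_type: ServiceType
    input_channel: list
    base_currency: str
    reviewer_email: str
    client_portal_email: str
    input_email: Optional[str] = None
    whatsapp_number: Optional[str] = None
    folder_path: Optional[str] = None
    chart_of_accounts: dict = field(default_factory=dict)
    tax_codes: dict = field(default_factory=dict)
    bank_account_details: list = field(default_factory=list)
    accountant_validated: bool = False  # hypothetical go-live flag

    def validation_errors(self) -> list:
        """Return human-readable problems; an empty list means ready."""
        errors = []
        if not self.input_channel:
            errors.append("at least one input_channel is required")
        for channel in self.input_channel:
            required = CHANNEL_REQUIRED_FIELD.get(channel)
            if required is None:
                errors.append(f"unknown input_channel: {channel}")
            elif not getattr(self, required):
                errors.append(f"{required} is required for channel '{channel}'")
        if not self.chart_of_accounts:
            errors.append("chart_of_accounts must be populated")
        if not self.tax_codes:
            errors.append("tax_codes must be populated")
        if not self.accountant_validated:
            errors.append("configuration not yet validated by the accountant")
        return errors
```

Returning a list of problems rather than raising on the first one lets the onboarding accountant see every gap in a single pass.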

3. Document Ingestion -- Input Channels

The system must monitor three possible input channels per client. Each channel is defined in the client master configuration. Multiple channels may be active simultaneously for a single client.

3.1 Channel A -- Designated Email Inbox

  • The system monitors the designated inbound email address on a continuous polling basis (recommended interval: every 5 minutes).

  • Upon receiving an email, the system checks: (a) the sender's email address against the client's authorised sender list, and (b) that at least one attachment is present.

  • Accepted attachment formats: PDF, JPG, JPEG, PNG, HEIC, TIFF.

  • All attachments are extracted, assigned a unique document_id, and placed in the client's processing queue with metadata: source = email, sender, timestamp, subject line, and original filename.

  • Emails with no attachments, or from unrecognised senders, are flagged in a separate exception queue and a notification is sent to the reviewer.
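The checks above could be sketched roughly as follows. The function name, the return shape, and the `unsupported_format` reason are illustrative assumptions; the authorised sender list is assumed to be available from the client record.

```python
from pathlib import PurePath

# Accepted attachment formats for the email channel, per the list above.
EMAIL_ACCEPTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".tiff"}


def triage_email(sender, attachment_names, authorised_senders):
    """Route an inbound email to the processing queue or the exception queue."""
    known = {s.strip().lower() for s in authorised_senders}
    if sender.strip().lower() not in known:
        return {"route": "exception", "reason": "unrecognised_sender", "accepted": []}
    if not attachment_names:
        return {"route": "exception", "reason": "no_attachments", "accepted": []}
    # Keep only attachments whose extension is on the accepted list.
    accepted = [
        name for name in attachment_names
        if PurePath(name).suffix.lower() in EMAIL_ACCEPTED_EXTENSIONS
    ]
    if not accepted:
        return {"route": "exception", "reason": "unsupported_format", "accepted": []}
    return {"route": "processing", "reason": None, "accepted": accepted}
```

Extension checks are only a first filter; a production build would probably also inspect the file content type.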

3.2 Channel B -- WhatsApp

  • The system monitors the designated WhatsApp Business number using the WhatsApp Business API.

  • Upon receiving a message from a registered client number, the system checks whether the message contains a media attachment (image, PDF, or document).

  • Accepted media formats: PDF, JPG, JPEG, PNG, HEIC.

  • The media file is extracted, assigned a document_id, and placed in the processing queue with metadata: source = whatsapp, sender_number, timestamp, and message_id.

  • Text-only messages are logged but do not trigger document processing. If a text message appears to contain a query or instruction, it is forwarded to the reviewer as a notification.

  • Messages from unregistered numbers are rejected, and an auto-reply is sent indicating that the number is not authorised.

3.3 Channel C -- Predefined Cloud Folder

  • The system monitors the designated folder path (e.g., Google Drive, SharePoint, Dropbox) using the respective API or webhook.

  • On detection of a new file, the system validates the file format against accepted types: PDF, JPG, JPEG, PNG, HEIC, TIFF, XLSX, CSV.

  • The file is retrieved, assigned a document_id, and placed in the processing queue with metadata: source = folder, file_path, timestamp, and uploader identity (if available from folder API).

  • Files already processed are tracked via a processed_files log to prevent duplicate processing. Deduplication is based on a hash of the file content, recorded in the log alongside the document_id.

**Note:** For all channels, each ingested document must be archived in an immutable raw storage location before any processing begins. This serves as the audit trail.
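Content-hash deduplication can be illustrated with the short sketch below. SHA-256 is an assumed choice of hash, and the `IngestionLog` class is a hypothetical in-memory stand-in for the processed_files log, not a prescribed design.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash the raw file bytes so renamed copies are still detected."""
    return hashlib.sha256(data).hexdigest()


class IngestionLog:
    """In-memory stand-in for the processed_files log (hypothetical)."""

    def __init__(self) -> None:
        self._seen = {}  # content hash -> document_id of the first copy

    def register(self, document_id: str, data: bytes):
        """Return (is_new, document_id of the first copy with this content)."""
        digest = content_hash(data)
        if digest in self._seen:
            return False, self._seen[digest]
        self._seen[digest] = document_id
        return True, document_id
```

Because the hash covers the bytes rather than the filename, the same receipt sent once by email and once by WhatsApp would be caught only if the two files are byte-identical; re-compressed images would need a separate similarity check.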

4. Phase 1 -- Document Processing and Journal Entry Generation

Phase 1 covers the complete pipeline from raw document ingestion through to the generation of an upload-ready journal entry file for human review and posting into the target accounting platform.

4.1 Step 1 -- OCR and Data Extraction

Upon a document entering the processing queue, the following sequence is executed:

  • Retrieve the document from the raw archive using document_id.

  • If the file is an image (JPG, JPEG, PNG, HEIC, TIFF), convert it to PDF for standardised processing.

  • Run OCR on the document to extract raw text. The OCR engine must support multi-language output, at minimum: English, Chinese, Malay, and Tamil for Singapore-based clients.

  • Pass the OCR output to an extraction model to identify and populate the following fields:

| Extracted Field | Format | Notes |
| --- | --- | --- |
| document_type | Enum | receipt / invoice / bill / credit_note / bank_statement / other |
| vendor_name | String | Name of supplier or merchant |
| vendor_address | String | Vendor address if present |
| vendor_tax_id | String | GST/VAT registration number of vendor if present |
| document_date | YYYY-MM-DD | Date on the document |
| document_number | String | Invoice, receipt, or reference number |
| currency | ISO 4217 | Currency of the transaction |
| amount_subtotal | Decimal | Amount before tax |
| tax_amount | Decimal | Tax charged (GST, VAT, etc.) |
| amount_total | Decimal | Total amount payable including tax |
| payment_method | String | Cash, card, bank transfer, etc. if identifiable |
| line_items | JSON Array | Description, quantity, unit price per line item if present |
| confidence_score | Float (0-1) | Overall extraction confidence from the model |

  • Any extracted field whose field-level confidence is below 0.80 must be flagged for human review (the OCR engine reports confidence per field in addition to the overall confidence_score). The document is not held; it continues through the pipeline with flagged fields highlighted in the review interface.
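A minimal sketch of the field-level flagging rule follows. It assumes the extraction model returns a per-field confidence mapping; the function name and return shape are hypothetical. The three-field cut-off for suppressing auto-classification is taken from the exception handling table in Section 7.

```python
CONFIDENCE_THRESHOLD = 0.80   # from this SOP
LOW_QUALITY_FIELD_COUNT = 3   # "3+ fields" rule in the exception table


def review_flags(field_confidence, threshold=CONFIDENCE_THRESHOLD,
                 low_quality_count=LOW_QUALITY_FIELD_COUNT):
    """Return the flagged fields and whether auto-classification may run."""
    flagged = sorted(
        name for name, score in field_confidence.items() if score < threshold
    )
    return {
        "flagged_fields": flagged,
        # The document is never held; only auto-classification is suppressed.
        "auto_classify": len(flagged) < low_quality_count,
    }
```

Note that a score of exactly 0.80 is not flagged, since the rule is "below 0.80".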

4.2 Step 2 -- Classification and Chart of Accounts Mapping

  • Using the extracted fields (particularly vendor_name, document_type, and line_items descriptions), the system queries the client's chart_of_accounts reference to propose an account classification.

  • Classification logic applies in the following order of priority:

      • Priority 1: Exact match on vendor_name in the client's vendor master (if configured).

      • Priority 2: Keyword match from line_items descriptions against COA account descriptions.

      • Priority 3: Category inference from document_type and vendor category (e.g., 'restaurant' maps to Entertainment).

      • Priority 4: System default account flagged as 'unclassified -- pending review.'

  • The proposed account code and account name are recorded in the journal entry record alongside a classification_confidence flag (High / Medium / Low / Unclassified).

  • If the applicable tax code can be determined from the vendor's tax registration number or from line item descriptions, it is applied automatically. Otherwise, the tax treatment is flagged for reviewer assignment.
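The four-level priority order could be sketched as a simple cascade. Everything here is illustrative: the lookup structures, the account codes in the usage note, and in particular the mapping of each priority level to a classification_confidence value are assumptions, not requirements of this SOP.

```python
UNCLASSIFIED = "UNCLASSIFIED"  # placeholder code for the system default account


def classify(vendor_name, line_descriptions, vendor_category,
             vendor_master, coa_keywords, category_defaults):
    """Return (account_code, classification_confidence).

    vendor_master:     lower-cased vendor name -> account code
    coa_keywords:      lower-cased keyword -> account code
    category_defaults: vendor category -> account code
    """
    # Priority 1: exact vendor match in the vendor master.
    key = vendor_name.strip().lower()
    if key in vendor_master:
        return vendor_master[key], "High"
    # Priority 2: keyword match against line item descriptions.
    text = " ".join(line_descriptions).lower()
    for keyword, account in coa_keywords.items():
        if keyword in text:
            return account, "Medium"
    # Priority 3: inference from the vendor category.
    if vendor_category in category_defaults:
        return category_defaults[vendor_category], "Low"
    # Priority 4: system default, pending review.
    return UNCLASSIFIED, "Unclassified"
```

For example, with `vendor_master = {"grab": "6200"}`, a receipt from "Grab" resolves at Priority 1 regardless of its line items.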

4.3 Step 3 -- Journal Entry Construction

For each successfully extracted and classified document, the system constructs a draft journal entry record containing the following:

| Journal Entry Field | Value / Logic |
| --- | --- |
| journal_id | System-generated unique ID |
| client_id | From client master |
| entry_date | document_date from extraction; if absent, use ingestion date |
| reference | document_number from extraction |
| description | vendor_name + document_type + document_date |
| debit_account | Expense/asset account from COA mapping |
| credit_account | Accounts Payable (if bill/invoice) or Bank/Cash (if receipt) |
| amount | amount_subtotal |
| tax_account | Tax payable/input tax account from tax_codes reference |
| tax_amount | tax_amount from extraction |
| total_amount | amount_total |
| currency | currency from extraction |
| exchange_rate | Fetched from FX API if currency differs from base_currency; else 1.0 |
| source_document_id | Linked document_id for audit trail |
| source_channel | whatsapp / email / folder |
| flags | List of any low-confidence or missing fields |
| status | draft / flagged / approved / posted |

**Note:** Multi-currency transactions must always carry both the foreign currency amount and the base currency equivalent calculated at the extraction-time FX rate. The FX rate source must be recorded.
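The construction rules in the table can be sketched as below. This is an illustration under assumptions: `build_entry`, the `base_total_amount` key, and the two-decimal half-up rounding are hypothetical choices, and the FX rate is passed in by the caller rather than fetched from any particular API. A missing rate is recorded as null and flagged, as the exception table in Section 7 requires.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CREDIT_ACCOUNT_BY_TYPE = {
    "bill": "Accounts Payable",
    "invoice": "Accounts Payable",
    "receipt": "Bank/Cash",
}


def build_entry(doc: dict, base_currency: str, ingestion_date: str,
                fx_rate: Optional[Decimal] = None) -> dict:
    """Build a draft journal entry dict from extracted document fields."""
    flags = []
    entry_date = doc.get("document_date")
    if not entry_date:
        entry_date = ingestion_date  # fallback rule from the table
        flags.append("document_date")
    credit_account = CREDIT_ACCOUNT_BY_TYPE.get(doc["document_type"])
    if credit_account is None:
        flags.append("credit_account")
    if doc["currency"] == base_currency:
        rate = Decimal("1.0")
    else:
        rate = fx_rate
        if rate is None:
            flags.append("exchange_rate")  # reviewer inputs it manually
    base_total = None
    if rate is not None:
        base_total = (doc["amount_total"] * rate).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {
        "entry_date": entry_date,
        "reference": doc.get("document_number"),
        "description": f"{doc['vendor_name']} {doc['document_type']} {entry_date}",
        "credit_account": credit_account,
        "amount": doc["amount_subtotal"],
        "tax_amount": doc["tax_amount"],
        "total_amount": doc["amount_total"],
        "currency": doc["currency"],
        "exchange_rate": rate,
        "base_total_amount": base_total,
        "flags": flags,
        "status": "flagged" if flags else "draft",
    }
```

Amounts are held as `Decimal` rather than `float` so that currency arithmetic does not pick up binary rounding error.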

4.4 Step 4 -- Upload File Generation

Once all journal entries for a processing batch are constructed, the system generates a platform-specific upload file based on the client's accounting_platform configuration.

QuickBooks

  • Format: IIF file (.iif) or CSV via the QuickBooks Import Transactions template.

  • Required columns: Trans Type, Date, Account, Name, Amount, Memo, Class.

  • Tax amounts are mapped to the appropriate Tax Line field.

  • The file is structured so that each document produces one header row and one or more detail rows.

Xero

  • Format: CSV conforming to the Xero Manual Journal Import template.

  • Required columns: Date, Description, Reference, AccountCode, TaxType, NetAmount, TaxAmount, TrackingName1, TrackingOption1.

  • Each journal entry maps to one row per line in the Xero import file (debit and credit lines listed separately).

Zoho Books

  • Format: CSV conforming to Zoho Books Manual Journal template.

  • Required columns: Journal Date, Journal#, Reference#, Notes, Account, Debit, Credit, Tax Name, Tax Amount.

  • Zoho requires separate debit and credit rows per entry.
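As a rough sketch of the separate debit and credit rows, the example below writes one journal entry as three balanced legs (expense, tax, and credit) using the column names listed above. The three-leg layout and the blank Tax Name / Tax Amount cells are assumptions made for illustration; whether tax should instead be carried in those two columns must be confirmed against the current Zoho Books template.

```python
import csv
import io

ZOHO_COLUMNS = [
    "Journal Date", "Journal#", "Reference#", "Notes",
    "Account", "Debit", "Credit", "Tax Name", "Tax Amount",
]


def zoho_rows(entry: dict) -> list:
    """Expand one journal entry into separate debit and credit rows."""
    common = {
        "Journal Date": entry["entry_date"],
        "Journal#": entry["journal_id"],
        "Reference#": entry["reference"],
        "Notes": entry["description"],
        "Tax Name": "",     # assumption: tax is posted as its own leg
        "Tax Amount": "",
    }
    legs = [
        (entry["debit_account"], entry["amount"], ""),
        (entry["tax_account"], entry["tax_amount"], ""),
        (entry["credit_account"], "", entry["total_amount"]),
    ]
    return [
        {**common, "Account": account, "Debit": debit, "Credit": credit}
        for account, debit, credit in legs
    ]


def zoho_csv(entries: list) -> str:
    """Render a batch of journal entries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ZOHO_COLUMNS)
    writer.writeheader()
    for entry in entries:
        writer.writerows(zoho_rows(entry))
    return buffer.getvalue()
```

A useful invariant for any of the platform generators is that total debits equal total credits per journal before the file is released to the reviewer.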

Tally

  • Format: XML (.xml) conforming to Tally XML import schema (LEDGER, VOUCHER, ALLLEDGERENTRIES structure).

  • Voucher type is determined by document_type: 'Purchase' for bills/invoices, 'Payment' for receipts, 'Journal' for adjustments.

  • Each entry produces a VOUCHER node with ALLLEDGERENTRIES child nodes for debit and credit legs.

  • Amount tags must use AMOUNT and DEBITAMOUNT / CREDITAMOUNT as required by Tally XML schema.

The generated file is placed in a designated output folder and a notification is dispatched to the reviewer_email with a link to the file and a summary of the batch (total entries, flagged items, total value).

4.5 Step 5 -- Human Review Gate

No generated file is posted or uploaded to any accounting system without explicit human approval. The review workflow is as follows:

  • The reviewer receives a notification with: batch summary, link to the journal entry review interface, and the generated upload file for download.

  • The review interface presents each journal entry with all extracted fields, the source document image, proposed account code, tax treatment, and any flags.

  • The reviewer can take the following actions per entry:

      • Approve -- entry is marked approved and included in the final upload file.

      • Edit -- reviewer modifies any field; the modified entry is re-validated and marked approved.

      • Reject -- entry is removed from the upload file and logged with reason.

      • Escalate -- entry is flagged for senior review with a comment.

  • Once all entries in the batch are actioned, the reviewer generates the final upload file (this re-runs file generation using only approved entries).

  • The reviewer manually uploads the file to the accounting platform. The system records the upload timestamp and batch ID against the client record.

**Note:** The system must not provide any direct API connection to post entries into the accounting platform at this stage. The upload must remain a manual human action to maintain control.
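The review gate could be modelled along the following lines. This is a sketch under assumptions: the `rejected` and `escalated` status values, and the decision to treat escalated entries as still awaiting a decision, are not defined by this SOP. Re-validation of edited entries is omitted for brevity.

```python
RESULTING_STATUS = {
    "approve": "approved",
    "edit": "approved",      # edited entries are approved after re-validation
    "reject": "rejected",
    "escalate": "escalated",
}


def apply_action(entry: dict, action: str, reviewer_id: str, comment: str = "") -> dict:
    """Return a copy of the entry with the reviewer's decision recorded."""
    if action not in RESULTING_STATUS:
        raise ValueError(f"unknown review action: {action}")
    if action in ("reject", "escalate") and not comment.strip():
        raise ValueError(f"{action} requires a reason or comment")
    return dict(entry, status=RESULTING_STATUS[action],
                actioned_by=reviewer_id, comment=comment)


def final_upload_entries(batch: list) -> list:
    """Return only approved entries, once every entry has a final decision."""
    pending = [e["journal_id"] for e in batch
               if e["status"] not in ("approved", "rejected")]
    if pending:
        raise RuntimeError(f"entries still awaiting a decision: {pending}")
    return [e for e in batch if e["status"] == "approved"]
```

The function deliberately stops at producing the approved list; nothing here posts to an accounting platform, in line with the manual upload requirement.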

5. Phase 2 -- Bank Statement Reconciliation and Client Portal

Phase 2 is triggered when a client uploads a bank statement to the designated input channel. This phase matches processed journal entries against bank statement lines and surfaces unreconciled items to the client through a secure web portal.

5.1 Step 1 -- Bank Statement Ingestion and Parsing

  • The bank statement file (PDF or CSV/XLSX) is detected through the same input channel monitoring as document ingestion.

  • The system identifies the file as a bank statement via document_type = bank_statement from the extraction step, or via filename pattern matching rules defined in the client configuration.

  • The bank statement parser extracts the following for each transaction line:

| Field | Format | Notes |
| --- | --- | --- |
| bank_txn_id | String | System-generated ID; or bank reference if present |
| txn_date | YYYY-MM-DD | Transaction date on the statement |
| value_date | YYYY-MM-DD | Value date if shown; else same as txn_date |
| description | String | Bank-provided transaction narration |
| debit_amount | Decimal | Outflow amount; null if credit entry |
| credit_amount | Decimal | Inflow amount; null if debit entry |
| balance | Decimal | Running balance after transaction |
| currency | ISO 4217 | Currency of the account |
| bank_account_ref | String | Matched against bank_account_details in client master |

  • The system validates the statement arithmetic: the opening balance plus all credit_amount values minus all debit_amount values must equal the closing balance parsed from the statement. Any discrepancy is flagged and the reviewer is notified before reconciliation proceeds.
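The balance check amounts to one line of arithmetic, sketched here with a hypothetical helper name. Null debit or credit amounts are treated as zero, matching the parsing table above.

```python
from decimal import Decimal

ZERO = Decimal("0")


def balance_discrepancy(opening: Decimal, closing: Decimal, transactions: list) -> Decimal:
    """Return opening + credits - debits - closing; zero means the statement ties."""
    credits = sum((t["credit_amount"] or ZERO for t in transactions), ZERO)
    debits = sum((t["debit_amount"] or ZERO for t in transactions), ZERO)
    return opening + credits - debits - closing
```

A non-zero result should halt Phase 2 for that statement, as listed in the exception handling table.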

5.2 Step 2 -- Matching Engine

The matching engine attempts to pair each bank statement transaction against journal entries that have been approved and relate to the same client and bank account.

Matching is executed in the following priority sequence:

  • Exact Match: Date + Amount + Reference Number. If all three match, the entry is automatically reconciled.

  • Amount + Date Match (within 3-day tolerance): If amount matches exactly and dates are within a 3-calendar-day window, the entry is proposed as a match (requires reviewer confirmation).

  • Amount Match Only: Flagged as a possible match. Reviewer must confirm.

  • No Match: The bank transaction has no corresponding journal entry. Classified as unreconciled and escalated to the client portal.

**Note:** The 3-day date tolerance is configurable per client in the client master configuration and must be reviewed during onboarding.

Each matched pair is recorded in the reconciliation_log with:

  • bank_txn_id

  • journal_id

  • match_type (exact | proposed | manual)

  • matched_by (system | reviewer_id)

  • matched_at (timestamp)
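The priority sequence of the matching engine can be sketched as follows. Several things are assumptions: amounts are compared as unsigned values, the bank line is assumed to carry a parsed `reference`, and the labels `possible` and `unreconciled` are illustrative (the reconciliation_log above defines only exact, proposed, and manual). The sketch also does not remove a journal entry from the pool once matched, which a real engine would need to do.

```python
from datetime import date
from decimal import Decimal


def match_transaction(txn: dict, entries: list, tolerance_days: int = 3):
    """Return (match_type, journal_id or None) for one bank transaction."""
    same_amount = [e for e in entries if e["amount"] == txn["amount"]]
    # 1. Exact match: date + amount + reference number.
    for e in same_amount:
        if (e["date"] == txn["date"] and txn.get("reference")
                and e["reference"] == txn["reference"]):
            return "exact", e["journal_id"]
    # 2. Amount matches and dates fall within the tolerance window.
    for e in same_amount:
        if abs((e["date"] - txn["date"]).days) <= tolerance_days:
            return "proposed", e["journal_id"]
    # 3. Amount match only: reviewer must confirm.
    if same_amount:
        return "possible", same_amount[0]["journal_id"]
    # 4. No match: escalate to the client portal.
    return "unreconciled", None
```

Only the `exact` outcome is reconciled automatically; the other matched outcomes wait for reviewer confirmation.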

5.3 Step 3 -- Unreconciled Items Portal (Client-Facing)

All unreconciled bank transactions are surfaced to the client via a secure, session-based web portal accessed through a magic link.

5.3.1 Magic Link Generation and Dispatch

  • Upon completion of the matching engine run, the system generates a unique magic link for the client.

  • The magic link contains a signed, time-limited token (recommended expiry: 30 days, configurable per client).

  • The link is dispatched to the client_portal_email address stored in the client master.

  • The email includes: number of unreconciled items, total value of unreconciled items, and a direct link to the portal session.

  • The magic link is single-use for the initial login. Subsequent re-entries within the token validity window do not require a new link -- the session resumes automatically.
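One way to produce a signed, time-limited token is an HMAC over a small payload, sketched below with the Python standard library. The token layout, function names, and payload keys are assumptions for illustration; the single-use and session-resume behaviour described above would need server-side state that this sketch does not cover.

```python
import base64
import hashlib
import hmac
import json
import time


def make_token(client_id: str, secret: bytes, ttl_days: int = 30, now=None) -> str:
    """Create a token of the form <base64 payload>.<hex HMAC-SHA256>."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"client_id": client_id, "exp": int(now) + ttl_days * 86400},
        separators=(",", ":"),
    ).encode()
    body = base64.urlsafe_b64encode(payload).rstrip(b"=").decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_token(token: str, secret: bytes, now=None):
    """Return the client_id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature prefixes.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    padded = body + "=" * (-len(body) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if payload["exp"] < now:
        return None
    return payload["client_id"]
```

The signature is checked before the payload is decoded, so a tampered token is rejected without its contents being parsed.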

5.3.2 Portal Interface Requirements

The portal must present the following for each unreconciled bank transaction:

  • Transaction date

  • Bank description / narration

  • Amount (debit or credit)

  • A response section with two options:

      • Option A: Upload Supporting -- file upload control for the client to attach the supporting document.

      • Option B: No Supporting Available -- checkbox with a mandatory text field for the client to describe the nature of the transaction (e.g., 'Petty cash reimbursement -- no receipt obtained').

  • A confirmation button per row to submit the response for that transaction.

5.3.3 Save and Resume Functionality

The portal must implement persistent session state as follows:

  • Every client response that is confirmed is immediately persisted to the backend database against the bank_txn_id.

  • When the client re-opens the portal via the same magic link, all previously submitted responses are pre-populated and displayed with a 'Submitted' status indicator. These rows are locked and cannot be re-edited without a reviewer override.

  • Rows with pending responses remain open for input.

  • The client can submit responses in any order and in any number of sessions. There is no requirement to complete all items in a single session.

  • A progress indicator is displayed (e.g., '7 of 23 items resolved') to guide the client.
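The save-and-resume rules can be illustrated with a small in-memory model. The `PortalResponses` class and the response dictionary shape are hypothetical; a real build would persist each response to the backend database keyed by bank_txn_id.

```python
class PortalResponses:
    """In-memory stand-in for the server-side response store (hypothetical)."""

    def __init__(self, bank_txn_ids):
        # None means the row is still pending a client response.
        self._responses = {txn_id: None for txn_id in bank_txn_ids}

    def submit(self, bank_txn_id, response, reviewer_override=False):
        if bank_txn_id not in self._responses:
            raise KeyError(bank_txn_id)
        if self._responses[bank_txn_id] is not None and not reviewer_override:
            raise PermissionError("row is locked after submission")
        if (response["type"] == "no_supporting"
                and not response.get("description", "").strip()):
            raise ValueError("a description is mandatory for no-supporting declarations")
        self._responses[bank_txn_id] = response

    def pending(self):
        """Rows that remain open for input, in original order."""
        return [t for t, r in self._responses.items() if r is None]

    def progress(self):
        resolved = sum(1 for r in self._responses.values() if r is not None)
        return f"{resolved} of {len(self._responses)} items resolved"
```

Because state lives on the server side, the client can close the browser at any point and resume later with the same progress count.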

5.3.4 Client Response Processing

Upon receipt of a client response:

  • If the client uploads a supporting document: the document is ingested into the processing queue as a new document with a source = client_portal tag and linked to the bank_txn_id. It then passes through Phase 1 processing (OCR, extraction, classification, journal entry generation) to produce a new draft entry for reviewer approval.

  • If the client declares that no supporting document is available: a draft journal entry is created for the bank transaction and flagged as 'No Supporting -- Client Declared,' with the client-provided description as the memo. This entry is routed to the reviewer for account classification before inclusion in the upload file.

  • In both cases, the bank_txn_id is updated in the reconciliation_log with status = client_responded and timestamp.

  • The reviewer receives a notification whenever new client responses are received, with a summary of how many items were addressed.

5.4 Step 4 -- Reconciliation Completion and Reporting

  • Once all bank transactions are reconciled (whether by system match, reviewer match, or client response), the reconciliation for that bank statement is marked complete.

  • The system generates a Bank Reconciliation Statement in CSV or PDF format containing:

      • Opening balance per bank statement

      • Closing balance per bank statement

      • List of matched transactions with journal entry references

      • List of outstanding items (if any remain unresolved)

  • The reconciliation report is dispatched to the reviewer_email and archived in the client's document store.

  • The updated journal entries (from client-submitted documents and no-supporting declarations) are reviewed, approved, and merged into the next upload file cycle.

6. Annual Bookkeeping -- Process Variations

The annual bookkeeping engagement follows the same core Phase 1 and Phase 2 processes described above. The following variations apply:

6.1 Batch Ingestion

  • The client delivers all documents at once or in large batches (rather than incrementally throughout a month).

  • The folder channel is the primary and preferred delivery mechanism for annual engagements.

  • The system must support ingestion of bulk file uploads (100+ documents in a single drop) without timeout or queue overflow. Batch size limits and queue management are the responsibility of the engineering team.

  • The client will also provide multiple bank statements covering the full engagement period (e.g., 12 monthly statements). Each statement is processed independently through Phase 2, and reconciliation is performed month by month.

6.2 Processing Period Definition

  • The client master configuration for annual clients must include fields: engagement_start_date and engagement_end_date.

  • All document dates must fall within this window. Documents with dates outside the window are flagged as exceptions and are not automatically included in the journal entry batch.

  • The system must produce a separate upload file per month (or per quarter, depending on client preference as configured in the master record).
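The engagement window check and the per-month split might look like the sketch below. The function name and the `YYYY-MM` batch key are illustrative assumptions; quarterly batching would need a different key.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(entries: list, start: date, end: date):
    """Split entries into per-month batches and out-of-window exceptions."""
    batches = defaultdict(list)
    exceptions = []
    for entry in entries:
        entry_date = entry["entry_date"]
        if start <= entry_date <= end:
            batches[entry_date.strftime("%Y-%m")].append(entry)
        else:
            # Outside the engagement window: flagged, not auto-included.
            exceptions.append(entry)
    return dict(batches), exceptions
```

Each key of the returned dictionary corresponds to one upload file; the exceptions list goes to the reviewer instead of into any batch.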

6.3 Reconciliation Portal for Annual Clients

  • The unreconciled items portal operates identically to the monthly process.

  • Given the larger volume of transactions, the portal must support filtering by month and sorting by amount and date.

  • The magic link for annual clients has an extended expiry of 60 days (configurable).

**Note:** Annual engagements are more likely to involve missing supporting documents. The reviewer should pre-screen the no-supporting declarations for materiality and apply appropriate accounting treatment (e.g., write-off, suspense account) before approving entries.

7. Exception Handling

| Exception | Trigger Condition | System Action |
| --- | --- | --- |
| Unsupported file format | File type not in accepted formats | Reject; notify reviewer; archive raw file |
| Low OCR quality | Confidence score below threshold for 3+ fields | Flag entry; pass to review queue; do not auto-classify |
| Duplicate document | Hash match against processed_files log | Reject; log duplicate event; notify reviewer |
| Unrecognised sender | Sender not on authorised list for client | Reject; notify reviewer; do not process |
| Missing mandatory field | amount_total or document_date absent post-extraction | Flag entry; pass to review queue with missing fields highlighted |
| COA match failure | No account can be mapped from extraction fields | Assign to 'Unclassified -- Pending Review'; notify reviewer |
| Bank statement parse error | Opening/closing balance arithmetic fails | Halt Phase 2 for that statement; notify reviewer immediately |
| Magic link expired | Client accesses portal after token expiry | Prompt to request a new link; system regenerates and re-dispatches |
| FX rate unavailable | FX API fails for foreign currency transaction | Record entry with exchange_rate = null; flag for reviewer to input manually |

8. Data Retention and Audit Trail

  • All raw documents received through any channel must be archived in immutable storage at point of ingestion, before any processing occurs.

  • Every state transition for a document or journal entry (ingested, extracted, classified, flagged, approved, rejected, posted) must be logged with a timestamp and the identity of the actor (system or reviewer_id).

  • All client portal responses (supporting uploads and no-supporting declarations) are permanently stored against the bank_txn_id and client_id.

  • Generated upload files are retained in their final approved form alongside the batch ID, reviewer ID, and approval timestamp.

  • Audit logs must be immutable and exportable in CSV format on demand.

  • Retention period: a minimum of 7 years for all accounting records, consistent with standard statutory requirements. This is configurable per client jurisdiction.
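A state-transition log with CSV export could be sketched as below. The `AuditLog` class and its column names are hypothetical. Note that an in-memory list only illustrates the append-only interface; true immutability has to come from the storage layer (for example write-once storage), which is outside the scope of this sketch.

```python
import csv
import io
from datetime import datetime, timezone


class AuditLog:
    """Append-only log of state transitions (illustrative interface only)."""

    COLUMNS = ("timestamp", "object_id", "from_state", "to_state", "actor")

    def __init__(self):
        self._events = []  # tuples are never edited or removed once appended

    def record(self, object_id, from_state, to_state, actor, at=None):
        at = at or datetime.now(timezone.utc)
        self._events.append((at.isoformat(), object_id, from_state, to_state, actor))

    def to_csv(self) -> str:
        """Export the full log as CSV text, header row first."""
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(self.COLUMNS)
        writer.writerows(self._events)
        return buffer.getvalue()
```

The class exposes no update or delete method, so every correction has to be recorded as a new event rather than a change to an old one.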

9. System Requirements Summary for Automation Build

Core Processing

  • Multi-channel document listener (email IMAP/API, WhatsApp Business API, cloud folder API)

  • OCR engine with multi-language support and field-level confidence scoring

  • Document classification model trained on accounting document types

  • Chart of accounts mapping engine with vendor master lookup

  • Journal entry construction engine with platform-specific output formatting

  • Deduplication engine using content hash

  • FX rate API integration for multi-currency support

Human Review Interface

  • Web-based reviewer dashboard with per-entry approve / edit / reject / escalate controls

  • Source document image viewer linked to each journal entry

  • Flag visualisation for low-confidence and missing fields

  • Batch summary view and one-click final file generation for approved entries only

Bank Reconciliation

  • Bank statement parser supporting both PDF and structured CSV/XLSX input

  • Matching engine with configurable date tolerance and match priority logic

  • Reconciliation log with full match audit trail

  • Unreconciled items export for portal ingestion

Client Portal

  • Magic link generation with configurable token expiry

  • Secure, session-based portal with per-transaction response controls

  • Persistent save-and-resume state (server-side; not browser-dependent)

  • File upload for supporting documents with immediate Phase 1 pipeline trigger

  • No-supporting declaration with mandatory description field

  • Progress tracking per session

Platform Output Formats

  • QuickBooks: IIF or CSV (QuickBooks Import Transactions template)

  • Xero: CSV (Manual Journal Import template)

  • Zoho Books: CSV (Manual Journal template)

  • Tally: XML (VOUCHER / ALLLEDGERENTRIES schema)

10. End-to-End Process Flow Summary

| # | Step | Description | Actor |
| --- | --- | --- | --- |
| 1 | Document Received | Client sends supporting documents via email, WhatsApp, or folder upload | Client / System (monitor) |
| 2 | Ingestion & Archive | Document retrieved, assigned document_id, archived to raw storage | System |
| 3 | OCR & Extraction | Text extracted; fields populated; confidence scored | System (AI) |
| 4 | Classification | COA mapping applied; tax code assigned; flags set | System (AI) |
| 5 | Journal Entry Draft | Draft entry constructed with all fields and linked to source document | System |
| 6 | Upload File Generation | Platform-specific file generated for batch; reviewer notified | System |
| 7 | Human Review | Reviewer approves, edits, or rejects each entry; final file generated | Reviewer (Human) |
| 8 | Manual Upload to Platform | Reviewer uploads approved file to QuickBooks / Xero / Zoho / Tally | Reviewer (Human) |
| 9 | Bank Statement Ingested | Client uploads bank statement; system parses and validates | Client / System |
| 10 | Matching Engine Run | Journal entries matched against bank lines; unreconciled items identified | System |
| 11 | Magic Link Dispatch | Client receives portal link with list of unreconciled transactions | System |
| 12 | Client Response (Portal) | Client uploads missing supporting documents or declares no supporting document per transaction | Client |
| 13 | Client Document Processing | Newly uploaded documents enter Phase 1 pipeline; new entries drafted | System |
| 14 | Reviewer Action on Responses | Reviewer classifies no-supporting entries; approves all new entries | Reviewer (Human) |
| 15 | Reconciliation Completed | All items reconciled; reconciliation report generated and archived | System |

End of Document

Internal use only — BreezyCorp