A client's accountant was spending hours every month pulling numbers from Xero, writing variance narratives, formatting spreadsheets. The whole workflow was deterministic enough to automate — except for one part.

The BVA (budget vs actual) forecasting service connects to Xero via OAuth, pulls invoices and transactions, and uses an LLM to do two things: map supplier names to GL codes, and write plain-English explanations of why a variance exists.

The GL code problem

Accounting software is full of naming variations. "Smith & Co Ltd", "Smith and Company", "Smith Co." — all the same supplier. A human accountant recognises these. A naive string match doesn't. The agent builds a memory database of historical mappings and uses it as a reference when it sees a new entry. Over time it gets better. First month it asks for confirmation on anything unusual. By month three it's mapping correctly 90% of the time without prompting.

This is persistence that actually matters. It's not storing conversation history for the sake of it — it's building a domain-specific lookup table that compounds.

The citation requirement

Every number in the output has a Citation attached. The Citation records the type (invoice PDF, Xero transaction, board assumption, historical pattern), the supplier, the invoice ID, the confidence percentage, and the basis — "12-month average" or "board decision at March meeting."

This matters for audit. An AI-generated financial report that can't explain where the numbers came from isn't useful to an accountant. The citation trail means every variance can be traced back to a source document.

Material variances — anything over 10% or over £5,000 — get a narrative. The narrative is constrained: it can only reference data that exists in the system. No invented explanations.

Where it goes

Results write to a Google Sheets spreadsheet. An email fires when the report is ready. The whole thing runs without a human in the loop.

The hardest part wasn't the Xero integration or the LLM prompting. It was defining what "correct" meant — and building a test suite that could verify it. Financial analysis is one of those domains where being 90% right isn't good enough. The citation model was what made the remaining 10% auditable rather than just wrong.