Worked example · Ask Your Data Workflow

Six analysts.
Fourteen days a month.
Returned.

How a mid-market logistics business stopped writing monthly variance commentary by hand — and what the team does with the fourteen days a month they got back.

~120
analyst-days returned per year
>80%
finance team adoption, month 2
6 weeks
from kickoff to production
The context

Dashboards that showed everything.
Explained nothing.

The client was an Australian mid-market logistics business, roughly $180M in annual revenue. Power BI had been in production for five years. Fabric had been licensed eighteen months earlier. Dashboards were everywhere.

The problem was not the dashboards. The problem was that the finance team — six analysts across FP&A, management accounting, and commercial analytics — was spending the first two weeks of every reporting cycle writing variance commentary for the monthly board pack.

Each analyst took a portion of the business (regions, product lines, customer segments), opened the relevant Power BI report, wrote a few paragraphs explaining what moved and why, and emailed it to the Head of Finance for review. The Head of Finance then compiled, edited, re-edited, and delivered the pack to the CFO. The CFO would ask for more detail on two or three lines. The analysts would rewrite.

Fourteen days of the month were gone before anyone had done any actual finance work.

Why "more dashboards" wasn't the answer

The instinct in most businesses at this point is to build more reports. More slicers, more drill-through, more automated variance tables. The finance team had already been down that road: they had excellent reports. They didn't need better numbers — they needed better explanations.

The second instinct is to turn on Copilot. But out-of-the-box Copilot against the existing semantic model produced exactly the kind of confident-but-occasionally-wrong output you can't put in front of a board. Measures were ambiguous. Intercompany was handled inconsistently across reports. Currency normalisation was applied in some places and not others.

You cannot get trustworthy AI commentary on an untrustworthy model. That was the honest diagnosis.

"We don't need more reports. We need the system to stop making us write the same commentary every month."

What we built

A six-week Ask Your Data Workflow engagement, running through the standard six-stage Data Disruption method. Two Decision Engineers on our side, the Head of Finance and a senior analyst on theirs.

The build had three layers:

Layer one — the semantic model. We rebuilt the core finance measures with Claude Code and the Fabric MCP, resolving the intercompany, currency, and mix inconsistencies. Every new measure shipped with a unit test and documentation. The existing Power BI reports kept working; the underlying measures were now consistent.
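To make "every new measure shipped with a unit test" concrete: the real tests ran against DAX measures in Fabric, but the idea can be sketched in Python with hypothetical fixture rows and hypothetical measure logic (the row shape, rates, and function name below are illustrative, not the client's schema).

```python
# Illustrative sketch only: mirrors the unit-test-per-measure idea
# with made-up fixture data, not the actual Fabric/DAX implementation.

def fx_normalised_revenue(rows, rates):
    """External revenue converted to AUD at the period rate, intercompany excluded."""
    return sum(
        r["amount"] * rates[r["currency"]]
        for r in rows
        if not r["intercompany"]  # the inconsistency the rebuild resolved
    )

# Fixture: two external rows, one intercompany row that must be excluded.
rows = [
    {"amount": 100.0, "currency": "AUD", "intercompany": False},
    {"amount": 50.0,  "currency": "USD", "intercompany": False},
    {"amount": 999.0, "currency": "AUD", "intercompany": True},
]
rates = {"AUD": 1.0, "USD": 1.5}

assert fx_normalised_revenue(rows, rates) == 175.0  # 100*1.0 + 50*1.5
```

The point of the pattern is the fixture: each measure gets a tiny dataset where the correct answer is obvious by hand, so a regression in intercompany or currency handling fails loudly before it reaches a report.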

Layer two — the business glossary. A structured definition layer the AI workflow grounds against. Forty-seven terms, co-authored with the finance team in a single workshop, kept in the client's own repository so they own it. Revenue, margin, utilisation, cost recovery, intercompany, FX-normalised — every term defined in the language the finance team actually uses.

Layer three — the workflow itself. Azure OpenAI running inside the client's own tenant, grounded on the model and the glossary. After each monthly data refresh, it drafts first-pass variance commentary per business unit — what moved, by how much, against what baseline, with source rows cited on every claim. The output isn't pushed anywhere automatically; analysts open it, review it, edit where needed, and sign off.
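The drafting step can be sketched as a function over variance rows from the semantic model, with the model call stubbed out (in the real workflow it is an Azure OpenAI completion inside the client tenant; the row fields and function names below are assumptions for illustration). The two properties that matter are visible in the sketch: every claim carries its source row, and the result is always a draft awaiting human sign-off.

```python
# Minimal sketch of the drafting step, under assumptions: variance rows come
# from the rebuilt semantic model, and `draft_line` stands in for the LLM call.

def draft_commentary(unit, variances, draft_line):
    """First-pass commentary: one claim per variance, source row cited on each."""
    lines = []
    for v in variances:
        text = draft_line(unit, v)                       # model drafts the sentence
        lines.append(f"{text} [source: {v['row_id']}]")  # every claim cites its row
    return {
        "unit": unit,
        "status": "draft",  # never auto-published: an analyst must review and sign off
        "lines": lines,
    }

# Deterministic stub in place of the real model call, for illustration.
def stub_draft(unit, v):
    return (f"{v['measure']} moved {v['delta_pct']:+.1f}% "
            f"vs {v['baseline']} in {unit}")

out = draft_commentary(
    "QLD linehaul",
    [{"row_id": "fct_pl:1042", "measure": "Margin",
      "delta_pct": -3.2, "baseline": "budget"}],
    stub_draft,
)
```

Citing the source row on every sentence is what turns "the AI said so" into something an analyst can verify in minutes rather than re-derive from scratch.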

The 200-question eval set

The piece that made leadership willing to trust it. Before go-live, we co-authored a 200-question evaluation set with the finance team: questions the workflow must answer correctly, edge cases it must handle, and traps it must not fall into ("what's our revenue including intercompany?" — correct answer: refuse and ask for clarification).

We ran the full eval set before every release. The agreed launch threshold was 95% correct on the full set, 100% correct on the twenty-three "board-pack-critical" questions. We hit it in week five. The client now reruns the eval set monthly, and we review any failures under the Decisions Desk retainer.
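The release gate itself is simple arithmetic, which is part of why leadership trusted it. A hedged sketch of the two-threshold check (question IDs and result shapes are illustrative):

```python
# Sketch of the agreed launch gate: >=95% on the full 200-question set,
# 100% on the board-pack-critical subset. Shapes are illustrative.

def release_gate(results, critical_ids, full_threshold=0.95):
    """results: {question_id: passed_bool}. Returns (ok, full_rate, critical_rate)."""
    full_rate = sum(results.values()) / len(results)
    critical = [results[q] for q in critical_ids]
    critical_rate = sum(critical) / len(critical)
    # A single critical failure blocks release, regardless of the overall rate.
    ok = full_rate >= full_threshold and critical_rate == 1.0
    return ok, full_rate, critical_rate

# Tiny illustrative run: one miss outside the critical set is tolerated.
results = {f"q{i}": True for i in range(200)}
results["q150"] = False  # non-critical miss -> 199/200 = 99.5%, still passes
ok, full, crit = release_gate(results, critical_ids=[f"q{i}" for i in range(23)])
assert ok
```

The asymmetry is the design choice: a near-miss on a routine question is a tuning item; a miss on a board-pack-critical question is a launch blocker.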

What changed

Month one after launch, analysts used it cautiously — generating drafts, then rewriting most of them. Month two, they started trusting it for the straightforward variance sections and focusing their time on the harder narrative work. By month three, first-pass commentary was generated in under an hour, edited by analysts in another hour or two, reviewed and approved by the Head of Finance the same day.

Fourteen days a month of analyst time came back. Not saved in the abstract — redirected to actual commercial analysis, margin investigation, pricing work. The kind of thing finance teams always say they'd do if they had the time.

The workflow adoption rate was above 80% of the finance team by month two. The Head of Finance reported to the CFO that the board pack was both faster to produce and higher quality — because the humans were spending their energy on the judgment calls, not the description.

engagement.log
client_size: ~$180M AUD revenue
industry: logistics / 3PL
existing_stack: Power BI Premium, Microsoft Fabric
buying_team: CFO · Head of Finance · Head of Data
offer: Ask Your Data Workflow
duration: 6 weeks
dd_team: 1 × Decision Engineer (lead) · 1 × Decision Engineer (red-team)
client_team: Head of Finance + senior analyst, part-time across the engagement
stack_delivered: Azure OpenAI (inside client tenant) + Fabric semantic model + 47-term business glossary
eval_set: 200 questions, 95% threshold, 100% on board-pack-critical subset
handover: full IP transfer · documentation · monthly eval rerun process
outcome_analyst_time: ~120 analyst-days returned per year
outcome_adoption: >80% finance team, month 2
outcome_cycle_time: first-pass commentary: 10 days → <1 day
trust_model: AI-drafted, human-owned, source-cited
ongoing: Decisions Desk retainer for monthly eval + tuning

Show us the decision
that still takes too long.

A free 45-minute call. Bring a workflow, a reporting pain, or a trust issue — we'll tell you quickly whether this is a real fit.

Start with your own decision