How Nodal builds a test suite from your BI dashboards, dbt project, and business-context repo — then runs headless agents in parallel against ground truth, surfacing pass, fail, drift, cost, and root cause.
Eval tools score models on public benchmarks in isolation. But production AI analytics depends on the agent, your dbt project, your warehouse, and your business context working together — the same way users hit it on Monday morning. Most teams know end-to-end evaluation matters; it just never gets done.
The unlock isn't a better model. It's the combination of a sophisticated eval suite and observability.
An eval suite built from artifacts your team already maintains, re-running on every change.
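To make "built from artifacts you already maintain" concrete: a saved dashboard tile already pairs a business question with vetted SQL, so each tile can seed a test case. Here is a minimal sketch in Python — `EvalCase`, `cases_from_dashboard`, and the tile shape are hypothetical illustrations, not Nodal's actual API:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    """One test: a natural-language question paired with a known-good answer."""
    question: str          # what a user would ask the agent
    ground_truth_sql: str  # the vetted SQL behind an existing dashboard tile
    source: str            # which artifact the case was derived from

def cases_from_dashboard(tiles: list[dict]) -> list[EvalCase]:
    """Turn saved BI tiles into eval cases: the tile's title is the
    question, and its query is the ground truth."""
    return [
        EvalCase(
            question=t["title"],
            ground_truth_sql=t["sql"],
            source=f"dashboard:{t['dashboard_id']}",
        )
        for t in tiles
    ]

# One saved tile becomes one regression test; no hand-written cases needed.
tiles = [{
    "title": "What is 30-day retention for enterprise users?",
    "sql": "SELECT ... FROM fct_retention WHERE tier = 'enterprise'",
    "dashboard_id": "exec-kpis",
}]
suite = cases_from_dashboard(tiles)
```

Because the suite is derived from artifacts rather than hand-written, it stays current as long as the dashboards do.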
When accuracy drifts, the suite surfaces the regression, the affected questions, the offending commit, and the fix:

Since your last dbt update, accuracy on customer segmentation questions dropped from 87% to 61%.
The three questions most affected all involve the "enterprise" account definition.
Root cause: the dim_accounts.account_tier definition changed in commit a3f8c2d (April 7, 2026).
Suggested fix: update the "enterprise" entity definition in business-context to match the new account_tier values.
Non-technical users don't ask fully specified questions. Nodal makes the gaps visible before any SQL runs.
User: What's our retention?

Nodal: What's the [30-day] retention rate [for all users] [in the last 90 days] [compared with the previous 90-day period]? Defaults pulled from your documentation; change any [bracket] before I run it. Should I run this, or would you like to change any of the defaults?

User: Change to enterprise users only. Run it.
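Mechanically, the bracketed restatement is a merge of documented defaults with user overrides, echoed back before any query executes. A hypothetical sketch — the `DEFAULTS` table and `restate` helper are stand-ins (real values would be parsed from your business-context repo), and parsing the question itself is elided:

```python
# Stand-in values; in practice these would come from business-context docs.
DEFAULTS = {
    "window": "30-day",
    "population": "all users",
    "period": "the last 90 days",
    "comparison": "the previous 90-day period",
}

def restate(question: str, overrides: dict[str, str] | None = None) -> str:
    """Echo a vague question back with every assumed parameter visible
    in brackets. (Mapping `question` to its parameters is elided here.)"""
    params = {**DEFAULTS, **(overrides or {})}
    return (
        f"What's the [{params['window']}] retention rate "
        f"[for {params['population']}] [in {params['period']}] "
        f"[compared with {params['comparison']}]?"
    )

print(restate("What's our retention?"))
print(restate("What's our retention?", {"population": "enterprise users only"}))
```

Every assumption the agent makes is visible and editable; nothing silent reaches the warehouse.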
The eval suite is also a measurement tool — for the docs you maintain and the models you pay for.
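The same suite can be pointed at different models, or different revisions of your docs, and scored on accuracy and spend. A sketch of the idea — `run_suite` is a hypothetical stand-in for whatever executes the cases, and the numbers are made up so the snippet runs:

```python
def run_suite(model: str) -> tuple[float, float]:
    """Run every eval case against `model`; return (accuracy, cost_usd).
    Stubbed with invented numbers purely so this sketch executes."""
    fake_results = {"model-a": (0.84, 3.10), "model-b": (0.79, 0.45)}
    return fake_results[model]

for model in ("model-a", "model-b"):
    accuracy, cost = run_suite(model)
    print(f"{model}: {accuracy:.0%} accuracy at ${cost:.2f} per full run")
```

Run it before and after a documentation change, or across model vendors, and the accuracy-per-dollar trade-off stops being a guess.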
For evals to matter at all, four pieces have to be in place. The video walks through each — and why the one most teams skip is the one that decides whether everything else works.