How to Use Claude Code for dbt Analytics Engineering
What Does It Take to Use Claude Code for dbt?
Using an LLM to generate dbt models is not a prompting problem — it is an infrastructure problem. Before Claude Code can produce useful analytics engineering output, it needs custom skills: explicit rules for naming conventions, primary key patterns, model structure, dev environment commands, and what “good” data models look like in your project.
This goes beyond writing a system prompt. A practical setup includes:
- Naming conventions and folder structure so generated models match existing project patterns
- MCP integrations — dbt MCP for project context and Recce MCP for data validation (row count diffs, schema comparison, profiling)
- Golden scenarios — examples of expected model output that Claude can reference as ground truth
The setup phase is where the actual engineering happens. The generation step is the easy part.
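As a rough illustration of what a golden scenario can look like, here is a minimal staging-model sketch in the style such a reference might take. The source name, file path, and columns are hypothetical, not taken from the project described here:

```sql
-- models/staging/app/stg_app__orders.sql
-- Hypothetical golden-scenario staging model: import CTE, explicit column
-- selection, renamed timestamps, and a documented primary key.

with source as (

    select * from {{ source('app', 'orders') }}

),

renamed as (

    select
        order_id,                      -- primary key: one row per order
        customer_id,
        order_status,
        order_total_amount,
        created_at as ordered_at
    from source

)

select * from renamed
```

Pointing the skills at one or two reference models like this gives generated output something concrete to match against, rather than generic dbt examples.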
How Well Does Claude Code Handle End-to-End dbt Generation?
In a real-world test building an analytics warehouse from Snowflake ingestion through mart models, Claude Code handled the full pipeline: creating Snowflake tables, loading data from S3, and building sources, staging, intermediate, and mart layers following a dimensional model organized into medallion architecture layers.
The results were structurally sound. Claude Code:
- Followed project naming conventions and used CTEs consistently
- Organized files into folders matching the existing project structure
- Inferred that certain intermediate tables should be incremental based on append-only data patterns, without being told (a sketch of this pattern follows this list)
- Created its own verification plan using data diffs — row count comparisons, profiling (min, max, average, distinct counts), and distribution histograms via the Recce MCP
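As a hedged sketch of that inferred pattern (the model and column names below are hypothetical), an append-only intermediate model in dbt typically looks something like this:

```sql
-- models/intermediate/int_app__events.sql
-- Hypothetical sketch of an incremental model over append-only event data.

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

with events as (

    select * from {{ ref('stg_app__events') }}

    {% if is_incremental() %}
    -- On incremental runs, only pick up rows newer than what is already built.
    where loaded_at > (select max(loaded_at) from {{ this }})
    {% endif %}

)

select
    event_id,
    user_id,
    event_type,
    loaded_at
from events
```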
The code compiled and ran. But “it ran” and “I’d ship this” are different things.
Where Does AI-Generated dbt Code Need Human Review?
Every AI-generated dbt model needs review. Common issues that appeared in practice:
| Issue | Risk | What Should Happen |
|---|---|---|
| Inner joins used where left joins are appropriate | Silently drops rows on edge cases — breaks months later | Default to left joins; require explicit justification for inner joins |
| Generic column descriptions | Inadequate for semantic layer consumption | Descriptions must reflect business context, not just technical metadata |
| Mart models referencing staging instead of intermediate tables | Bypasses incremental logic the AI itself created | Enforce layer dependencies in skills |
| Missing edge cases in row-number logic | Incorrect deduplication for multi-connection entities | Document known limitations rather than shipping silently |
These are not unusual mistakes — they mirror what a distracted human engineer might produce. The difference is that when you write a bad join yourself, you at least know you wrote it. With AI-generated code, every assumption needs explicit verification.
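Two of these issues are easy to show in SQL. The sketches below use hypothetical model and column names. First, defaulting to a left join keeps rows that have no match instead of silently dropping them:

```sql
-- Hypothetical sketch: a left join keeps customers that have no orders yet.
select
    customers.customer_id,
    coalesce(orders.order_count, 0) as order_count
from {{ ref('stg_app__customers') }} as customers
left join {{ ref('int_app__orders_per_customer') }} as orders
    on customers.customer_id = orders.customer_id
-- An inner join here would silently drop customers with zero orders.
```

Second, row-number deduplication is only as good as its partition and ordering; entities with multiple valid connections collapse to one arbitrary row unless that limitation is documented:

```sql
-- Hypothetical sketch: row_number() dedup hides multi-connection entities.
with ranked as (

    select
        *,
        row_number() over (
            partition by account_id        -- one row per account...
            order by connected_at desc     -- ...keeping only the latest connection
        ) as row_num
    from {{ ref('stg_app__connections') }}

)

select * from ranked
where row_num = 1
```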
What Role Does MCP Play in AI-Assisted dbt Development?
MCP (Model Context Protocol) is what separates “AI writing SQL” from “AI doing analytics engineering.” Without MCP, Claude Code generates models blind — it can only reason about what the SQL should do. With MCP integrations, it can validate what the SQL actually does.
Two MCP servers make the difference:
- dbt MCP — provides project metadata, compilation context, and model dependencies
- Recce MCP — enables data review capabilities: row count diffs between dev and prod, schema comparison, column profiling, and distribution histograms
When Claude Code has access to Recce MCP, it builds verification into its own workflow. Instead of generating models and hoping they work, it runs dbt tests, compares row counts across environments, and profiles column distributions — the same checks a human data reviewer would perform.
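The checks themselves are ordinary SQL; the MCP just lets the agent run and interpret them as part of its own loop. Here is a hedged sketch of a dev-versus-prod row count and distinct-count comparison, with hypothetical database and schema names:

```sql
-- Hypothetical sketch: compare row counts and a distinct-key count across environments.
select 'dev' as environment,
       count(*) as row_count,
       count(distinct order_id) as distinct_orders
from analytics_dev.marts.fct_orders

union all

select 'prod' as environment,
       count(*) as row_count,
       count(distinct order_id) as distinct_orders
from analytics_prod.marts.fct_orders;
```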
This matters because wiring MCP servers into customized skills and workflows produces better outcomes than prompting alone.
Why Is the Iteration Loop the Real Value?
The one-shot generation is impressive to watch but provides modest time savings on its own. The compounding value comes from the iteration and documentation loop:
- Every bad join becomes a rule in the skills file
- Every ignored existing model becomes an enforced convention
- Every silent data quality decision becomes a guardrail
- Every edge case becomes a documented known limitation
Each run tightens the skills. The next batch of production tables produces fewer issues because the previous batch’s mistakes are now encoded as constraints. Over time, teammates who lack deep contextual knowledge of the data can produce the same quality output because the skills carry that institutional knowledge.
This is the same principle behind CI checks for dbt — systematic encoding of quality expectations that compound over time.
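As one concrete form a guardrail can take, a silent assumption such as "each account has at most one active connection" can be encoded as a singular dbt test so the next run fails loudly instead of deduplicating quietly. The model and column names below are hypothetical:

```sql
-- tests/assert_single_active_connection_per_account.sql
-- Hypothetical singular test: returns rows (and fails) when an account has
-- more than one active connection, instead of letting dedup hide it.
select
    account_id,
    count(*) as active_connections
from {{ ref('stg_app__connections') }}
where is_active
group by account_id
having count(*) > 1
```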
What Did AI Actually Save vs. What Still Requires Manual Work?
| Task | AI Contribution | Human Still Required |
|---|---|---|
| Snowflake configuration debugging | Diagnosed and fixed issues — saved an afternoon | Verify the fix is correct |
| Model scaffolding (sources, staging, marts) | Generated complete layer structure | Review every join, every reference, every description |
| Verification plan | Built row count diffs, profiling, histograms automatically | Interpret results, query in notebooks, compare to known data |
| Iteration speed | "Build, review, catch problems, discuss" loop is faster than starting from blank files | Every decision still needs human judgment |
| Documentation | Skills file captures conventions automatically | Business context must come from domain knowledge |
The honest assessment: manual verification still has to happen regardless. The time savings were modest for the initial data ingestion. The agent might introduce mistakes you wouldn’t have made yourself. But the iteration loop — where each mistake becomes a permanent rule — creates compounding returns that grow with every subsequent run.
Summary
AI-assisted analytics engineering with Claude Code is an infrastructure problem, not a prompting problem. The real work is building custom skills, configuring MCP integrations (dbt MCP for project context, Recce MCP for data validation), and creating golden scenarios. Claude Code can handle end-to-end dbt generation — Snowflake tables through mart models — but the output requires the same review scrutiny as any engineer’s code. The compounding value comes from the iteration loop: every mistake becomes a rule, every convention becomes a constraint, and the skills get better with each run. The generation is the easy part. The decisions that matter are still yours.
Frequently Asked Questions
- Can Claude Code build dbt models end-to-end?
- Yes. Claude Code can create Snowflake tables, load data from S3, and build dbt sources, staging, intermediate, and mart models in a single run. It follows naming conventions, uses CTEs, and can infer patterns like incremental materialization from append-only data. However, the output requires human review — issues like incorrect join types, weak descriptions, and inconsistent model references appear regularly.
- What setup does Claude Code need for dbt projects?
- Effective dbt generation with Claude Code requires custom skills (naming conventions, primary key patterns, model structure), MCP integrations (dbt MCP for project context, Recce MCP for data validation), and golden scenarios showing expected model output. This setup infrastructure is where the real engineering work happens — not the prompt itself.
- What is MCP and how does it help Claude Code with dbt?
- MCP (Model Context Protocol) gives Claude Code access to external tools. The dbt MCP provides project metadata and compilation. The Recce MCP enables row count diffs, schema comparison, and profiling between dev and prod environments. Together, they let Claude Code validate its own output rather than generating code blind.
- What mistakes does Claude Code make when building dbt models?
- Common issues include using inner joins where left joins are safer (silently dropping edge-case rows), generating generic column descriptions instead of business-contextual ones, referencing staging models instead of intermediate tables, and missing edge cases in row-number logic. Each mistake becomes a rule in the skills file for the next run.