How Did Recce Build an AI Data Review Agent?
Why Did a Single Prompt Approach Fail?
The first approach seemed obvious: gather all the context for a code change, write one prompt, review the output, iterate. Using Anthropic’s API, the team fed PR details, code diffs, lineage changes, row counts, and schema diffs into a single large prompt.
The first PR looked good. The second PR looked good. Then the edge cases arrived.
The summary format varied between runs. The output missed information randomly. The model stopped following prompt instructions carefully. The root cause: cramming everything into one API call pushed against prompt size limits. Information got cut off or the model could not process it all properly.
More critically, the LLM could only work with pre-selected data. Reviewing the output, the team would think “why didn’t it check the row count for model X?” — then realize the answer was simple: because that data was not included in the prompt. The model could not go get data on its own.
The single prompt approach works for simple PRs. It breaks down as complexity increases.
How Did the Agent Architecture Change Everything?
Moving to an agent architecture meant the LLM could call tools during its reasoning. Instead of pre-selecting all context, the agent could explore:
- Check the lineage diff, notice model X changed
- Follow the lineage, check downstream impacts
- Run row counts for model X, see a 50% drop
- Investigate why — check code diff, schema changes, write custom queries
The agent explores like a human reviewer: discover something, dig deeper, follow the trail.
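Conceptually, that exploration is a tool-use loop: the model asks for a tool, the harness runs it, and the result goes back into the conversation until the model stops asking. A minimal sketch using the Anthropic Python SDK is below; the tool names and schemas are illustrative, not the actual Recce tool set.

```python
import anthropic

client = anthropic.Anthropic()

# Illustrative tool definitions -- names and schemas are hypothetical.
TOOLS = [
    {
        "name": "get_lineage_diff",
        "description": "Return changed models and their downstream dependencies.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "query_row_count",
        "description": "Return base vs. current row counts for a model.",
        "input_schema": {
            "type": "object",
            "properties": {"model": {"type": "string"}},
            "required": ["model"],
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    """Dispatch a tool call to Recce / the warehouse (stubbed for the sketch)."""
    return f"stub result for {name}({args})"

messages = [{"role": "user", "content": "Review this pull request: ..."}]
while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        tools=TOOLS,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # the agent has finished exploring and produced its review
    # Execute every tool the model asked for and feed the results back.
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_tool(block.name, block.input),
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```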
The first agent architecture wrapped Recce as an MCP tool (giving the agent access to lineage diffs, row counts, query results, and schema changes), used gh CLI for PR context, and let the agent decide which tools to call and when.
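For reference, exposing a capability as an MCP tool with the official Python MCP SDK looks roughly like this; the tool name and payload shape are assumptions for the sketch, not Recce’s actual interface.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("recce")

@mcp.tool()
def get_row_count_diff(model: str) -> dict:
    """Return base vs. current row counts for one dbt model (illustrative)."""
    # The real server would query both environments through Recce;
    # this payload is a placeholder for the sketch.
    return {"model": model, "base": 1_000_000, "current": 500_000}

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for the agent to call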
Results improved immediately. Then dogfooding on more complex internal PRs revealed new failure modes.
What Token Limits Constrain AI Agent Design?
Three limits shape what an AI agent can realistically do:
| Limit | Value | Impact |
|---|---|---|
| Model context window | 200k tokens (Claude) | Total information the agent can hold |
| Single prompt limit | ~90k tokens (practical) | Maximum per API call |
| MCP tool response | 25k tokens per call | Maximum data returned by any single tool |
The MCP tool limit hit first. A lineage diff for a PR with only 5 changed models easily exceeded 25k tokens when returning the full API payload — every node detail, all dependencies, complete diff information. The tool call failed outright.
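One defensive pattern (a sketch, not necessarily what Recce shipped) is to estimate a payload’s size before returning it, so the agent gets an actionable error rather than a failed call:

```python
import json

MCP_TOOL_TOKEN_LIMIT = 25_000  # per-call ceiling on tool responses

def estimate_tokens(payload: dict) -> int:
    # Rough heuristic: ~4 characters per token for JSON-ish English text.
    return len(json.dumps(payload)) // 4

def guard_response(payload: dict) -> dict:
    """Refuse oversized payloads with a hint instead of failing outright."""
    if estimate_tokens(payload) > MCP_TOOL_TOKEN_LIMIT:
        return {"error": "response too large; narrow the request (e.g. fewer models)"}
    return payload
```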
PR context fetching was similarly expensive. The agent needed 5-10 back-and-forth gh CLI calls to gather details, code diffs, comments, and metadata. Each round-trip added latency and burned tokens.
How Do Subagents Solve the Context Problem?
With a single agent hitting both context window and prompt limits, the team needed to distribute the analysis load. Subagents — specialist agents each with their own 200k context window — provided the solution:
- Main agent: Orchestrates the overall job, receives summaries (not full details) from subagents
- pr-analyzer subagent: Extracts and interprets PR context, returns a summary
- recce-analyzer subagent: Explores data with Recce MCP tools, returns findings
The main agent does not need full exploration details — just the summaries. By delegating deep analysis to specialists, available context capacity effectively triples. This is the same multi-agent pattern that makes the production system reliable.
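In pseudocode, the orchestration looks roughly like this; the names are illustrative, not the Claude Agent SDK API.

```python
# Hypothetical orchestration sketch. Each subagent gets its own 200k-token
# context window and hands back only a compact summary.

def run_subagent(name: str, instructions: str, task: str) -> str:
    """Run a specialist agent loop (like the one sketched earlier) in an
    isolated context and return only its final summary text."""
    ...  # stub: spawn the agent, let it call its tools, collect the summary

def review_pull_request(pr_url: str) -> str:
    # pr-analyzer: reads the full PR (title, diff, comments), returns a summary.
    pr_summary = run_subagent(
        name="pr-analyzer",
        instructions="Summarize the intent, scope, and changed models of this PR.",
        task=pr_url,
    )
    # recce-analyzer: explores lineage diffs, row counts, schema changes via MCP tools.
    findings = run_subagent(
        name="recce-analyzer",
        instructions="Validate the data impact of the changes described below.",
        task=pr_summary,
    )
    # The main agent only ever sees the two summaries, never the raw exploration.
    return f"## Data review\n\n{pr_summary}\n\n{findings}"
```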
How Did the Team Optimize MCP Tool Responses?
Staying under the 25k MCP tool limit required several optimizations:
Return only changed and downstream nodes. In a 200-model dbt project, most models are unchanged upstream dependencies. For a PR with 5 changed models, keep those 5 plus their downstream impacts, filter out the ~150 unchanged upstream models.
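A sketch of that filter, walking the manifest’s child_map outward from the changed nodes (field names follow dbt’s manifest.json convention):

```python
from collections import deque

def changed_and_downstream(changed: set[str], child_map: dict[str, list[str]]) -> set[str]:
    """Keep only changed models plus everything downstream of them.

    child_map maps a node ID to its direct children, as in dbt's manifest.json.
    Unchanged upstream dependencies never enter the result.
    """
    keep, queue = set(changed), deque(changed)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in keep:
                keep.add(child)
                queue.append(child)
    return keep
```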
Use dataframes instead of key-value objects. Key-value entries repeat keys for every record, wasting tokens. Dataframe format reduces duplication dramatically.
Use numeric indices instead of long node IDs. Replacing model.my_project.customer_orders with compact integers (1, 2, 3) substantially reduced token counts.
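A sketch combining both ideas: row-count records packed into a column-oriented payload keyed by small integer indices, with a legend mapping indices back to full node IDs (the field names are assumptions, not Recce’s schema):

```python
def to_dataframe(records: list[dict], id_field: str = "node_id") -> dict:
    """Pack key-value records into a compact, column-oriented tool response."""
    # Assign each long node ID a small integer index (1, 2, 3, ...).
    node_index = {rec[id_field]: i + 1 for i, rec in enumerate(records)}
    return {
        "nodes": {v: k for k, v in node_index.items()},  # legend: index -> node ID
        "columns": ["node", "base_rows", "current_rows"],
        "data": [
            [node_index[rec[id_field]], rec["base_rows"], rec["current_rows"]]
            for rec in records
        ],
    }
```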
Wrap PR fetching in a single GraphQL call. Instead of 5-10 gh CLI round-trips, one custom MCP tool using a single GitHub GraphQL call fetches complete PR context. This reduced latency, token usage, and the risk of the agent losing track of partially fetched information.
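A sketch of the single-round-trip idea, using a trimmed-down GitHub GraphQL query; the real tool fetches more fields (reviews, commit metadata, and so on):

```python
import requests

PR_QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      title
      body
      baseRefName
      headRefName
      files(first: 100) { nodes { path additions deletions } }
      comments(first: 100) { nodes { author { login } body } }
    }
  }
}
"""

def fetch_pr_context(owner: str, name: str, number: int, token: str) -> dict:
    """Fetch PR metadata, changed files, and comments in one GraphQL call."""
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": PR_QUERY,
              "variables": {"owner": owner, "name": name, "number": number}},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["pullRequest"]
```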
How Was the Lineage Graph Accuracy Problem Solved?
Generated lineage graphs were sometimes incorrect, showing wrong connections between models. The original response format followed manifest.json structure with a parent_map — a nested mapping the agent had to reason over carefully to produce correct Mermaid diagrams.
The fix was changing the tool output to match Mermaid’s native edge representation. Instead of:
```json
{
  "parent_map": {
    "node_1": ["node_2", "node_3"]
  }
}
```
The response became:
```json
{
  "edges": {
    "columns": ["from", "to"],
    "data": [
      ["node_2", "node_1"],
      ["node_3", "node_1"]
    ]
  }
}
```
With explicit edge pairs, the agent no longer needs to infer relationships from nested mappings. The generated diagrams became significantly more stable. This lesson — match the output format to how the consumer will use the data — applies broadly to MCP tool design.
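With that shape, rendering the diagram becomes a direct mapping, as in this sketch:

```python
def edges_to_mermaid(edges: dict) -> str:
    """Render an edge-pair payload as a Mermaid flowchart definition."""
    lines = ["graph TD"]
    for src, dst in edges["data"]:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

# edges_to_mermaid({"columns": ["from", "to"],
#                   "data": [["node_2", "node_1"], ["node_3", "node_1"]]})
# -> graph TD
#        node_2 --> node_1
#        node_3 --> node_1
```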
What Happened When the Agent Read Its Own Previous Output?
An unexpected failure: the agent consumed its own previous PR comment summaries as part of the PR context. When it read all PR comments to understand the discussion, it treated old summaries as new information. This caused two problems:
- It failed to follow updated prompts, thinking the old summary was the correct format
- It mistook old analysis as current context
The fix was filtering out agent-generated comments by their signature before feeding PR context to the agent. A simple but non-obvious solution to a problem that only surfaces during real usage.
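The filter itself is short once the agent’s comments carry a known marker; the signature string here is hypothetical:

```python
# Hypothetical marker that the agent appends to every comment it posts.
AGENT_SIGNATURE = "<!-- recce-ai-review -->"

def strip_agent_comments(comments: list[dict]) -> list[dict]:
    """Drop the agent's own previous summaries before building PR context."""
    return [c for c in comments if AGENT_SIGNATURE not in c.get("body", "")]
```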
Architectural Evolution: A Timeline
| Stage | Architecture | What Broke |
|---|---|---|
| 1 | Single prompt with API calls | Prompt size limits, no autonomous exploration |
| 2 | Agent with Claude CLI + MCP tools | Token limits, unreliable lineage, slow PR fetching |
| 3 | Subagents + optimized MCP responses | Scaling needs, multi-platform support requirements |
| 4 | Claude Agent SDK | Production infrastructure for ongoing iteration |
Each stage was driven by dogfooding on real PRs of increasing complexity. What looks like a simple prompting problem turned out to be an infrastructure and observability challenge that took two months of iteration.
What Did the Team Learn?
Domain-specific AI agents require deep iteration on token limits, context management, tool design, output formats, and edge cases discovered only through real usage. The gap between a weekend prototype and a production system is wider than most teams expect.
The system now works on GitHub PRs and GitLab MRs, uses six Recce MCP tools for data validation, and understands context across PR metadata, Recce analysis, and the data warehouse. But every feature in that list represents a specific failure mode that was discovered, diagnosed, and solved through iteration on real data.
Frequently Asked Questions
- How did Recce build its AI data review agent?
- Recce iterated through four architectural stages: a single prompt with API calls, an agent architecture with Claude CLI, custom MCP tools and subagents for token limits, and finally migration to Claude Agent SDK for production infrastructure. Each stage was driven by real failures discovered through dogfooding on internal PRs of increasing complexity.
- Why did a single prompt approach fail for AI data reviews?
- A single prompt approach failed because cramming PR context, code diffs, lineage changes, row counts, and schema diffs into one API call exceeded prompt size limits. Information got cut off or the model could not process it properly. The LLM could only work with pre-selected data and could not explore context independently — it could not go check row counts for a model unless that data was already included.
- What token limits affect AI agent architecture?
- Three token limits constrain AI agent design: the model context window (200k tokens for Claude), the single prompt limit (approximately 90k tokens in practice), and the MCP tool response limit (25k tokens per tool call). A lineage diff for just 5 changed models can exceed the 25k MCP limit when returning full API payloads. These limits drove the move to subagent architecture and MCP response optimization.
- How do subagents solve AI agent context limits?
- Subagents solve context limits by distributing the analysis across multiple isolated 200k context windows. A main orchestrator agent delegates PR understanding to a pr-analyzer subagent and data exploration to a recce-analyzer subagent. Each specialist works with full context in its domain, then returns a summary to the main agent. This effectively triples available context capacity.