Data Engineering for the Agentic Era

In my earlier post “The Model Is the Platform Now”, I argued that intelligence is eating software – that models and harnesses are displacing APIs as the load-bearing layer of digital systems. I also made the case, briefly, that data and context become the new source of durable competitive advantage.

If the harness is the new platform, what feeds the harness?

The answer is data engineering – but not the data engineering most organisations still practice. For the past decade or longer, data teams have been optimising for a world where the consumer at the end of the pipeline was a human running a known query against a known table. That world is ending.

The consumer is now a reasoning system deciding what to ask at runtime, taking decisions on real systems, and operating under adversarial conditions.

So what do we do?

Shift 1: From Feature Engineering to Context Construction

The old way: When organisations wanted to do something intelligent with their data – predict something – they hired data engineers to build features. A feature is a pre-computed signal (e.g. an aggregation, a ratio, a frequency). Each feature took deliberate craft to design, compute correctly, version when the business changed its mind, and serve when a prediction was required. Organisations built entire platforms – feature stores – to manage this work. The labour of love was preparing signals in advance so that when a model needed to make a prediction, the right numbers were available.

The new way: Agents don’t reach for pre-computed features. For example, when a customer reaches out about their bill, the agent looks up this customer’s last few invoices, this customer’s plan history, this customer’s chat history – all this decided in the moment, based on what the question seems to need. Therefore, it’s no longer the data engineer’s job to anticipate every possible signal in advance. It is to ensure that the right information can be assembled in the moment, retrieved through tools, and shaped to fit a model’s attention budget.

More importantly, the optimisation target is inverted. In the past, data warehousing was built on a simple rule: capture everything, store it cheaply, let people query it later. The implicit goal was maximum fidelity. Agents flip this. Language models have a finite attention budget, and counterintuitively, giving them more information often makes them worse. This is “context rot”: accuracy starts dropping as you feed in more, often well before any technical limit is reached. The harness’s job is not to deliver everything available. It’s to deliver the smallest amount of high-signal information that achieves the goal.

So, the work has moved. From build-time to runtime. From human-authored features to model-authored context assembly. From maximum fidelity to minimum sufficient context.

What this means for how you build: Stop investing in feature stores as your AI substrate. Invest in retrieval – the systems that find the right slice of data at the right moment. That means semantic layers that translate raw tables into business meaning, vector indexes for unstructured content, ranking systems that decide what to send and what to leave out, and summarisation layers that compress when context is tight.
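To make "minimum sufficient context" concrete, here is a minimal sketch of the ranking step, assuming a ranker has already assigned a relevance score to each retrieved snippet. The `Snippet` type, the word-count token estimate, and the threshold values are all illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str       # where the text came from, e.g. "invoices"
    text: str
    relevance: float  # score from a ranker; higher means more relevant


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def build_context(snippets: list[Snippet], budget: int,
                  min_relevance: float = 0.5) -> list[Snippet]:
    """Greedily keep the most relevant snippets that fit the token budget."""
    chosen: list[Snippet] = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            break  # everything after this point is lower-signal still
        cost = estimate_tokens(s.text)
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen


candidates = [
    Snippet("invoices", "March invoice 82.40 includes a 20.00 roaming charge", 0.92),
    Snippet("plan_history", "Customer moved from Basic to Plus on 1 March", 0.81),
    Snippet("chat_history", "Customer asked about store opening hours last year", 0.12),
    Snippet("invoices", "February invoice 62.40 no additional charges", 0.77),
]

context = build_context(candidates, budget=20)
```

With a budget of 20 estimated tokens, the March invoice and the plan change are kept, the February invoice is dropped because it does not fit, and the low-relevance chat message is never considered. A real system would likely swap the greedy loop for something smarter, but the design choice is the same: the budget is a hard constraint and relevance decides what survives.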

The engineering effort moves upstream into making your data legible to a reasoning system, and downstream into shaping what reaches the model’s attention. The middle – the pre-computed signal – shrinks.

Shift 2: From Data Tests to Evals

The old way: Data engineers usually ship pipelines with tests: “No null user IDs”, “Numbers are within a reasonable range” and “This column’s distinct count matches the source system’s”. These tests are mechanical checks on the data itself – the values, the shapes, the freshness. If the numbers look reasonable, the pipeline passes and the dashboard refreshes. When a number is wrong, the consequences are limited to a misleading chart and an awkward meeting.
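For readers who have not written one, the three checks quoted above might look something like this. It is a plain-Python sketch over hypothetical rows; the column names, the range bounds, and the source-system count are assumptions for illustration.

```python
rows = [
    {"user_id": "u1", "amount": 42.0},
    {"user_id": "u2", "amount": 13.5},
    {"user_id": "u3", "amount": 99.9},
]
source_distinct_users = 3  # hypothetical count reported by the source system


def run_data_tests(rows, source_distinct_users, lo=0.0, hi=10_000.0):
    """Return the names of failed checks; an empty list means the pipeline passes."""
    failures = []
    # "No null user IDs"
    if any(r["user_id"] is None for r in rows):
        failures.append("null_user_id")
    # "Numbers are within a reasonable range"
    if any(not (lo <= r["amount"] <= hi) for r in rows):
        failures.append("amount_out_of_range")
    # "This column's distinct count matches the source system's"
    if len({r["user_id"] for r in rows}) != source_distinct_users:
        failures.append("distinct_count_mismatch")
    return failures
```

Note what every check has in common: each one inspects the data and nothing else. None of them can say anything about what a downstream consumer did with it.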

The new way: Agents don’t just show data – they act on it. The question is no longer “is this number correct?” but “did the system using this data do the right thing?” Did the agent route the approval request correctly? Did it respect the required audit processes? Did it reject an application that it shouldn’t have? None of these are answerable by checking the data itself. They are questions about the behaviour of the system that consumed the data.

If you’re a bit lost, you’re not alone. What most leaders have not yet internalised is that the eval system is a data engineering problem. Trace storage, label management, sampling strategies, judge models, regression baselines – these are data products in their own right. The team that owns the warehouse should own the eval store. The same instincts that build clean lineage should build clean trace pipelines.

What this means for how you build: Stand up the eval infrastructure as a first-class data product, before you build agents in earnest. That means logging every agent interaction with structured traces, designing a labelling workflow that makes it easy for domain experts to mark failures, building automated graders (LLM-as-a-judge) for scale, and wiring it all into continuous integration so that prompt changes can be measured before they ship. Treat your eval datasets the way you treat reporting tables. The teams that move past “demo” into “production” almost always have this loop running. The teams that don’t are stuck doing vibe checks.
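The loop described above can be sketched in a few dozen lines. This is a deliberately simplified illustration: the `Trace` fields, the tool names, and the 2% tolerance are assumptions, and the rule-based `grade` function stands in for whatever grader you actually use (an LLM judge, human labels, or both).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    trace_id: str
    user_request: str
    tool_calls: tuple[str, ...]  # names of the tools the agent invoked, in order
    final_action: str            # what the agent ultimately did
    expected_action: str         # label supplied by a domain expert


def grade(trace: Trace) -> bool:
    # Stand-in grader: the action must match the expert label, and any
    # approval must be accompanied by an audit log write.
    if trace.final_action != trace.expected_action:
        return False
    if trace.final_action == "approve" and "write_audit_log" not in trace.tool_calls:
        return False
    return True


def pass_rate(traces: list[Trace]) -> float:
    return sum(grade(t) for t in traces) / len(traces)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow a prompt change to ship only if it stays within tolerance of the baseline."""
    return candidate >= baseline - tolerance


eval_set = [
    Trace("t1", "Approve invoice 17", ("lookup_invoice", "write_audit_log"), "approve", "approve"),
    Trace("t2", "Approve invoice 18", ("lookup_invoice",), "approve", "approve"),  # skipped audit
    Trace("t3", "Refund order 9", ("lookup_order",), "escalate", "escalate"),
    Trace("t4", "Close account 4", ("lookup_account",), "approve", "reject"),      # wrong action
]

candidate_rate = pass_rate(eval_set)
```

Trace `t2` is the interesting one: the final action matches the label, so a check on outputs alone would pass it, but the agent skipped the audit step. Wiring `regression_gate` into continuous integration is what turns a pile of labelled traces into something that can block a bad prompt change.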

Shift 3: From Access Control to Capability Architecture

The old way: When a dashboard needed to read the customer database, you requested read access from a central (security) team. They reviewed the request, granted permissions, logged it for audit, and that was largely it. Security was an adjacent concern, owned by a different group, governed by policies that assumed the consumer at the end of the access grant was a human running queries through a tool.

The new way: Agents combine three things that used to live in separate boxes. They read private data (e.g. business records). They take actions (in Willison’s framing, specifically the ability to communicate externally). And they consume external content that wasn’t authored by you (e.g. a user request). Simon Willison, who coined the term “prompt injection”, calls this combination the lethal trifecta, because language models have no reliable way of distinguishing legitimate instructions from instructions hidden in the data they happen to read (think phishing for AI).

This isn’t theoretical; it has already happened repeatedly. The GitHub MCP server was shown to let attackers pull data from private repositories by posting carefully worded issues on public ones. Google and Microsoft have likewise shipped fixes for similar problems in their own products. The pattern keeps recurring because the underlying vulnerability is a property of how language models work, not a bug to be patched.

This means that deciding which data sources an agent can read, which actions it’s allowed to take, and which kinds of external content can enter its context is no longer a set of separate security decisions made by a separate team. These are data architecture decisions made by whoever wires up the harness. The old wall between “the data team” and “the security team” doesn’t survive the agent era. Capability composition becomes part of what shipping a data product means.

What this means for how you build: Design every agent as if it will be exploited, because eventually it will be. Separate read scopes from write scopes – most agents need far less write capability than they’re given. Treat any content outside your organisation as adversarial input, regardless of how it got there. As Willison advocates: never combine all three parts of the trifecta in a single agent’s loop. The data engineer of the agentic era is also a security architect, whether the job description reflects it yet or not.
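One way to make the "never combine all three" rule enforceable is to declare each agent's capabilities as data and check the declaration before deployment. The sketch below assumes a hypothetical `AgentCapabilities` manifest; the field names and the example agents are illustrative, not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    name: str
    reads_private_data: bool
    ingests_untrusted_content: bool
    can_act_externally: bool  # e.g. send email, call outside APIs, post comments


def trifecta_violations(agent: AgentCapabilities) -> list[str]:
    """Return reasons this capability set should be blocked from deployment."""
    problems = []
    if (agent.reads_private_data
            and agent.ingests_untrusted_content
            and agent.can_act_externally):
        problems.append(
            f"{agent.name}: combines private data, untrusted content and external actions"
        )
    return problems


# A single do-everything agent trips the check.
monolith = AgentCapabilities("support_bot", True, True, True)

# Decomposed: one agent reads untrusted tickets but sees no private data;
# the other reads private data but only receives structured, trusted input.
triage = AgentCapabilities("ticket_triage", False, True, True)
billing = AgentCapabilities("billing_lookup", True, False, True)
```

A check like this belongs in the same review path as a schema change: it is cheap, it is declarative, and it forces the conversation about decomposition to happen before the agent ships rather than after an incident. It does not prove the declaration is truthful, so it complements runtime enforcement rather than replacing it.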

Shift 4: From Pipeline Builder to System Designer

The old way: Data engineering was a craft of hands-on labour: writing SQL, building DAGs, debugging pipeline failures, backfilling tables when schemas changed. The job rewarded patience, attention to detail, and the willingness to grind through messy work that nobody else wanted to do. Career progression used to mean getting faster at the grind – and eventually becoming senior enough to design instead of debug.

The new way: Increasingly, the grind is done by agents – supervised by humans. Structured data extraction (turning messy documents into clean tables) is now one of the most economically valuable applications of language models. Schema inference, documentation, lineage backfill, test generation, anomaly investigation – all of it gets faster when the agent does the first pass and an engineer reviews.

This isn’t the “data engineer will be replaced” story. It’s a role shift. The work that compounds – designing systems, defining contracts, curating semantic meaning, setting policy on what’s safe to automate, deciding which judgements require humans – gets more leverage, not less. The work that doesn’t compound – writing transformation logic, manually documenting tables, fixing the same pipeline error for the tenth time – gets done by agents. I see data engineers in 2026 spending less time typing and more time architecting, reviewing and curating the small set of human judgements that actually move outcomes.

What this means for how you build: Stop hiring data engineers as pipeline labour. Hire them as system designers who happen to use agents as their implementation tool. Invest in the infrastructure that makes agents safe to use against your data – sandboxed environments, branch-and-merge semantics on data changes, rollback capability, audit trails. The teams that get the leverage will be the ones who let agents do the grind, while humans hold the design. The teams that resist will keep hiring more headcount to fight the same fires.
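Branch-and-merge semantics on data can sound abstract, so here is a toy in-memory illustration of the idea: the agent works on a copy, a human reviews the diff, and a merge can be rolled back. The `BranchableTable` class and its methods are hypothetical; real platforms implement this at the storage layer rather than with deep copies.

```python
import copy


class BranchableTable:
    """Toy table: agents write to a branch, a reviewer decides whether to merge."""

    def __init__(self, rows: dict[str, dict]):
        self.main = rows
        self.audit_log: list[str] = []

    def branch(self) -> dict[str, dict]:
        # Deep copy so changes on the branch cannot touch main.
        return copy.deepcopy(self.main)

    def diff(self, branch: dict[str, dict]) -> dict[str, tuple]:
        # Map each changed key to its (before, after) pair for human review.
        keys = self.main.keys() | branch.keys()
        return {k: (self.main.get(k), branch.get(k))
                for k in keys if self.main.get(k) != branch.get(k)}

    def merge(self, branch: dict[str, dict], approved_by: str) -> dict[str, dict]:
        previous = self.main  # returned so the caller can roll back
        self.main = copy.deepcopy(branch)
        self.audit_log.append(f"merge approved by {approved_by}")
        return previous

    def rollback(self, previous: dict[str, dict], reason: str) -> None:
        self.main = previous
        self.audit_log.append(f"rollback: {reason}")


table = BranchableTable({"c1": {"status": "active"}, "c2": {"status": "actve"}})
work = table.branch()
work["c2"]["status"] = "active"  # the agent's proposed fix
changes = table.diff(work)
snapshot = table.merge(work, approved_by="reviewer@example.com")
```

The point of the design is that the agent never holds a write handle on `main`. Everything it does is a proposal, and the audit log records who approved what.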

The Power Shifts: Who This Favours

These four shifts don’t affect every organisation equally.

Organisations with clean, well-governed data gain compounding leverage. The semantic layer built when someone insisted on a single definition of “active user” will turn out to be the unlock required to put an agent on your data. The lineage discipline that was maintained when it felt like overhead will become the trust layer that makes autonomous action auditable. Organisations that treated data quality as a capability investment rather than a compliance exercise are already ahead.

Organisations with fragmented data and tribal knowledge are exposed. The AI initiatives that are failing are almost never failing because of the model. They’re failing because the harness can’t be fed – the data is in X systems with Y definitions of Z statuses. The schemas aren’t documented, the access controls assume only humans will ever connect, and the institutional knowledge lives in Slack DMs, Teams and, worse, human heads. The model can’t fix that, but it will expose it.

Data platforms that move towards agent-native primitives – semantic layers, MCP-style tool interfaces, branchable data environments, native eval support – will eat share from platforms that stay analytics-native. Snowflake, Databricks and newer players are all racing to be agent-native.

Data engineers who reposition early become disproportionately valuable. The skill of designing agent-safe data systems – composing tools, scoping capabilities, building evaluation flywheels, curating semantic meaning – is rare and the demand curve is only going up. The skill of writing transformation SQL by hand is becoming a commodity the agents themselves can supply.

Where to Start

This shift is already happening in production at organisations across every industry. The question is not whether to engage with it – it’s how to engage effectively. Here are some principles I propose to guide the work.

Run a context audit, not just a data audit. The traditional data audit asks “what data do we have and how clean is it?” The context audit asks “for the decisions we want an agent to make, can the right information be assembled at runtime?” Those are different questions. Most organisations discover that the data exists but the assembly is impossible – too many systems, too many definitions, no semantic layer, no clean tool interface.
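A first pass at a context audit can be as simple as a spreadsheet, or a few lines of code. The sketch below assumes you can list, for each decision, the fields it needs, and for each tool, the fields it can return at runtime. Both the tool catalog and the decisions here are hypothetical examples.

```python
# Hypothetical tool catalog: which fields each tool can return at runtime.
tool_catalog = {
    "get_invoices": {"invoice_amount", "invoice_date"},
    "get_plan_history": {"plan_name", "plan_start_date"},
}

# Hypothetical decisions we want an agent to make, and the fields each needs.
decisions = {
    "explain_bill_increase": {"invoice_amount", "invoice_date", "plan_name"},
    "approve_refund": {"invoice_amount", "refund_policy", "customer_tenure"},
}


def context_audit(decisions, tool_catalog):
    """For each decision, list the required fields that no tool can currently supply."""
    available = set().union(*tool_catalog.values())
    return {name: sorted(needed - available) for name, needed in decisions.items()}


gaps = context_audit(decisions, tool_catalog)
```

Here `explain_bill_increase` has no gaps, while `approve_refund` is missing `customer_tenure` and `refund_policy`. The data for both may well exist somewhere in the organisation; the audit shows that nothing can assemble it at runtime yet.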

Stand up your eval infrastructure before your first agent ships, not after. You will not be able to easily retrofit it later. The teams that have working agents in production are the teams that built the trace-and-label loop on day one and let it shape everything else.

Choose your trifecta exposure deliberately. For every agent you ship, ask: what private data can it read, what actions can it take, what external content can it consume? If all three are present and unbounded, you have an exfiltration vector waiting to be exploited. Decompose. Most agents don’t need all three.

Treat your harness as a product. As I argued in my previous piece, the retrieval layer, the tool catalog, the eval system, the access policies and the prompt library together form the harness. They deserve product investment, product ownership and product discipline. They are about to become more strategically important than most of the customer-facing or user-facing products your organisation runs.

Repoint the data team’s hiring profile. Stop hiring for SQL throughput. Hire for systems thinking, security instincts, and the ability to design environments that agents will operate in. The agents themselves can supply the SQL.

In the previous piece, “The Model Is the Platform Now”, I argued that the platform has shifted – that intelligence is eating software, and the new junctions are the harnesses that shape what intelligence can do.

This piece is more about what feeds those harnesses.

The organisations that thrived in the warehouse era were those that recognised that data was a strategic asset and invested in the discipline to manage it. The organisations that will thrive in the agentic era are the ones that recognise that context is now the strategic asset – and that the data engineering discipline must transform from preparing data for known queries to assembling context for unknown ones.

The data pipes haven’t disappeared. They’ve changed shape. Has your team noticed?