AI agent threat modelling: how to map attack surfaces before enterprise procurement asks


Joanna Larson • 8 min read • 29 June 2026

Most AI startups discover their attack surface the hard way: an enterprise buyer's security team maps it for them and finds something alarming. There is a better way. Serious security engineers do it as a matter of course, but almost nobody has written it down for an AI startup audience. It is called threat modelling, and doing it for your AI agent before procurement does it for you is one of the highest-leverage security activities available to you. This guide explains how to threat model an AI agent specifically, using a method you can actually apply.

Threat modelling is not exotic. At its core it is four questions: what are you building, what can go wrong, what are you going to do about it, and did you do a good enough job? It matters here because AI agents introduce categories of failure that traditional threat modelling never had to consider. That is exactly where startups are exposed, and where this method earns its place.

What threat modelling actually is

Threat modelling is the practice of systematically thinking through how your system could be attacked, before it is, so you can design defences deliberately rather than react after an incident. You build a picture of your system, identify where it could be attacked and what an attacker would want, work out which of those threats matter most, and decide what to do about each.

For an AI agent, the value is that it forces you to look at your product the way a buyer's security team and a real attacker will. By the time you have done it properly, you can answer the hard procurement questions with specifics rather than reassurance, because you have already found and addressed the things they would otherwise find for you.

Step one, map what you are building

You cannot model threats against a system you have not mapped. The first step is to draw your agent's actual architecture, with particular attention to the things unique to AI products. For an AI agent, the map should capture the following.

The entry points. Everywhere input enters, including user input, retrieved documents, tool outputs, and any external content the agent processes.
The model layer. Which model you call, what data is sent to it, and what comes back.
The tools. Every tool the agent can invoke and, crucially, what each one is permitted to do: read, write, call, or execute.
The data. What data the agent can reach, where it is stored, and how customers are separated.
The trust boundaries. The points where data crosses from less trusted to more trusted, for example from external content into a tool invocation, or from one customer's context toward another's.

That last point, the trust boundaries, is where most of the interesting threats live, because attacks happen when untrusted input crosses into a place that assumes it is safe.
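One way to make the map concrete is to write it down as data, so that trust boundary crossings can be listed mechanically. The following is a minimal sketch, not a prescribed format: the `Trust` levels, the `Component` and `Flow` types, and every component name are hypothetical, and assigning the model a middle trust level is a modelling assumption you may choose differently.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    UNTRUSTED = 0     # external content, retrieved documents, tool outputs
    SEMI_TRUSTED = 1  # authenticated user input, the model's working context
    TRUSTED = 2       # tools and data stores that act with real privileges


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "entry_point", "model", "tool", or "data"
    trust: Trust
    permissions: frozenset = frozenset()  # e.g. {"read", "write"}


@dataclass(frozen=True)
class Flow:
    source: Component
    target: Component


def trust_boundary_crossings(flows):
    """Return flows where data moves from a less trusted to a more trusted component."""
    return [f for f in flows if f.source.trust.value < f.target.trust.value]


# Hypothetical agent: all names below are illustrative assumptions.
web_page = Component("retrieved_web_page", "entry_point", Trust.UNTRUSTED)
chat = Component("user_chat", "entry_point", Trust.SEMI_TRUSTED)
llm = Component("hosted_model", "model", Trust.SEMI_TRUSTED)
send_email = Component("send_email", "tool", Trust.TRUSTED, frozenset({"call"}))
crm = Component("crm_records", "data", Trust.TRUSTED, frozenset({"read", "write"}))

flows = [
    Flow(web_page, llm),    # external content enters the model context
    Flow(chat, llm),        # same trust level, not a crossing
    Flow(llm, send_email),  # model output drives a tool invocation
    Flow(llm, crm),         # model output reaches customer data
    Flow(crm, llm),         # data flowing back down is not an upward crossing
]

for f in trust_boundary_crossings(flows):
    print(f"{f.source.name} -> {f.target.name}")
```

Each flow the function returns is a place where untrusted input reaches something that may assume it is safe, which makes the list a reasonable starting agenda for the next step.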

Step two, identify what can go wrong, the AI-specific threats

This is where AI threat modelling diverges from the traditional kind. Alongside the normal application risks, an AI agent has a distinctive set of threats you must walk through deliberately. The useful discipline is to go through your map and, at each point, ask what an attacker could do.

Prompt injection, direct and indirect. At every entry point, including retrieved content and tool outputs, ask whether crafted text could manipulate the agent into acting against its purpose. Indirect injection through data the agent trusts is the one teams miss.
Excessive agency. For every tool, ask what damage a manipulated agent could do with it. A tool with more permission than its task needs is a threat waiting to be triggered.
Data leakage to the model. Ask what personal or sensitive data flows to the provider, and whether it should.
Cross-tenant leakage. Ask whether one customer's data could reach another, through the model, the data layer, or shared memory.
Goal misalignment. Ask whether the agent, pursuing a legitimate goal, could take a harmful path to it, such as a destructive action that technically achieves the objective.
Memory and context poisoning. For agents with memory, ask whether malicious content could persist and influence later actions.

Going through these at each point on your map turns a vague worry about AI security into a concrete list of specific, located threats, which is exactly what you can then act on.
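The walk-through can be sketched as a small checklist generator that pairs each element of the map with the threat questions relevant to it. This is illustrative only: the `THREATS_BY_KIND` mapping, the `"memory"` kind, and the element names are assumptions chosen for the example, and you should adjust which questions apply to which kind of element for your own architecture.

```python
# Which AI-specific threat questions apply to which kind of map element.
# This mapping is an assumption for illustration, not a fixed taxonomy.
THREATS_BY_KIND = {
    "entry_point": ["prompt injection (direct or indirect)"],
    "model": ["data leakage to the model", "cross-tenant leakage"],
    "tool": [
        "excessive agency",
        "goal misalignment",
        "prompt injection (direct or indirect)",  # tool outputs are input too
    ],
    "data": ["cross-tenant leakage"],
    "memory": ["memory and context poisoning", "cross-tenant leakage"],
}

# Hypothetical map of an agent as (element name, kind) pairs.
AGENT_MAP = [
    ("user_chat", "entry_point"),
    ("retrieved_documents", "entry_point"),
    ("hosted_model", "model"),
    ("send_email", "tool"),
    ("conversation_memory", "memory"),
]


def enumerate_threats(agent_map):
    """Return one (element, threat) pair for every question that must be answered."""
    return [
        (name, threat)
        for name, kind in agent_map
        for threat in THREATS_BY_KIND.get(kind, [])
    ]


for element, threat in enumerate_threats(AGENT_MAP):
    print(f"{element}: could an attacker cause {threat} here?")
```

The output is deliberately a list of questions rather than findings. Each pair still needs a human answer, but nothing on the map gets skipped.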

Step three, decide what actually matters

Not every threat is equally important, and trying to fix everything at once is how startups waste effort. The next step is prioritisation, weighing each threat by how likely it is and how bad the impact would be.

A practical lens that works especially well for AI agents is reversibility. A threat that leads to an irreversible action (customer data exfiltrated, a destructive write, a wrong payment) matters more than one that produces a recoverable error, even if the recoverable one is more likely. Rank your threats by combining likelihood with impact, give extra weight to anything irreversible or anything touching customer data, and you have a prioritised list that tells you where to spend your limited time first.
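As a rough sketch of that ranking, the score below multiplies likelihood by impact and then applies extra weight for irreversibility and customer data. The 1 to 5 scales, the two weight constants, and the example threats are all assumptions picked for illustration. The article does not prescribe specific numbers, so calibrate them to your own product.

```python
from dataclasses import dataclass

# Assumed weights; tune these for your own risk appetite.
IRREVERSIBLE_WEIGHT = 2.0
CUSTOMER_DATA_WEIGHT = 1.5


@dataclass
class Threat:
    name: str
    likelihood: int  # 1 (rare) to 5 (expected)
    impact: int      # 1 (minor) to 5 (severe)
    irreversible: bool = False
    touches_customer_data: bool = False


def priority(threat):
    """Likelihood times impact, weighted up for irreversibility and customer data."""
    score = float(threat.likelihood * threat.impact)
    if threat.irreversible:
        score *= IRREVERSIBLE_WEIGHT
    if threat.touches_customer_data:
        score *= CUSTOMER_DATA_WEIGHT
    return score


# Hypothetical threats with made-up ratings.
threats = [
    Threat("agent returns a malformed summary", likelihood=4, impact=2),
    Threat("injected instruction triggers a wrong payment",
           likelihood=2, impact=5, irreversible=True),
    Threat("cross-tenant leakage through shared memory",
           likelihood=2, impact=5, irreversible=True, touches_customer_data=True),
]

ranked = sorted(threats, key=priority, reverse=True)
for t in ranked:
    print(f"{priority(t):5.1f}  {t.name}")
```

With these example ratings the recoverable error scores 8, the wrong payment 20, and the cross-tenant leak 30, so the more likely but recoverable threat correctly ends up last.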

Step four, decide what to do about each

For each threat that matters, you have a small set of options, and naming them explicitly is part of the discipline. You can mitigate it, by adding a control such as least privilege on a tool, input and output handling, tenant isolation, or an approval checkpoint. You can eliminate it, by removing the risky capability entirely, for example dropping a dangerous tool the agent does not really need. You can accept it, consciously, where the risk is genuinely low and the cost of mitigation is disproportionate. What you must not do is leave it unexamined.

The output of this step is the thing that makes the whole exercise worthwhile: a clear list of the threats to your AI agent, ranked, each with a decision and, where mitigated, a specific control. That document is both your security roadmap and, not coincidentally, most of what you need to answer an enterprise security review with confidence.
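That list can be kept as a small threat register with a check that nothing is left unexamined. The sketch below is one possible shape, under stated assumptions: the `RegisterEntry` fields, the rule that an accepted risk needs a written rationale, and all the example entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    MITIGATE = "mitigate"
    ELIMINATE = "eliminate"
    ACCEPT = "accept"


@dataclass
class RegisterEntry:
    threat: str
    decision: Optional[Decision] = None  # None means nobody has decided yet
    control: Optional[str] = None        # expected when the decision is MITIGATE
    rationale: Optional[str] = None      # expected when the decision is ACCEPT


def problems(register):
    """List entries that are unexamined or whose decision is not backed up."""
    issues = []
    for entry in register:
        if entry.decision is None:
            issues.append(f"{entry.threat}: unexamined")
        elif entry.decision is Decision.MITIGATE and not entry.control:
            issues.append(f"{entry.threat}: mitigated but no control named")
        elif entry.decision is Decision.ACCEPT and not entry.rationale:
            issues.append(f"{entry.threat}: accepted without a rationale")
    return issues


# Hypothetical register entries.
register = [
    RegisterEntry("indirect prompt injection via retrieved documents",
                  Decision.MITIGATE,
                  control="approval checkpoint before any tool call it triggers"),
    RegisterEntry("agent can run arbitrary shell commands", Decision.ELIMINATE),
    RegisterEntry("error messages reveal the model name", Decision.ACCEPT,
                  rationale="low impact, mitigation cost disproportionate"),
    RegisterEntry("cross-tenant leakage through shared memory", Decision.MITIGATE),
    RegisterEntry("memory and context poisoning"),
]

for issue in problems(register):
    print(issue)
```

Here the last two entries are flagged: one claims mitigation without naming a control, and one has no decision at all.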

Step five, validate that you did enough

The final question is whether your model and your mitigations actually hold. This is where you test, ideally by attempting the attacks you identified, to confirm your controls work rather than assuming they do. Threat modelling tells you where to look. Testing confirms whether the defence is real. The two together are what genuine security looks like, as opposed to a documented intention that has never been checked.
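A minimal sketch of what attempting the attacks can look like for two of the controls mentioned earlier, least privilege on tools and an approval checkpoint. Everything here is hypothetical: the `dispatch` gate, the tool names, and the attack cases stand in for your real tool layer. It exercises only the gate, not the model, so it complements rather than replaces end-to-end tests that feed injected content through the agent.

```python
class ToolDenied(Exception):
    """Raised when a tool call is outside what the agent is permitted to do."""


# Hypothetical allowlist: tool name -> actions the agent may perform with it.
ALLOWED_TOOLS = {
    "search_docs": {"read"},
    "send_email": {"call"},
}
# Hypothetical set of tools that need a human approval checkpoint.
REQUIRES_APPROVAL = {"send_email"}


def dispatch(tool, action, approved=False):
    """Gate every tool call the agent requests; stands in for a real tool layer."""
    permitted = ALLOWED_TOOLS.get(tool)
    if permitted is None or action not in permitted:
        raise ToolDenied(f"{tool}:{action} is not permitted")
    if tool in REQUIRES_APPROVAL and not approved:
        raise ToolDenied(f"{tool}:{action} requires approval")
    return f"{tool}:{action} executed"


def attack_is_blocked(tool, action, approved=False):
    """Attempt a call a manipulated agent might make and report if it was stopped."""
    try:
        dispatch(tool, action, approved)
    except ToolDenied:
        return True
    return False


# Calls a manipulated agent might attempt after a successful prompt injection.
ATTACKS = [
    ("delete_records", "write"),  # tool that was never granted
    ("search_docs", "write"),     # granted tool, action beyond its permission
    ("send_email", "call"),       # sensitive tool without approval
]

for tool, action in ATTACKS:
    status = "blocked" if attack_is_blocked(tool, action) else "NOT BLOCKED"
    print(f"{tool}:{action} -> {status}")
```

It is worth asserting the legitimate path as well, for example that `dispatch("search_docs", "read")` still succeeds, so that a control which simply blocks everything does not pass as a working defence.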

It is also not a one-time exercise. Your agent will change, you will add tools and capabilities, and the threats will evolve, so the model is a living document you revisit as the product grows, not a thing you do once and file away.
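One lightweight way to notice when the model has fallen behind the product is to compare the tools the agent can currently invoke with the tools the threat model covers. The sketch below assumes you can produce both lists yourself. The function name and the tool names are hypothetical.

```python
def unmodelled_tools(deployed_tools, modelled_tools):
    """Return tools the agent can invoke that the threat model does not cover."""
    return sorted(set(deployed_tools) - set(modelled_tools))


# Hypothetical inputs: what the agent exposes today vs. what was last reviewed.
deployed = ["search_docs", "send_email", "create_invoice"]
modelled = ["search_docs", "send_email"]

missing = unmodelled_tools(deployed, modelled)
if missing:
    print("Threat model is out of date for:", ", ".join(missing))
```

Run as part of a release checklist or a build step, a check like this turns "revisit the model" from an intention into a prompt that appears whenever a new capability ships.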

Why this puts you ahead of procurement

Here is the strategic payoff. When an enterprise buyer's security team assesses your AI product, they are, in effect, threat modelling it. If you have already done it yourself, several things become true. You already know what they will find, because you found it first. You have already addressed the serious items, so there are fewer surprises. And you can answer their questions with specifics (here is our attack surface, here is what we identified, here is what we did), which is far more convincing than vague assurances.

In other words, threat modelling your agent before procurement does it for you turns the security review from an exam you fear into a conversation you are prepared for. It is the difference between being assessed and being ready.

The honest takeaway

Threat modelling an AI agent is not reserved for large companies with security teams. It is a structured way of thinking that any technical founder can apply: map the system, find the AI-specific threats, prioritise by likelihood and irreversibility, decide what to do about each, and validate that it holds. Doing it deliberately, before an enterprise buyer does it for you, is one of the highest-leverage security activities available to an AI startup, because it both genuinely secures your product and prepares you for the review that decides your biggest deals.

The startups that struggle in procurement are the ones that never mapped their own attack surface. The ones that sail through are the ones that did it first.

Want your AI agent's attack surface mapped before a buyer does it?

Book a free review and we'll threat model your AI product with you, and show you what procurement will probe.
