A reasonable summary of how most teams build AI outbound today goes like this: write a system prompt that describes your company, drop in a bullet list of personas, ask the model to write personalized emails. Iterate on the prompt. If a competitor pulls ahead, rewrite the prompt. If the emails feel generic, add more bullets.
This works at small scale. It does not compound.
I want to walk through why, and what we built instead. The short version is that prompts don’t carry weight at scale because they’re flat: one document, edited as a unit, where every change risks every email. What actually compounds is a context graph — typed entities, with explicit references between them, that the system traverses to assemble a brief for each prospect. Same product, same writer model, very different output and very different moat.
What “context” usually looks like
If you crack open most AI outbound tools, the “context” layer is one of two things. Either it’s a big text field labeled “About your company” with a few paragraphs and bullets. Or it’s a slightly more structured version: a product description, a list of personas, a value-prop statement, maybe a short list of customer quotes. The model loads all of it into the system prompt on every call.
This is fine until it isn’t. Then it fails in predictable ways.
The product description becomes either too general (so emails sound like every other AI outbound) or too specific (so the writer name-drops features no prospect cares about). The persona list grows until it’s effectively a glossary, and the model picks one paragraph from it more or less at random. Customer proof either generalizes to “trusted by X-type companies” or gets stuffed into every email regardless of fit. The writer has all of it loaded and none of it organized.
Underneath all of that is a deeper issue: there’s no way to say “for this prospect, load proof A because it answers objection B that this persona typically raises.” That’s a graph query. A flat document can’t be queried.
A typed context graph
Our system has nine entity types. Each one is a separate file under contexts/<type>/<id>.md in a workspace’s content repo:
- ICP: firmographic + tech attributes that define the ideal account
- Persona: a buyer with a defined role, pain pattern, and goals
- JTBD (job-to-be-done): the outcome a buyer hires the product to achieve
- Signal: an observable external event that suggests buying intent
- Play: a coordinated motion that bundles a signal, a persona, and an angle
- Alternative: what the buyer would do instead (competitor, in-house, status quo)
- Objection: a predictable concern that surfaces in deals
- Proof: customer evidence that neutralizes a specific objection or validates a claim
- Insight: a contrarian point of view that reframes the buyer’s problem
Each file is markdown with YAML frontmatter. The frontmatter declares the entity’s type, an id, and a references array that lists other entities it connects to. A simplified example for a JTBD entry:
---
type: jtbd
id: ship-90-day-pilot
summary: Ship a working pilot within 90 days of kickoff
references:
- persona/cto
- proof/series-b-rollout
- objection/integration-risk
- insight/velocity-compounds
---
That references array isn’t decoration. The semantics of an edge are inferred from its endpoints: jtbd → persona means “this job matters to this persona.” jtbd → proof means “this proof validates the achievability of this job.” objection → proof means “this proof neutralizes this objection.” About a dozen such edge semantics emerge from the nine entity types.
What the graph unlocks
Typing the entities and connecting them lets the brief curator (the upstream LLM call that decides what to say about a given prospect) traverse the graph for context, instead of being handed everything.
For a specific prospect, the brief curator does something like this:
- Match the account to an
icp. If it matches, load that ICP file. - Match the prospect to a
persona. Load it. - Walk the persona’s
referencesto find connectedjtbd,objection, andinsightentries that apply. - For each loaded objection, walk to a matching
proof. - Surface up to five passages from those files in the brief as
relevant_excerpts.
The writer never sees the graph. It sees the brief: a structured object with anchors, credibility, persona cues, three proof excerpts, two objection cues, one insight. The writer’s job is to render. The strategic work (what to say to this prospect) happens in the graph traversal.
The writer composes from a brief, not a catalog. The catalog is the asset. The brief is the assembly.
Why this beats one big prompt
A flat system prompt is one document edited as a unit. Adding a new proof means rewriting the messaging section. Adding a new objection means deciding where to mention it. Two people editing at once means a merge conflict in prose. The unit of work is “rewrite the prompt,” which means the prompt is always either out of date or being argued about.
A typed graph is many small files. Adding a new proof means creating one new file (proof/series-c-migration.md) and listing its references (which objections it neutralizes, which JTBD it validates). The next campaign that touches a referencing objection automatically picks up the new proof. No one has to edit a prompt. The unit of work is “add an entity,” which is small, reviewable, and reversible.
Two teams can add entities in parallel without colliding, because each entity is its own file. A growth marketer can author proofs. A product marketer can author insights. A sales leader can author objections. They commit to the same context repo through their respective workflows and the graph absorbs the additions.
Compounding looks like this, structurally. Every new entity is wired into the graph by its references. Graph density increases. The brief curator has more material to pull from. Future briefs improve without anyone editing a prompt.
Three cache tiers and why they matter
There’s a cost story underneath all of this that’s easy to miss. The Anthropic API charges less for cached tokens than fresh ones, but only if the input is exactly the same byte-for-byte at the cache boundary.
A flat system prompt is a single block. If any part of it changes (even the persona-specific section), the whole prompt is a cache miss. At batch scale, this is a significant cost.
A typed context graph lets you split the prompt into three tiers:
- Static: content identical across every request in a campaign (product description, universal rules, messaging). Cached once, hit on every subsequent call.
- Dynamic: content that varies across a small set of values (which persona the prospect matches, which voice variant the experiment selected). Cached per value, hit on subsequent calls with the same value.
- Unique: content specific to this one prospect (the brief, the enrichment data). Never cached, lives in the user message.
The system assembles each request by walking these tiers in order, with cache breakpoints at the boundaries. Static product/messaging stays cached. Persona varies across a small set, so its variants stay cached. Unique enrichment goes in the user message. Cache hit rates on batch runs sit above ninety percent on the static portion.
Tier assignment is invisible to anyone writing context files. They just author markdown. The system handles the tier mapping based on how each entity is used in a workflow. None of this is possible with one flat prompt.
The brief curator pattern
The brief curator deserves its own paragraph because it’s where the graph meets the writer.
Most AI outbound systems have one LLM call: prompt + data → email. We have a pipeline. The digest reads the prospect’s profile and recent posts. The brief curator reads the digest, picks anchors, traverses the context graph, and produces a structured object. The writer reads the brief and renders three emails. A critic checks them. A rewriter, if needed, fixes them.
Brief curator output is intentionally narrow. It does not prescribe per-email structure. It does not include voice notes. It does not tell the writer “email 1 should ask X.” It surfaces material (anchors, credibility, cues, banned phrases, three excerpts from the matched catalog) and stops. The writer has compositional latitude because the brief leaves room for it.
This separation between curation and composition is the architectural move that makes the graph work. Curation walks the graph. Composition reads the brief. Switching voices means swapping a voice file, not retraining a writer. Switching plays means changing a references edge, not editing prompt instructions.
What this looks like operationally
Our context graph lives in a git repository, with each workspace getting its own repo. Editing happens through pull requests, reviewed before merge. The agent that generates emails pulls the latest version on every run.
This sounds like a lot of process for what was a free-text field a paragraph ago. In return, GTM knowledge becomes a versioned, reviewable, rollback-able asset that several people can contribute to in parallel. A repeated objection from a recent demo becomes a new entity. The next campaign that touches the persona it applies to picks it up automatically. A weak proof gets replaced, and its references still resolve, so nothing downstream breaks.
In a typical SaaS company, the closest analog is the way engineering teams treat their codebase. The code is the asset. PRs are how it changes. Reviewers catch regressions. CI verifies it still works. No engineer would tolerate “the codebase is one Google Doc that I rewrite when something is wrong.” It’s strange that most outbound teams tolerate exactly that shape for the system that writes their messages.
What this beats
Most teams converge instead on prompt iteration. Something feels off in the emails, so the prompt gets rewritten. Two months later the emails feel off again, so the prompt gets rewritten again. Each rewrite is a unit of work the previous rewrite gets discarded for. None of it persists.
That shape is also why prompt-engineering hires don’t scale outbound. The unit of work doesn’t compose. A clever sentence added to a system prompt makes one campaign better and risks every other campaign that prompt touches. Hiring three more people to write three more clever sentences doesn’t add three times the value; it adds drift.
Graphs add in the other direction. Three more people authoring entities (new proofs, sharper objections, contrarian insights) adds three times the material the curator can pull from. The graph absorbs the additions without anyone touching the prompts that already work.
This is the moat. Not the writer model. Not the prompt. The typed graph of GTM knowledge that compounds with use.
The diagnostic
If you want to see whether your AI outbound has any structural compounding in it, three questions to ask:
- Where does your GTM knowledge live? If the answer is “in the prompt” or “in the system instructions,” there’s no graph. The prompt is a flat document and every change risks every email it touches.
- What happens when a new objection or proof shows up? If the answer is “we update the prompt,” the unit of work doesn’t compose. If the answer is “we add an entity and the system picks it up,” it does.
- Can two people edit the context in parallel without colliding? If they can’t, it’s one document. If they can, it’s a graph. Only one of these scales.
Cold email written by AI will keep getting better at the surface level. Model improvements help. Prompt-engineering tricks help marginally. Teams that pull ahead at scale won’t be the ones with the cleverest prompt. They’ll be the ones whose GTM knowledge is structured the way a codebase is structured (typed, referenced, versioned, reviewable) and whose system traverses that structure instead of just reading a paragraph about the company.
The prompt is rented. The graph is owned.
From the library
- Context compounds, campaigns don't
The principle this piece shows the machinery for.
- Versioned context for outbound
How the context graph gets edited, reviewed, and rolled back.