COSTA
Technical writeups
Technical overview

Knowledge agents

How COSTA separates what it always needs to know from what depends on the task — and why keeping those two things apart makes every response sharper.

01 · The problem

Relevance is narrow. Context windows are not.

Every prompt sent to the model has a cost: time, money, and attention. The more content is in the context window, the more the model has to hold, weigh, and reason about before producing a response. Some of that content is always relevant. Most of it is relevant only sometimes.

The naive approach is to put everything in every prompt. Team structure, career history, writing style, meeting notes, task list, calendar — all of it, every time. This works until it doesn't. The signal-to-noise ratio degrades. Responses drift toward the generic. Latency climbs. The model is spending its attention on context that isn't helping.

The alternative — putting nothing in and relying on the model to ask for what it needs — is slower and less reliable. The model may not know what it doesn't know. A brief that should sound like you won't, because the writing-style context was never there to begin with.

Knowledge agents are the solution to this tradeoff. They separate domain-specific context from the base prompt and provide a structured mechanism for loading it precisely when and where it matters.

02 · The format

A directory, a manifest, and files

A knowledge agent is not a special data structure. It is a directory inside context/agents/, containing an optional manifest and one or more markdown files. That is the entire format.

The manifest is a small JSON file named agent.json. Its only required field is name — the human-readable label used when the app indexes the agent and when the model reports which agents it consulted. Without a manifest the directory name is used as the agent name, capitalised. Either way, the agent is valid.
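As a sketch, that fallback amounts to a few lines. The `agent_name` helper below is hypothetical; only the `agent.json` filename and its `name` field come from the format described above:

```python
import json
from pathlib import Path

def agent_name(agent_dir: Path) -> str:
    # Prefer the manifest's "name" field; fall back to the
    # capitalised directory name when no manifest exists.
    manifest = agent_dir / "agent.json"
    if manifest.is_file():
        data = json.loads(manifest.read_text())
        if "name" in data:
            return data["name"]
    return agent_dir.name.capitalize()
```

Either branch yields a valid agent, which is the point: the manifest is a nicety, not a requirement.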

The markdown files hold the actual knowledge. Each file is scoped to a specific topic within the agent's domain. The Profile agent separates career background from writing style from writing samples. The Organisation agent holds team structure and relationship context in a single file. The scope of each file is a design decision: narrow enough to be loaded selectively, broad enough to be self-contained.

Agent directory structure
context/agents/                  · agent root — one subdirectory per agent
├── profile/                     · Profile Knowledge Agent
│   ├── agent.json               · { "name": "Profile Knowledge Agent" }
│   ├── career.md                · role, background, professional trajectory
│   ├── writing-style.md         · voice, vocabulary, rhythm, what to avoid
│   └── writing-samples.md       · reference examples of good output
└── organisation/                · Organisation Knowledge Agent
    ├── agent.json               · { "name": "Organisation Knowledge Agent" }
    └── people.md                · team structure, reporting lines, key relationships

This structure was chosen for its simplicity. There is no schema, no embedding pipeline, no vector database to maintain. New agents are created by adding a directory. New context is added by editing a file. The format is legible in any text editor and requires no tooling beyond one.

03 · Portability

Portable context stores

The knowledge you accumulate about how you work — your voice, your team, your domain — does not belong to the application. It belongs to you. Knowledge agents are built to reflect that.

Every agent is a directory of plain text files on your local machine. They are not stored in a proprietary format, synced to a cloud service, or locked inside a database. They live wherever your app root lives. You can open them in any editor, search them with any tool, and version them with git. If you stop using COSTA tomorrow, the files remain exactly as they were. Nothing is lost.

Format: Plain markdown (readable in any editor, searchable with any tool)
Location: Local filesystem (nothing leaves your machine unless you push it)
Editing: Any text editor (no import process, no special tooling required)
Versioning: Git-compatible (diff, branch, and revert like any other file)

This portability has a practical consequence. Knowledge agents can be built up gradually, edited by hand, and maintained like any other working document. A writing style file is only as good as the last time it was updated. An org structure file becomes stale after a reorg. Plain text files are low-friction to maintain precisely because there is no special workflow required. Open the file, make the change, save.

Because agents are files, they also travel. Commit them to the same repository as your notes and they are versioned alongside everything else. Move the repository to a new machine and the agents come with it. Share a specific agent file with a colleague who is configuring their own setup. The format imposes no barriers on any of these things.

04 · Discovery

How the app finds agents

Agents are discovered automatically at request time. When a system prompt is being assembled, the app scans the context/agents/ directory, reads each subdirectory, and builds a structured index. No configuration is required to register a new agent. Adding the directory is enough.

For each agent, the app reads the manifest if present, lists the .md files alphabetically, and formats both into a block that goes into the system prompt. The block tells the model what each agent is called, what files it contains, and what those files cover — enough for the model to decide whether and what to load.

A new agent directory added to context/agents/ will appear in the next request's system prompt without any restart or rebuild. The app does not cache the agent index between requests. Discovery happens fresh each time, so the index always reflects the current state of the filesystem.
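A minimal sketch of what that indexing pass might look like. The `build_agent_index` helper is hypothetical; the manifest lookup is omitted for brevity, and the topic label is derived from the filename, matching the index format shown below:

```python
from pathlib import Path

def build_agent_index(agents_root: Path) -> str:
    # Scan fresh on every request: one block per agent directory,
    # one bullet per markdown file, topic derived from the filename.
    blocks = []
    for agent_dir in sorted(p for p in agents_root.iterdir() if p.is_dir()):
        name = agent_dir.name.capitalize()  # manifest lookup omitted here
        lines = [f"**{name}**"]
        for md in sorted(agent_dir.glob("*.md")):
            topic = md.stem.replace("-", " ")
            lines.append(f"- `{md}` — {topic}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

Because the scan runs at prompt-assembly time, there is nothing to invalidate: the index is always a direct reading of the filesystem.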

What the model sees · agent index in the system prompt
**Profile Knowledge Agent**
- `context/agents/profile/career.md` — career
- `context/agents/profile/writing-style.md` — writing style
- `context/agents/profile/writing-samples.md` — writing samples

**Organisation Knowledge Agent**
- `context/agents/organisation/people.md` — people
Generated dynamically — reflects whatever directories currently exist under context/agents/

05 · Loading

Two loading strategies

Knowing what agents exist and deciding what to load are separate concerns. Discovery puts the index into the system prompt. Loading — actually reading the file content into context — happens through one of two mechanisms, depending on what generated the request.

On demand · Chat interface

1. System prompt includes agent index + required-reads rules
2. Model reads task, identifies which agents apply
3. Model calls read_file for each relevant file
4. Content enters conversation context mid-loop
5. Model reports which agents it consulted

Model-directed · correct for general questions where relevance varies

Pre-loaded · Brief, 1-on-1 prep, meeting prep

1. Server identifies which agents are always needed for this task type
2. Agent files are read and injected into the system prompt before the loop starts
3. Model receives full agent context on the first turn
4. No read_file round-trip required
5. Guaranteed in context regardless of model behaviour

Server-directed · correct when the same agents are always needed
Context precision · The right files loaded for each task. Not everything. Not nothing.

[Diagram: a matrix of task types against agent files. Tasks: Morning Brief (pre-loaded), Meeting Prep (pre-loaded), Chat: Draft (on demand), Chat: Org Q (on demand), Chat: General (model decides). Files: profile/career.md, writing-style.md, and writing-samples.md; organisation/people.md. Each cell marks whether that file is loaded into context for that task.]

On demand, for chat

In the chat interface, agent loading is model-directed. The system prompt contains the full agent index plus a set of explicit rules about which agents are required for which types of task. The model reads these rules, assesses the incoming question, and calls read_file for any agent files it determines are relevant. The content comes back as a tool result and enters the conversation context. The loop continues with that knowledge now in scope.

This approach is correct for general chat because relevance varies by question. A question about how to phrase a difficult message requires writing style context. A question about who owns a particular workstream requires org context. A question about the task list requires neither. Loading everything for every chat turn would be wasteful and noisy. The model makes the call based on what the question actually demands.
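The read_file handler itself is not shown in this article, but the server side of that call might look like the sketch below. The `read_agent_file` name and the path-guard behaviour are assumptions, not the app's confirmed implementation:

```python
from pathlib import Path

def read_agent_file(app_root: Path, requested: str) -> str:
    # Tool handler for read_file: resolve the model-supplied path
    # and refuse anything that escapes the context directory.
    context_root = (app_root / "context").resolve()
    path = (app_root / requested).resolve()
    if not path.is_relative_to(context_root):
        raise ValueError(f"path outside context/: {requested}")
    return path.read_text()
```

The guard matters because the path arrives from the model, not from trusted code; resolving before checking containment is what defeats `..` traversal.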

Pre-loaded, for structured generation

For briefs, 1-on-1 prep, and meeting prep, relevant agent files are injected directly into the system prompt before the agentic loop starts. The server knows which agents are always required for each task type and reads those files at assembly time. By the time the model processes its first turn, the context is already there.

Pre-loading is appropriate here because the required agents are predictable and constant. Every brief needs writing style. Every prep synthesis needs org structure. Making the model fetch these via tool calls on every run would be slower, more expensive, and less reliable — the model might skip the read in a long loop where it already feels it has enough context. Pre-loading removes that uncertainty.

Even when files are pre-loaded, the model is told they are already in context and instructed not to read them again. This prevents a common failure mode where the model redundantly re-fetches context it already has.
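A sketch of that assembly step. The task-to-files mapping below is hypothetical (the article does not specify exact file sets per task type); the shape of the mechanism is what matters:

```python
from pathlib import Path

# Hypothetical mapping: which agent files each task type always needs.
PRELOADED = {
    "brief": ["agents/profile/writing-style.md"],
    "meeting_prep": ["agents/organisation/people.md",
                     "agents/profile/writing-style.md"],
}

def assemble_system_prompt(base: str, task_type: str, context_root: Path) -> str:
    # Inject required agent files before the loop starts, and tell the
    # model they are already in context so it does not re-read them.
    parts = [base]
    for rel in PRELOADED.get(task_type, []):
        body = (context_root / rel).read_text()
        parts.append(f"## {rel} (already loaded; do not call read_file for this)\n{body}")
    return "\n\n".join(parts)
```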

06 · Selection

How COSTA decides what to load

For on-demand loading, the decision logic is not in code. It is in the system prompt itself. The agent index section includes explicit rules, written in plain text, that tell the model when each category of agent is required. These rules are part of every chat system prompt.

Required reads · injected into every system prompt

- Writing in Maya's voice (drafting a message, email, or document) → Profile Agent → writing-style.md (minimum)
- Any question involving team members, direct reports, peers, or org dynamics → Organisation Agent → people.md
- Any question involving design principles or Preply design context → Design Agent → relevant files

Rules are in plain text in the system prompt — the model is told what to read, not given it automatically

This is an important design choice. The selection rules are not hidden in application logic — they are visible, editable text in a prompt that the model reads. If a rule is wrong or missing, you can correct it directly. Adding a new agent category means adding a new rule to the same section. The mechanism stays the same; only the content changes.

The rules use language like "at minimum" to specify mandatory files while leaving room for the model to load additional files based on its own assessment. A task involving a specific person might load people.md from the Organisation agent and, if available, a dedicated context file for that individual from context/. The rules establish a floor. Relevance determines what else gets loaded above it.

After producing a response, the model is asked to report which agents it consulted. This appears as a single line at the end of each reply: Agents used: Profile Knowledge Agent, Organisation Knowledge Agent. It serves two purposes: it confirms the selection decision was made, and it tells you which context was active when a particular output was produced.
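Because the trailer has a fixed shape, the app could parse it back out for logging or display. This `agents_used` helper is illustrative, not a documented part of COSTA:

```python
def agents_used(reply: str) -> list[str]:
    # Pull the trailing "Agents used:" line out of a reply, if present.
    for line in reversed(reply.strip().splitlines()):
        if line.startswith("Agents used:"):
            names = line[len("Agents used:"):]
            return [n.strip() for n in names.split(",") if n.strip()]
    return []
```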

07 · Design

What makes a good agent

The format imposes almost no constraints. A knowledge agent can hold one file or twenty, cover a narrow topic or a wide one, be updated daily or once a year. The lack of constraints makes the design decisions real ones.

Keep scope narrow

An agent is loaded in full when it is relevant. A large, general-purpose agent that covers multiple unrelated domains gets loaded for any task that touches any one of them, bringing the rest along unnecessarily. Separate agents with narrow scope can be loaded independently. A question about team structure does not need to pull in career history.

The same principle applies within an agent. One file per topic is the right unit. Writing style is a file. Writing samples are a separate file. Mixing them would make selective loading harder and the content harder to maintain. When a file is split by topic, updating one section does not require touching the others.

Maintain it like a living document

An agent is only as good as the last time it was updated. A writing style file written once and never touched will drift from how you actually write. An org structure file that still lists someone who left two months ago introduces noise. The lightweight format is specifically designed to lower the cost of keeping agents current. If updating requires a special workflow, it will not happen. If it requires opening a file and editing text, it will.

Be specific, not comprehensive

The temptation is to make an agent exhaustive: include everything that might ever be relevant, in case the model needs it. This is the wrong direction. A writing style file that includes every stylistic rule you have ever thought of is harder to reason about than one that names five things that matter most. Specificity is what gives context its value. Generic guidance produces generic output. The goal is not to document the full domain. It is to give the model the fewest facts that most reliably shift its behaviour in the right direction.

08 · Maintenance

The assistant can help maintain itself

The same assistant that reads knowledge agents can also help write them. This is not a secondary feature — it is one of the more useful things the system enables.

Meeting transcripts, 1-on-1 notes, and decision logs accumulate patterns that are rarely made explicit anywhere. How someone tends to frame problems. What kinds of tradeoffs they consistently favour. Which communication contexts they find draining versus energising. The vocabulary they reach for when thinking carefully versus when thinking fast. None of this lives in a knowledge agent by default, because no one sat down and wrote it out. It exists only in aggregate, across dozens of conversations.

The assistant can extract it. Ask it to read six months of 1-on-1 notes and identify recurring decision-making patterns. Ask it to synthesise how you give feedback based on the transcripts where you have done it. Ask it what it would find useful to know about you that is not currently in the profile agent. The output of those conversations is the raw material for agent updates — and in most cases, the assistant can draft the update directly.

Prompts that produce agent updates

- "What patterns do you notice in how I make decisions, based on my meeting notes from the last quarter?" → synthesises from transcripts → new section in career.md or a dedicated decisions.md
- "Read my last 10 1-on-1s and tell me what communication preferences seem consistent across them." → extracts signal → draft additions to writing-style.md
- "What frameworks or references do I keep reaching for? What should be documented for future context?" → surfaces recurring mental models → new references.md file
- "What would you find useful to know about me that isn't in the profile agent?" → identifies gaps → targeted questions to fill them

This loop matters because it changes how agents accumulate value over time. A knowledge agent written once and left alone gets stale. One that is periodically reviewed, extended, and refined through conversation with the assistant gets sharper. The assistant surfaces what is missing. You decide what to add. The file is updated. The next response is better calibrated.

The format supports this. Because agents are plain text files, the assistant can propose an edit in a message and you can apply it directly. There is no import process, no approval workflow, no schema to conform to. The feedback loop between "what the assistant knows" and "what it could know" stays short enough to act on.

Also in this series
Below the Waterline · Organisational Memory · Context Architecture · Brief Generation · Colour System