How COSTA assembles what it knows, how that knowledge shapes every response, and how the things it creates feed back into what it knows next time.
Every generation starts with a system prompt: the briefing document the model reads before producing anything. It contains what the model needs to understand the situation: who you are, what you're working on, what's in your calendar, what you've written down recently.
This is assembled fresh on every request. Nothing is stored inside the model between sessions. The underlying AI has no persistent memory of its own. COSTA compiles the relevant context from your files and injects it at the start of each turn. The model reads the same briefing every time, updated to the current state of your data.
The stack is ordered deliberately. Behavioural rules come first, so COSTA knows how to act before it reads anything else. Knowledge agent instructions follow, telling it which agents exist and when to consult them. Then the operational content: tasks, calendar, journal, saved Slack messages, and a file index pointing to notes it can read on demand.
Not everything can fit in a single prompt, and fitting everything in would be wasteful. The file index is just that: an index. It tells COSTA what exists and where, but the content stays on disk until it is needed.
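The assembly described above can be sketched as a simple concatenation in a fixed order. This is an illustrative sketch, not COSTA's actual code: the function names, section arguments, and the `## File index` heading are all assumptions.

```python
from pathlib import Path

def build_file_index(data_dir: Path) -> str:
    # The index names what exists; content stays on disk until requested.
    entries = sorted(p.relative_to(data_dir).as_posix()
                     for p in data_dir.rglob("*.md"))
    return "## File index\n" + "\n".join(f"- {e}" for e in entries)

def build_system_prompt(data_dir: Path, rules: str,
                        agent_instructions: str, operational: str) -> str:
    # Deliberate order: behavioural rules first, agent instructions second,
    # operational content next, and the file index last.
    return "\n\n".join([rules, agent_instructions, operational,
                        build_file_index(data_dir)])
```

Rebuilding the prompt from scratch on each request is what keeps the model's briefing in sync with the current state of the files, with no state carried inside the model itself.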
COSTA's context is not a database. It's a directory on your local machine: a structured collection of plain text files that you own completely and can edit in any tool you like. The application reads from and writes to this directory, but the data never leaves your computer unless you explicitly push it somewhere.
The directory layout reflects how information is used. Notes about people are filed under the person's name; meeting notes are filed by date. Generated briefs accumulate in a dedicated folder, both for persistence and so future generations can reference them. Everything is markdown: readable without the app, searchable with any text tool, and portable to whatever comes next.
Each directory has a specific role in the context assembly process. The 1-1s/ and meeting-notes/ directories form the primary note corpus. The most recent entries from each are indexed in every system prompt, with the full content available on demand. The briefs/ directory lets COSTA see what it wrote yesterday before drafting today's brief, avoiding repetition. The context/agents/ directory is where knowledge agents live, each in its own subfolder with a manifest and one or more topic files.
Because everything is plain files, the filesystem is also the integration layer. Drop a transcript into meeting-notes/, add a LinkedIn export to context/linkedin/, paste a document into documents/. No import process. The file index in the next system prompt will include it.
Once context is assembled and a task is given, COSTA runs an agentic loop: a cycle of calls between the model and your filesystem that continues until the model decides it has enough to work with.
During each turn, COSTA can call tools: reading a specific file, listing the contents of a directory, or writing a new document. The result comes back, gets added to the conversation, and the loop continues. This is how COSTA reads your 1-on-1 notes before writing a synthesis, or checks recent meeting notes before drafting a brief. Context is fetched on demand, not pre-loaded.
The loop runs up to fifteen iterations and handles four tool types: read_file, list_files, write_file, and delete_file. All file access is sandboxed to the app root. The model cannot read or write anywhere else on your system.
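The loop and its sandbox can be sketched as follows. The `model` callable here is a stand-in for the API call, and the dict shapes are assumptions; the iteration limit, the four tool names, and the app-root confinement come from the description above.

```python
from pathlib import Path

MAX_ITERATIONS = 15

def resolve_sandboxed(app_root: Path, relative: str) -> Path:
    # All file access is confined to the app root; ../ escapes are rejected.
    target = (app_root / relative).resolve()
    if not target.is_relative_to(app_root.resolve()):
        raise PermissionError(f"path escapes sandbox: {relative}")
    return target

def run_loop(model, messages: list, app_root: Path) -> str:
    # `model` returns either a tool request ({"tool": ..., "args": ...})
    # or a final answer ({"tool": None, "text": ...}).
    for _ in range(MAX_ITERATIONS):
        reply = model(messages)
        if reply["tool"] is None:
            return reply["text"]                      # final response
        path = resolve_sandboxed(app_root, reply["args"]["path"])
        if reply["tool"] == "read_file":
            result = path.read_text()
        elif reply["tool"] == "list_files":
            result = "\n".join(sorted(p.name for p in path.iterdir()))
        elif reply["tool"] == "write_file":
            path.write_text(reply["args"]["content"])
            result = "ok"
        else:  # delete_file
            path.unlink()
            result = "ok"
        messages.append({"role": "tool", "content": result})  # feed result back
    return "(iteration limit reached)"
```

Resolving the path before checking containment is the important detail: a relative path like `../secrets` only reveals its real target after resolution.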
When the model decides it has enough information and generates its final response, the loop ends and the output is streamed directly to the screen. You see tokens as they arrive, not after the whole response is finished.
The chat interface and the structured generators (briefs, 1-on-1 prep, meeting prep) all use exactly the same underlying loop. The difference is in what they start with: structured generators receive a specific task instruction and have key agent files pre-loaded. Chat starts with the full context and whatever you type. Both converge on the same agentic pattern.
COSTA can produce two broad categories of output. The first is the set of predefined artefact types: morning brief, end-of-day brief, 1-on-1 prep synthesis, and meeting prep. These have dedicated generation routes, structured task prompts, and caching logic. They are triggered from specific points in the interface and saved to known locations in the filesystem.
The second category is open-ended. During any chat session, COSTA can use its write_file tool to create any document it judges useful: an email draft in response to a message you've pasted, a reply to a Slack thread, a performance review section, a job description, a document outline, a meeting summary. These aren't triggered by a button. They emerge from conversation.
When COSTA writes a file during chat, it appears immediately in the relevant section of the app. A new document in documents/ shows up in the file index; a new meeting note appears in the notes list. Because everything is plain files, there's no separate save step, no format conversion, no export. Writing and storing are the same action.
This is also where the distinction between drafting and finalising matters. COSTA can produce a complete email draft and save it to documents/, but sending it remains your action. The app operates on a strict principle: COSTA drafts, you decide. Nothing goes external without you.
Ad-hoc artefacts don't live outside the agent system. Before writing anything in your name, COSTA reads the Profile knowledge agent's writing style file. That file describes your vocabulary, tone, sentence rhythm, what to avoid, and what good looks like. The same voice guidelines that shape your briefs shape everything else COSTA writes for you.
The architecture's most useful property is the feedback loop. Everything COSTA writes or helps you write can become context for the next thing it generates.
When a morning brief is generated, it's written to disk. When a 1-on-1 note is taken, it's stored in the notes directory. When you ask COSTA to draft a document, it can read previous documents from the same project. The file index in every system prompt points at all of this. As your library of notes and documents grows, the model's ability to produce relevant, contextualised output grows with it.
This is why note-taking matters more than it might seem. A 1-on-1 note from six months ago is context for today's prep synthesis. A meeting note from last quarter is context for this quarter's planning. The model has no memory of its own. The filesystem does, and COSTA reads from it every time.
As context grows, some of it becomes stable and domain-specific: a consistent voice, an area of expertise, a body of reference material. This kind of knowledge doesn't change daily. It belongs in a dedicated store, not mixed into a general notes directory.
Each knowledge agent is a curated store of context for a specific domain: a specialist resource that COSTA consults when a task calls for it. COSTA knows what each agent covers, when to consult it, and how to incorporate that knowledge into a response.
Each agent is a folder containing one or more markdown files, plus a short manifest that names the agent and describes what it knows. COSTA discovers agents automatically. Adding an agent is as simple as creating a folder with the right structure. Agents can be created, edited, and managed directly from within the app.
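Discovery by convention might look like the sketch below. The manifest filename (`manifest.md`) and the returned dict shape are assumptions for illustration; the source only specifies that each agent is a folder with a manifest plus topic files.

```python
from pathlib import Path

def discover_agents(agents_dir: Path) -> dict:
    # Any subfolder containing a manifest counts as an agent;
    # there is no registration step.
    agents = {}
    for sub in sorted(agents_dir.iterdir()):
        manifest = sub / "manifest.md"
        if sub.is_dir() and manifest.exists():
            agents[sub.name] = {
                "description": manifest.read_text(),
                "topics": sorted(p.name for p in sub.glob("*.md")
                                 if p.name != "manifest.md"),
            }
    return agents
```

Because discovery is just a directory scan, creating an agent in a file manager and creating one in the app end up equivalent.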
The key design principle is specificity. A single large file containing everything about your team is harder to reason about than separate files for structure, individual notes, and relationships. Breaking knowledge into topic-scoped files lets COSTA load only what's relevant to a given task, rather than consuming the entire store every time.
For chat, agents are loaded on demand. The system prompt tells COSTA which agents exist and instructs it to read the relevant files before answering any question that depends on them. It's the model's responsibility to identify when a knowledge agent is needed and fetch its files.
For structured generation (briefs, 1-on-1 prep, meeting prep) the relevant agents are pre-loaded directly into the system prompt before the loop starts. Brief generation always needs writing style and team context. Prep synthesis always needs org structure. Pre-loading eliminates the round-trip and guarantees the context is present, regardless of whether the model would have fetched it otherwise.
In both cases, COSTA reports which agents it consulted at the end of each response. This gives you visibility into what knowledge was active when a particular output was produced.
Every token in a prompt costs time and money. The context architecture is built to stay useful without becoming expensive. A few decisions drive this.
The journal loads today's entry in full, but older entries are previewed at three lines each. A three-line preview signals recency without displacing more relevant material. Saved Slack messages are filtered to the last fourteen days; anything older is unlikely to be actionable. The file index shows the most recent fifteen entries in each category. The full archive is always reachable via a tool call.
Pre-loading agent files for structured tasks eliminates tool-call round-trips for context that is always needed. For chat, the question determines what's relevant. On-demand loading handles this better: there is no point loading organisation context for a question about writing style.
In a long chat session, every previous message gets included in the context sent to the API on each turn. Without intervention, this grows without bound. After twelve messages, older exchanges are compressed into a single paragraph by a smaller model running in parallel. The most recent messages stay verbatim. Earlier context is replaced by the summary. Conversations can run as long as needed.
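The rolling compression might be sketched like this. The twelve-message threshold is from the text; how many recent turns stay verbatim is not specified, so `KEEP_VERBATIM` is an assumption, as is passing the summariser in as a callable.

```python
SUMMARY_THRESHOLD = 12
KEEP_VERBATIM = 6   # assumption: the text doesn't say how many turns stay verbatim

def compress_history(messages: list, summarise) -> list:
    # Past the threshold, fold older exchanges into one summary message
    # (produced by a smaller model) and keep the most recent turns verbatim.
    if len(messages) <= SUMMARY_THRESHOLD:
        return messages
    older, recent = messages[:-KEEP_VERBATIM], messages[-KEEP_VERBATIM:]
    summary = {"role": "system",
               "content": f"Summary of earlier turns: {summarise(older)}"}
    return [summary] + recent
```

The summariser runs in parallel on a smaller model, so the main loop never waits on compression.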
All structured generation paths use the same model. Costs vary with context size and the number of tool calls made to read additional files. The numbers below reflect typical runs. Heavier 1-on-1 notes or longer task lists push toward the upper end.
| Task | Input tokens | Output tokens | Est. cost |
|---|---|---|---|
| Morning brief | 6,000–8,000 | 600–900 | $0.03–0.05 |
| EOD brief | 5,000–7,000 | 500–700 | $0.03–0.04 |
| 1-on-1 prep | 8,000–12,000 | 800–1,200 | $0.04–0.06 |
| Meeting prep | 7,000–10,000 | 600–1,000 | $0.03–0.05 |
| Chat turn (no tools) | 3,000–5,000 | 300–500 | $0.01–0.02 |
| Chat turn (w/ tools) | 5,000–10,000 | 400–600 | $0.02–0.04 |
| Summarisation | 2,000–4,000 | 150–300 | <$0.01 ¹ |
¹ Summarisation uses a smaller, faster model at roughly one-quarter the input price. All other rows reflect Sonnet (latest) at $3/MTok input · $15/MTok output.
A full working day: one morning brief, one EOD brief, three 1-on-1 preps, two meeting preps, around fifteen chat turns. That comes to roughly 150,000–200,000 input tokens and 12,000–18,000 output tokens.
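A quick check of those daily totals, using illustrative mid-range figures from the table and the quoted Sonnet pricing:

```python
# (task, runs per day, input tokens, output tokens) — mid-range table values
tasks = [
    ("morning brief", 1,  7_000,   750),
    ("eod brief",     1,  6_000,   600),
    ("1-on-1 prep",   3, 10_000, 1_000),
    ("meeting prep",  2,  8_500,   800),
    ("chat turn",    15,  7_000,   500),
]
tokens_in  = sum(n * i for _, n, i, _ in tasks)   # 165,000
tokens_out = sum(n * o for _, n, _, o in tasks)   #  13,450
cost = tokens_in * 3 / 1e6 + tokens_out * 15 / 1e6  # $3/$15 per MTok
```

That lands inside the 150,000–200,000 input and 12,000–18,000 output ranges, at roughly $0.70 for the day.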
Providers don't publish per-request energy figures for cloud inference, so these are estimates drawn from independent research. At roughly 0.2–0.5 Wh per thousand tokens (a range that accounts for model size, hardware generation, and data centre efficiency), a typical COSTA day consumes between 40 and 100 Wh at the server.
To calibrate: a cup of tea uses about 30 Wh, charging a smartphone around 15 Wh, running a laptop for an hour somewhere between 30 and 80 Wh depending on load. A working day of COSTA is in the same order of magnitude as leaving a laptop on for an hour or two. Over a 22-day working month, that's roughly 1–2 kWh. A typical household uses more than that in a day.
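Reproducing that arithmetic from the daily token totals and the 0.2–0.5 Wh per thousand tokens range:

```python
tokens_low  = 150_000 + 12_000    # daily input + output, lower bound
tokens_high = 200_000 + 18_000    # upper bound
wh_low  = tokens_low  / 1_000 * 0.2   # ≈ 32 Wh
wh_high = tokens_high / 1_000 * 0.5   # ≈ 109 Wh
# Midpoint over a 22-day working month, in kWh
monthly_kwh = 22 * (wh_low + wh_high) / 2 / 1_000
```

The computed range brackets the rounded 40–100 Wh daily figure and lands within the 1–2 kWh monthly estimate.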
The context trimming and summarisation described above have a real effect here, not just on cost. Shorter prompts mean fewer tokens processed per call, which means less compute and less energy per task.