COSTA
Technical writeups
Technical overview

Context architecture

How COSTA assembles what it knows, how that knowledge shapes every response, and how the things it creates feed back into what it knows next time.

01 · The foundation

Assembling context

Every generation starts with a system prompt: the briefing document the model reads before producing anything. It contains what the model needs to understand the situation: who you are, what you're working on, what's in your calendar, what you've written down recently.

This is assembled fresh on every request. Nothing is stored inside the model between sessions. The underlying AI has no persistent memory of its own. COSTA compiles the relevant context from your files and injects it at the start of each turn. The model reads the same briefing every time, updated to the current state of your data.

System prompt · every request
· Behavioural rules: Agent.md · tone, task format, dispatch rules
· Knowledge agents index: dynamic list of available agents + mandatory preflight instructions
· Current tasks: tasks.md · today / this week / backlog
· This week's calendar: events, 1-on-1s, linked people
· Journal: today's entry in full · recent entries previewed
· Saved Slack messages: last 14 days · one line each
· File index: 1-on-1 notes, meeting notes, briefs · most recent 15 each

The stack is ordered deliberately. Behavioural rules come first, so COSTA knows how to act before it reads anything else. Knowledge agent instructions follow, telling it which agents exist and when to consult them. Then the operational content: tasks, calendar, journal, saved Slack messages, and a file index pointing to notes it can read on demand.

Not everything can fit in a single prompt, and fitting everything in would be wasteful. The file index is just that: an index. It tells COSTA what exists and where, but the content stays on disk until it is needed.
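The assembly step can be sketched as a fixed-order concatenation of files into one briefing document. This is a minimal illustration, not COSTA's actual implementation; `Agent.md` and `tasks.md` match the layout described later, while the journal path and section headings are hypothetical:

```python
from pathlib import Path

# Sections in deliberate order: behavioural rules first, operational content after.
SECTION_ORDER = [
    ("Behavioural rules", "Agent.md"),
    ("Current tasks", "tasks.md"),
    ("Journal", "journal/today.md"),  # hypothetical path for today's entry
]

def build_system_prompt(root: Path) -> str:
    """Assemble the briefing fresh from disk; nothing persists in the model."""
    parts = []
    for heading, filename in SECTION_ORDER:
        path = root / filename
        if path.exists():
            parts.append(f"{heading}\n{path.read_text().strip()}")
    return "\n\n".join(parts)
```

Because the prompt is rebuilt on every request, it always reflects the current state of the files, with no separate sync step.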

02 · Storage

The filesystem

COSTA's context is not a database. It's a directory on your local machine: a structured collection of plain text files that you own completely and can edit in any tool you like. The application reads from and writes to this directory, but the data never leaves your computer unless you explicitly push it somewhere.

The directory layout reflects how information is used. Notes about people live alongside the person's name. Meeting notes live by date. Generated briefs accumulate in a dedicated folder, both for persistence and so future generations can reference them. Everything is markdown: readable without the app, searchable with any text tool, and portable to whatever comes next.

App root · all paths relative to this root
· tasks.md: master task list · today, this week, backlog
· Agent.md: behavioural rules injected into every prompt
· context/agents/: knowledge agents · one subdirectory per agent
· 1-1s/: 1-on-1 meeting notes · one file per person per session
· meeting-notes/: general meeting notes and transcripts
· briefs/: generated morning and end-of-day briefs
· journal/: daily entries and mood log
· documents/: longer-form drafts, reviews, written artefacts
· notes/: general notes and impact logs
· weekly-synthesis/: weekly review documents

Each directory has a specific role in the context assembly process. The 1-1s/ and meeting-notes/ directories form the primary note corpus. The most recent entries from each are indexed in every system prompt, with the full content available on demand. The briefs/ directory lets COSTA see what it wrote yesterday before drafting today's brief, avoiding repetition. The context/agents/ directory is where knowledge agents live, each in its own subfolder with a manifest and one or more topic files.

Because everything is plain files, the filesystem is also the integration layer. Drop a transcript into meeting-notes/, add a LinkedIn export to context/linkedin/, paste a document into documents/. No import process. The file index in the next system prompt will include it.
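The "index, don't load" behaviour can be sketched as a scan that records only file names, newest first. A minimal sketch, assuming markdown files sorted by modification time; the real selection logic may differ:

```python
from pathlib import Path

def file_index(root: Path, categories=("1-1s", "meeting-notes", "briefs"), limit=15):
    """Index the most recent files per category by modification time.
    Only names are recorded; content stays on disk until a tool call fetches it."""
    index = {}
    for category in categories:
        folder = root / category
        if not folder.is_dir():
            continue
        files = sorted(folder.glob("*.md"),
                       key=lambda p: p.stat().st_mtime, reverse=True)
        index[category] = [p.name for p in files[:limit]]
    return index
```

A file dropped into any of these folders appears in the next index with no import step, which is what makes the filesystem the integration layer.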

03 · Generation

The agentic loop

Once context is assembled and a task is given, COSTA runs an agentic loop: a cycle of calls between the model and your filesystem that continues until the model decides it has enough to work with.

During each turn, COSTA can call tools: reading a specific file, listing the contents of a directory, or writing a new document. The result comes back, gets added to the conversation, and the loop continues. This is how COSTA reads your 1-on-1 notes before writing a synthesis, or checks recent meeting notes before drafting a brief. Context is fetched on demand, not pre-loaded.

Agentic loop · simplified
System prompt · assembled context
→ COSTA · reads, reasons, acts
→ Tool call · read_file · list_files
↩ tool result returned · loop continues back to COSTA
→ Response · streamed to screen

The loop runs up to fifteen iterations and handles four tool types: read_file, list_files, write_file, and delete_file. All file access is sandboxed to the app root. The model cannot read or write anywhere else on your system.

When the model decides it has enough information and generates its final response, the loop ends and the output is streamed directly to the screen. You see tokens as they arrive, not after the whole response is finished.
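The loop and its sandbox can be sketched in a few lines. This is an illustrative skeleton under assumptions (the `model_step`/`tools` interfaces are invented for the example); only the fifteen-iteration cap and the app-root restriction come from the text:

```python
from pathlib import Path

MAX_ITERATIONS = 15  # the iteration cap described above

def resolve_sandboxed(root: Path, relative: str) -> Path:
    """Resolve a model-supplied path, refusing anything outside the app root."""
    target = (root / relative).resolve()
    if not target.is_relative_to(root.resolve()):
        raise PermissionError(f"path escapes sandbox: {relative}")
    return target

def run_loop(model_step, tools, root: Path):
    """Call the model, execute any requested tool, feed the result back,
    and stop on a final answer or at the iteration cap."""
    transcript = []
    for _ in range(MAX_ITERATIONS):
        action = model_step(transcript)  # returns a tool request or a final answer
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](root, **action["args"])
        transcript.append({"tool": action["tool"], "result": result})
    return "stopped at iteration limit"
```

The key property is that every tool result re-enters the transcript, so each model call sees everything fetched so far.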

Chat is the same loop

The chat interface and the structured generators (brief, prep, meeting prep) all use exactly the same underlying loop. The difference is in what they start with: structured generators receive a specific task instruction and have key agent files pre-loaded. Chat starts with the full context and whatever you type. Both converge on the same agentic pattern.

04 · Creation

Writing artefacts

COSTA can produce two broad categories of output. The first is the set of predefined artefact types: morning brief, end-of-day brief, 1-on-1 prep synthesis, and meeting prep. These have dedicated generation routes, structured task prompts, and caching logic. They are triggered from specific points in the interface and saved to known locations in the filesystem.

The second category is open-ended. During any chat session, COSTA can use its write_file tool to create any document it judges useful: an email draft in response to a message you've pasted, a reply to a Slack thread, a performance review section, a job description, a document outline, a meeting summary. These aren't triggered by a button. They emerge from conversation.

What COSTA can write
Structured artefacts
· Morning brief
· End-of-day brief
· 1-on-1 prep synthesis
· Meeting prep
Ad-hoc via chat
· Email and message drafts
· Performance review content
· Document outlines and drafts
· Meeting summaries
· Job descriptions
· Any other written artefact

When COSTA writes a file during chat, it appears immediately in the relevant section of the app. A new document in documents/ shows up in the file index; a new meeting note appears in the notes list. Because everything is plain files, there's no separate save step, no format conversion, no export. Writing and storing are the same action.

This is also where the distinction between drafting and finalising matters. COSTA can produce a complete email draft and save it to documents/, but sending it remains your action. The app operates on a strict principle: COSTA drafts, you decide. Nothing goes external without you.

Writing in your voice

Ad-hoc artefacts don't live outside the agent system. Before writing anything in your name, COSTA reads the Profile knowledge agent's writing style file. That file describes your vocabulary, tone, sentence rhythm, what to avoid, and what good looks like. The same voice guidelines that shape your briefs shape everything else COSTA writes for you.

05 · The cycle

Content becomes context

The architecture's most useful property is the feedback loop. Everything COSTA writes or helps you write can become context for the next thing it generates.

When a morning brief is generated, it's written to disk. When a 1-on-1 note is taken, it's stored in the notes directory. When you ask COSTA to draft a document, it can read previous documents from the same project. The file index in every system prompt points at all of this. As your library of notes and documents grows, the model's ability to produce relevant, contextualised output grows with it.

The feedback cycle
01 · You act: meetings happen, notes get taken, tasks are created and completed.
02 · COSTA writes: briefs, prep syntheses, and drafted documents go to disk.
03 · Context grows: the file index expands; future prompts can reference past outputs.
04 · Output improves: richer context produces more specific, better-informed responses.

This is why note-taking matters more than it might seem. A 1-on-1 note from six months ago is context for today's prep synthesis. A meeting note from last quarter is context for this quarter's planning. The model has no memory of its own. The filesystem does, and COSTA reads from it every time.

06 · Specialisation

Knowledge agents

As context grows, some of it becomes stable and domain-specific: a consistent voice, an area of expertise, a body of reference material. This kind of knowledge doesn't change daily. It belongs in a dedicated store, not mixed into a general notes directory.

Each knowledge agent is a curated store of context for a specific domain, consulted when a task calls for it. COSTA knows what each agent covers, when to consult it, and how to incorporate that knowledge into a response.

COSTA · team lead · orchestrator · delegates to →
· Profile: voice & background · writing style
· Organisation: team structure · reporting lines
· Domain: specialist knowledge · reference material
On-demand for chat · pre-loaded for brief, prep and meeting prep

Each agent is a folder containing one or more markdown files, plus a short manifest that names the agent and describes what it knows. COSTA discovers agents automatically. Adding an agent is as simple as creating a folder with the right structure. Agents can be created, edited, and managed directly from within the app.
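Discovery by folder convention can be sketched as a scan of `context/agents/`. The source doesn't specify the manifest format, so the `manifest.json` name and its `name`/`covers` fields here are assumptions for illustration:

```python
import json
from pathlib import Path

def discover_agents(root: Path) -> dict:
    """Register every agent folder under context/agents/ that carries a manifest.
    Topic files are listed, not loaded; loading happens on demand."""
    agents = {}
    agents_dir = root / "context" / "agents"
    if not agents_dir.is_dir():
        return agents
    for folder in sorted(p for p in agents_dir.iterdir() if p.is_dir()):
        manifest = folder / "manifest.json"  # assumed manifest name and format
        if not manifest.exists():
            continue
        meta = json.loads(manifest.read_text())
        agents[meta.get("name", folder.name)] = {
            "covers": meta.get("covers", ""),
            "topics": sorted(p.name for p in folder.glob("*.md")),
        }
    return agents
```

A folder without a manifest is simply ignored, which is why adding an agent is nothing more than creating a correctly shaped directory.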

The key design principle is specificity. A single large file containing everything about your team is harder to reason about than separate files for structure, individual notes, and relationships. Breaking knowledge into topic-scoped files lets COSTA load only what's relevant to a given task, rather than consuming the entire store every time.

When agents are loaded

For chat, agents are loaded on demand. The system prompt tells COSTA which agents exist and instructs it to read the relevant files before answering any question that depends on them. It's the model's responsibility to identify when a knowledge agent is needed and fetch its files.

For structured generation (briefs, 1-on-1 prep, meeting prep) the relevant agents are pre-loaded directly into the system prompt before the loop starts. Brief generation always needs writing style and team context. Prep synthesis always needs org structure. Pre-loading eliminates the round-trip and guarantees the context is present, regardless of whether the model would have fetched it otherwise.

In both cases, COSTA reports which agents it consulted at the end of each response. This gives you visibility into what knowledge was active when a particular output was produced.

07 · Optimisation

Keeping it efficient

Every token in a prompt costs time and money. The context architecture is built to stay useful without becoming expensive. A few decisions drive this.

Trimming the base context

The journal loads today's entry in full, but older entries are previewed at three lines each. A three-line preview signals recency without displacing more relevant material. Saved Slack messages are filtered to the last fourteen days; anything older is unlikely to be actionable. The file index shows the most recent fifteen entries in each category. The full archive is always reachable via a tool call.
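The two trimming rules are simple enough to sketch directly. A minimal illustration, assuming each saved message carries a timestamp; the real data shapes are not specified in the text:

```python
from datetime import datetime, timedelta

def journal_preview(entry: str, lines: int = 3) -> str:
    """Older journal entries are cut to a short preview; today's stays full."""
    return "\n".join(entry.splitlines()[:lines])

def recent_messages(saved, days: int = 14):
    """Saved Slack messages older than the cutoff are dropped from the prompt."""
    cutoff = datetime.now() - timedelta(days=days)
    return [m for m in saved if m["saved_at"] >= cutoff]
```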

Pre-loading vs on-demand

Pre-loading agent files for structured tasks eliminates tool-call round-trips for context that is always needed. For chat, the question determines what's relevant. On-demand loading handles this better: there is no point loading organisation context for a question about writing style.

Conversation summarisation

In a long chat session, every previous message gets included in the context sent to the API on each turn. Without intervention, this grows without bound. After twelve messages, older exchanges are compressed into a single paragraph by a smaller model running in parallel. The most recent messages stay verbatim. Earlier context is replaced by the summary. Conversations can run as long as needed.
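The compaction step can be sketched as replacing everything beyond a verbatim window with one summary message. The twelve-message threshold comes from the text; the message shapes and the `summarise` callback are assumptions for illustration:

```python
KEEP_VERBATIM = 12  # the threshold described above: compress beyond twelve messages

def compact_history(messages, summarise):
    """Replace exchanges beyond the verbatim window with one summary paragraph.
    `summarise` stands in for the smaller model run in parallel."""
    if len(messages) <= KEEP_VERBATIM:
        return messages
    older, recent = messages[:-KEEP_VERBATIM], messages[-KEEP_VERBATIM:]
    summary = {"role": "system",
               "content": "Earlier conversation, summarised: " + summarise(older)}
    return [summary] + recent
```

Token cost per turn now grows only with the verbatim window, not with total conversation length.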

Cost per task type

All structured generation paths use the same model. Costs vary with context size and the number of tool calls made to read additional files. The numbers below reflect typical runs. Heavier 1-on-1 notes or longer task lists push toward the upper end.

Task · Input tokens · Output tokens · Est. cost
Morning brief · 6,000–8,000 · 600–900 · $0.03–0.05
EOD brief · 5,000–7,000 · 500–700 · $0.03–0.04
1-on-1 prep · 8,000–12,000 · 800–1,200 · $0.04–0.06
Meeting prep · 7,000–10,000 · 600–1,000 · $0.03–0.05
Chat turn (no tools) · 3,000–5,000 · 300–500 · $0.01–0.02
Chat turn (w/ tools) · 5,000–10,000 · 400–600 · $0.02–0.04
Summarisation · 2,000–4,000 · 150–300 · <$0.01 ¹

¹ Summarisation uses a smaller, faster model at roughly one-quarter the input price. All other rows reflect Sonnet (latest) at $3/MTok input · $15/MTok output.
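The estimates follow directly from the footnote's prices. A quick arithmetic check, using a mid-range morning brief as the worked example:

```python
INPUT_PER_MTOK = 3.00    # Sonnet input price from the footnote, $/MTok
OUTPUT_PER_MTOK = 15.00  # Sonnet output price, $/MTok

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one generation at the quoted per-million-token prices."""
    return (input_tokens * INPUT_PER_MTOK + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# A mid-range morning brief, 7,000 in / 750 out:
# 7,000 × $3/M + 750 × $15/M = $0.021 + $0.011 ≈ $0.03
```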

Typical daily total

A full working day: one morning brief, one EOD brief, three 1-on-1 preps, two meeting preps, around fifteen chat turns. That comes to roughly 150,000–200,000 input tokens and 12,000–18,000 output tokens.

Input tokens
~175k
per day
Output tokens
~15k
per day
Estimated cost
$0.45–0.75
per day
Per month
$10–16
22 working days

Energy

Cloud providers don't publish per-request energy figures for AI inference, so these are estimates grounded in published research. At roughly 0.2–0.5 Wh per thousand tokens (a range that accounts for model size, hardware generation, and data centre efficiency), a typical COSTA day consumes between 40 and 100 Wh at the server.
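The daily range is just the token total times the per-thousand-token estimate:

```python
WH_PER_KTOK_LOW, WH_PER_KTOK_HIGH = 0.2, 0.5  # estimated Wh per 1,000 tokens
DAILY_TOKENS = 190_000                        # ~175k input + ~15k output

low_wh = DAILY_TOKENS / 1_000 * WH_PER_KTOK_LOW
high_wh = DAILY_TOKENS / 1_000 * WH_PER_KTOK_HIGH
# 190 kTok × 0.2–0.5 Wh/kTok gives 38–95 Wh, the ~40–100 Wh range above
```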

To calibrate: a cup of tea uses about 30 Wh, charging a smartphone around 15 Wh, running a laptop for an hour somewhere between 30 and 80 Wh depending on load. A working day of COSTA is in the same order of magnitude as leaving a laptop on for an hour or two. Over a 22-day working month, that's roughly 1–2 kWh. A typical household uses more than that in a day.

The context trimming and summarisation described above have a real effect here, not just on cost. Shorter prompts mean fewer tokens processed per call, which means less compute and less energy per task.
