Here is a thing that has happened to me, and I suspect to many of you, though perhaps you blamed yourself instead of the tool.
You are deep in a session with your AI coding agent. You have been going for an hour. It has read files, run commands, created branches, pushed commits, raised pull requests. You have watched it work. It has been impressive.
And then, somewhere around message 35 or push number 8, it types something like:
"I should note that I'm not sure which version control platform this repository uses. Could you confirm whether this is GitHub, GitLab, or Bitbucket?"
You stare at the screen.
It just pushed to GitHub. Twice. Using gh pr create. It wrote the pull request description. You merged it together. And now it is asking what platform you use.
The agent did not lose its mind. It lost its notes. There is a difference, and it matters.
Not Forgetting. Worse Than Forgetting.
The frustrating part is not that the context was lost. The frustrating part is that the agent does not know it was lost. It asks the question with the same calm confidence it had when it was fully informed. There is no "I seem to have lost some context here." There is just a question that implies the last 40 messages did not happen.
This is not a bug in the traditional sense. It is a structural property of how large context windows work, and understanding it is the first step toward being less furious about it.
How the Context Window Actually Works
Every conversation with an AI agent lives inside a context window. Think of it as a rolling sheet of paper. Everything gets written on it: your messages, the agent's responses, tool calls, file contents, command outputs. The model reads the entire sheet every time it responds.
The sheet has a size limit. When it fills up, something has to give.
Most tools handle this through context compaction: an automated summarization step that compresses older parts of the conversation into a shorter summary, freeing up space for new content. The model doing the summarizing is doing its best -- but "its best" is still lossy compression. Details that seem minor get dropped. Facts that were demonstrated through actions rather than stated explicitly get abstracted away.
"We just ran gh pr create and it worked" is an action. The summarizer sees: git operations completed successfully. The specific detail that this is GitHub, that gh is the right CLI, that the SSH key is ~/.ssh/id_ed25519_personal -- those are specifics. And specifics are exactly what gets compressed.
So your agent, now running on a summary of what happened rather than the full record, asks you which version control platform you use.
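To make the mechanism concrete, here is a deliberately simplified sketch of what a compaction step does. This is conceptual, not Claude Code's actual implementation; the token estimate and the summarizer are stand-ins for the real thing.

```python
# Conceptual sketch of context compaction -- not any tool's real implementation.
# When the transcript exceeds a token budget, older messages are replaced by a
# lossy summary and only the most recent messages survive verbatim.

def estimate_tokens(messages: list[str]) -> int:
    # Crude stand-in for a real tokenizer: roughly four characters per token.
    return sum(len(m) for m in messages) // 4

def summarize(older: list[str]) -> str:
    # In a real agent this is an LLM call, and it is lossy by design:
    # "we just ran gh pr create and it worked" tends to come back as
    # "git operations completed successfully".
    return f"[Summary of {len(older)} earlier messages]"

def compact(messages: list[str], budget: int, keep_recent: int = 10) -> list[str]:
    # Nothing happens while the transcript still fits.
    if estimate_tokens(messages) <= budget:
        return messages
    # Otherwise, everything except the most recent messages collapses into a summary.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```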
The Intended Solution: CLAUDE.md
Claude Code has a proper mechanism for this. It is called CLAUDE.md, and it operates at three levels:
Global (~/.claude/CLAUDE.md) -- loaded for every project on your machine. Good for your identity, your preferred tools, your universal workflow preferences. Things that are true about you regardless of which repo you are in.
Project (<project-root>/CLAUDE.md) -- committed to the repository, loaded at the start of every session for that project. The right place for stack details, CI/CD workflow, architecture decisions, credential locations, and anything else the agent needs to work in this specific codebase without you explaining it from scratch every time.
Session -- just the current conversation. Not persisted anywhere. Gone when the window closes.
The idea is sound. You write down the important facts once, and the agent reads them at the start of every session. No re-explaining. No re-orienting. The agent opens the project and already knows the platform is GitHub, the deploy command is npx vercel --prod, and the SSH key is the personal one, not the work one.
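As a rough illustration -- the specific values here are just the examples running through this post, not a template to copy -- a project-level CLAUDE.md might look something like this:

```markdown
# CLAUDE.md -- project root

## Version control
- Platform: GitHub. Use the gh CLI for pull requests and issues.
- Remote: git@github.com:yourname/yourrepo.git
- SSH key: ~/.ssh/id_ed25519_personal (the personal one, not the work one)

## Deployment
- Deploy with: npx vercel --prod
```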
The Problem With the Intended Solution
Here is the part that nobody puts on the packaging.
CLAUDE.md is loaded once, at the start of the conversation. It is injected into the context window like any other text. It does not get re-read on every message. It does not get special protection from compaction. It sits in the context window with everything else, and when the summarizer runs, it applies the same lossy compression to your carefully written instructions that it applies to tool outputs from three hours ago.
So your CLAUDE.md says: "Platform: GitHub. CLI: gh. Remote: git@github.com:yourname/yourrepo.git."
The agent reads it at message 1. By message 40, after several compaction cycles, the summarizer has decided that what matters is "the agent has access to git and various CLI tools" -- and the specific details have been promoted to comfortable abstractions.
You and the summarizer have a disagreement about what was worth keeping. You lose, because you are not the one doing the summarizing.
The gap between "I wrote it in the context file" and "the agent currently knows it" is real, and it widens as the session gets longer.
What Actually Helps
A few things reduce the problem, even if they do not eliminate it:
State important facts explicitly. Do not just demonstrate things -- declare them. "We pushed to GitHub" is an action. "This repository is on GitHub and we use the gh CLI for all GitHub operations" is a statement. Statements survive compression better than observations of actions. If something matters, state it as a fact rather than just doing it.
Re-read on demand. The agent has a Read tool. If it seems to be drifting mid-session, telling it to re-read the context file is a valid instruction and it works. The file gets pulled fresh and the agent re-orients. This is the escape hatch, and it is worth knowing about.
Keep sessions focused. The longer the session, the more compaction cycles, the more drift. Four focused thirty-minute sessions will give you more reliable behavior than one two-hour marathon, even if the total work is identical.
Put the most critical facts in the context file explicitly. Not as prose, but as a direct table or list. "Platform: GitHub. Not Bitbucket. Not GitLab. GitHub." The more specific and unambiguous the entry, the better the chance it survives summarization in a useful form.
None of this is a full solution. It is damage reduction.
The Part Nobody Has Solved Yet
Here is what I think is actually the interesting problem -- and where I suspect the next wave of tooling will come from.
The current architecture treats memory as a static artifact: you write a file, it gets loaded at session start, it lives in the context until compaction eats it. The agent either knows something or it does not, and there is no mechanism for it to dynamically retrieve relevant facts as the conversation progresses.
What would actually solve this is on-the-fly contextual memory injection: a system that watches the conversation in real-time, detects when the agent is reasoning about something that has a relevant memory or fact associated with it, and injects that memory directly into the context at the moment it is needed -- not just at session start.
Think of it like RAG, but for agent state rather than documents. Instead of stuffing everything into the context window upfront and hoping compaction spares the important parts, you retrieve relevant facts at the point they are actually needed. The agent starts working with git? The system injects the platform and authentication details. The agent asks about deployment? The deployment instructions appear in context, fresh, right then.
This is not a solved problem. The major labs -- Anthropic, OpenAI, Microsoft -- are all circling it, and some memory layers like mem0 are starting to take runs at it. But nothing has nailed the real-time relevance detection piece cleanly in the context of coding agents.
I have been thinking about building a prototype of this for my own workflow. The idea would be a lightweight middleware layer that sits between the agent and the conversation, monitors what topics are being discussed, and injects relevant context from a structured knowledge store on demand. A personal memory oracle. Nothing clever -- even simple keyword matching would be better than the current all-or-nothing session injection. For a personal workflow with a small, well-structured knowledge store, you could probably get 80% of the value with surprisingly little code.
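To give a sense of how little code the keyword-matching version would need, here is a minimal sketch. Everything in it is hypothetical: the knowledge store is hand-written, and augment() stands in for whatever hook your agent tooling exposes for modifying a message before it reaches the model.

```python
# Minimal sketch of relevance-triggered memory injection via keyword matching.
# The store and the augment() hook are hypothetical; the facts are the ones
# from this post's running example.

MEMORY_STORE = {
    ("github", "pull request", "pr create", "push"):
        "Platform: GitHub. Use the gh CLI. SSH key: ~/.ssh/id_ed25519_personal.",
    ("deploy", "vercel", "production"):
        "Deploy with: npx vercel --prod.",
}

def relevant_facts(message: str) -> list[str]:
    # Crude relevance detection: any keyword hit pulls in the associated fact.
    text = message.lower()
    return [fact for keywords, fact in MEMORY_STORE.items()
            if any(keyword in text for keyword in keywords)]

def augment(message: str) -> str:
    # Prepend matching facts so they sit fresh in the context window at the
    # moment the agent is reasoning about the topic, not just at session start.
    facts = relevant_facts(message)
    if not facts:
        return message
    bullet_list = "\n".join(f"- {fact}" for fact in facts)
    return f"Relevant project facts:\n{bullet_list}\n\n{message}"
```

A real version would need smarter relevance detection and a way to avoid re-injecting the same fact on every turn, but the shape is the point: retrieval happens when the topic comes up, not once at message 1.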
If you are building tooling in this space: this is the problem worth solving. Not better summarization. Not larger context windows. Dynamic, relevance-triggered memory injection that keeps the agent grounded as the session progresses. The agent is not the weak link here. The memory architecture is.
The Practical Upshot
Until someone builds that: keep your sessions short, write your facts explicitly, and when your agent asks you which version control platform you use after successfully pushing to it three times in the last hour -- take a breath.
It is not the agent going rogue. It is the summarizer making a call you disagree with.
And if you are building AI agents for production use -- not just using them as coding assistants -- this problem compounds fast. An agent that loses context about what system it is connected to, what operations it has already performed, or what constraints it is operating under is not something you can trust at scale. The memory architecture is not an implementation detail. It is a correctness property.
The agent knew where it was. It just lost the note that told it.