
The Invisible 98.8%

I said one sentence to an AI. My words were 1.2% of the total input. Opening the context, filling it, and folding it led to the question of memory.

1 versus 79

I spoke to an AI.

"How about building an agent that captures a MacBook screen and talks about it?"

A short sentence. But when it reached the AI, it did not travel alone. I opened the actual input the AI received.

My words were about 350 tokens. The full input was about 28,000 tokens. My words against everything else: 1 to 79.

My words were 1.2% of the total.

What was the other 98.8%? A 6,000-token system prompt. An 8,000-token tool specification. 2,500 tokens of auto-injected skill instructions. Plugin lists, dates, hook results. Things I did not write, see, or know about.

I said one sentence. The AI received a short novel's worth of instructions alongside it.
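The arithmetic is easy to check. A minimal sketch, using the counts above; the split of the remaining tokens across plugins, dates, and hooks is my assumption, since only the first three parts were itemized:

```python
# Token counts from one real request. The "plugins, dates, hooks" figure
# is an assumption: it is whatever remains of the 28,000 total.
context = {
    "system prompt": 6_000,
    "tool specification": 8_000,
    "skill instructions": 2_500,
    "plugins, dates, hooks": 11_150,
    "my words": 350,
}

total = sum(context.values())  # 28,000
for part, tokens in context.items():
    print(f"{part:>22}: {tokens:>6,} tokens ({100 * tokens / total:.2f}%)")

# "my words" come to 1.25% of the total; the other ~98.8% is invisible.
```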

Opening

The asymmetry was uncomfortable. I thought I was talking to an AI. In reality, I was sending a small signal on top of a massive system.

Humans cannot look inside their own heads. But the inside of an AI's head is data. It can be opened.

I started building a visualization tool. First I tried color coding. System in red, tools in yellow, user in white. Pretty, but flat. Color is packaging that hides information.

I dropped color. Only typography remained. Size and brightness. My words are large and bright. The system prompt is small and dim. No labels, no icons.
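The mapping is mechanical. A minimal sketch, assuming each context segment is already tagged with its source; the sizes, opacities, and segment texts are placeholders, not the tool's actual values:

```python
import html

# Typography per source: (font size in px, opacity). No color, no labels.
STYLE = {
    "system": (9, 0.35),   # small and dim
    "tools":  (9, 0.35),
    "user":   (22, 1.0),   # large and bright
}

def render(segments: list[tuple[str, str]]) -> str:
    """Render (source, text) segments as HTML spans sized by origin."""
    spans = []
    for source, text in segments:
        size, opacity = STYLE[source]
        spans.append(
            f'<span style="font-size:{size}px;opacity:{opacity}">'
            f"{html.escape(text)}</span>"
        )
    return "<div>" + " ".join(spans) + "</div>"

page = render([
    ("system", "You are an agent with access to the following tools..."),
    ("tools", '{"name": "screen_capture", "parameters": "..."}'),
    ("user", "How about building an agent that captures a MacBook screen?"),
])
```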

Then, for the first time, I could see.

[Interactive visualization: a slider through the actual context the AI received while I was writing this essay.]

The top of the screen is a wall of tiny, barely visible text. English, dense and mechanical. Past that wall, suddenly, large clear Korean appears. My words. A small island in an ocean of machine text.

What I felt was not discomfort but awe. This tiny signal was steering that enormous machine.

Filling

Here the question changed.

The usual advice is to save tokens. I started thinking the opposite.

Suppose the AI's maximum context is 200,000 tokens. The current usage is 28,000. That is 14%. The remaining 86% is empty.

An empty context is an empty mind.

The analogy is imperfect. The AI's knowledge already lives in its model weights. But insofar as the context determines behavior at this moment, emptiness is close to drifting without direction.

The question is not "how do I save?" but "how do I fill better?" It is the problem of assembling the best team on a fixed budget.

I ran an experiment. I inserted the same instruction three times. The result changed. What the model had ignored with one copy, it followed precisely with three.

An AI distributes attention across its input. If the same content appears in three places, attention goes to three places. Three times the presence. Without changing the model, changing only the input changes the behavior.
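The shape of that experiment, as a minimal sketch; the instruction text is invented for illustration, and real chat APIs take structured messages rather than one flat string:

```python
def assemble(system: str, instruction: str, user: str, copies: int) -> str:
    """Place the same instruction `copies` times in the prompt.

    Nothing about the model changes; only how often the content appears
    in its input, and therefore how much attention can land on it.
    """
    block = "\n\n".join([instruction] * copies)
    return f"{system}\n\n{block}\n\n{user}"

instruction = "Always state your confidence before answering."  # hypothetical
once  = assemble("You are a coding agent.", instruction, "Refactor this file.", copies=1)
three = assemble("You are a coding agent.", instruction, "Refactor this file.", copies=3)
# The observation above: ignored with one copy, followed precisely with three.
```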

Here I make a leap. I am not certain.

Perhaps long-term memory is a function that decides how often to place a piece of information in the context. Important things appear frequently; less important things get pushed out. This cannot be all of memory. But frequency, as one axis, works. A 2025 study from Google Research shows this.¹ Prompt repetition improved performance on 47 of 70 tests, with zero degradation.
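Taken literally, the leap looks something like this. A minimal sketch, assuming each memory item already carries an importance score; the scoring itself, the cap of three copies, and the word-count token estimate are all invented for illustration:

```python
def place(items: list[tuple[str, float]], budget: int) -> list[str]:
    """Fill a token budget by repeating items in proportion to importance.

    items: (text, importance) pairs, importance in (0, 1].
    Important items appear several times; unimportant ones fall out
    entirely once the budget is spent.
    """
    placed: list[str] = []
    used = 0
    for text, importance in sorted(items, key=lambda x: -x[1]):
        copies = max(1, round(3 * importance))  # assumption: at most 3 copies
        cost = copies * len(text.split())       # crude stand-in for token count
        if used + cost > budget:
            continue  # pushed out
        placed.extend([text] * copies)
        used += cost
    return placed
```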

Filling. The first half of memory.

Folding

Filling has a limit. What happens when 200k is full?

You have to choose. What stays and what goes.

Here I encountered an unexpected problem.

"Go ahead."

I said this to the AI. During a 29-day session, I said it multiple times, and each time it meant something different.

At one point it meant "implement now." The design was done; only code remained.

At another point it meant the opposite. Right before, I had said "this is bad" and "roll it back." That "go ahead" was not about implementation. It was about review.

The current input is the same. The correct action is different.

As a conversation grows longer, you cannot carry everything. You must summarize and compress. But what do you discard? "Find past moments similar to the current question." Most memory systems work this way. Retrieval. But retrieval cannot distinguish the two cases of "go ahead."

What is needed is not retrieval. It is preserving the boundary where the same input demands a different action.

Boundary

Two histories exist. The current input is the same. The correct action differs. If a memory system merges the two histories into one, the action breaks. Call this history aliasing.

| History | Current input | Correct action |
| --- | --- | --- |
| Kept questioning the approach | "Go ahead" | Review first |
| Approved the design and specified units | "Go ahead" | Implement |

There is also the opposite failure. "Be more rigorous," "cut the nonsense," and "is this wrong?" look different on the surface but demand the same action. Splitting them into separate states makes memory sensitive but not useful.

Good memory does two things at once. It separates histories that must lead to different actions, and it merges histories that may lead to the same action.
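This double criterion can be written down as a test. A minimal sketch, assuming a "compressor" is any function from a history to a comparable state; the case data paraphrases the table above:

```python
from typing import Callable, Hashable

# (history, current input, correct action)
CASES = [
    (["this is bad", "roll it back"], "go ahead", "review first"),
    (["design approved", "units specified"], "go ahead", "implement"),
]

def preserves_boundaries(compress: Callable[[list[str]], Hashable]) -> bool:
    """Separate histories that demand different actions under the same input.

    The dual check, merging histories that allow the same action, would
    compare cases with equal actions for equal states.
    """
    for h1, x1, a1 in CASES:
        for h2, x2, a2 in CASES:
            if x1 == x2 and a1 != a2 and compress(h1) == compress(h2):
                return False  # history aliasing: the action will break
    return True

assert not preserves_boundaries(lambda h: "one state for everything")
assert preserves_boundaries(lambda h: h[-1])  # keeping the last turn suffices here
```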

Compression

Memory is a compression function.

You cannot carry every past, so you reduce. Whether the result is a retrieval set, a summary, or a state variable does not matter. What matters is which histories the representation merges and which it separates.

Good compression is not about making things small. It is about keeping the differences that change action and discarding the rest.

This sounds obvious, yet most current memory systems are not evaluated by this criterion. They measure how much is remembered, not which boundaries are preserved.²

Here, filling and folding meet. What to put in when filling the context, what to keep when the context is full. Both are the same question: does this change a future action?

Filling ends where folding begins, and the criterion of folding determines the direction of filling.

Trajectory

One thing keeps bothering me.

When finding similar contexts, most systems convert text into vectors and measure distance. That is a point. A snapshot of a moment.

But the essence of context is sequence. "It's okay" after comfort is not the same as "it's okay" after a fight. "Go ahead" was the same way. Its meaning depended on what had come before.

If the point is the same but the trajectory is different, it is different.

To preserve boundaries, you must compare trajectories, not points. This remains an open problem.³
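One concrete direction, sketched with dynamic time warping as the trajectory distance; DTW is my choice of sequence metric, not a method from the cited work, and the per-turn embeddings are assumed given:

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two embedding sequences.

    a: (m, d) and b: (n, d), one embedding per turn. A point comparison
    would pool each sequence into a single vector and lose the order;
    DTW compares the paths themselves.
    """
    m, n = len(a), len(b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[m, n])

# "It's okay" after comfort vs. after a fight: the final points may sit
# close together, but the paths that led there diverge, and DTW sees it.
```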

Self-reference

The session that inspired this essay lasted 29 days. What started as a screen capture agent became a visualization tool, passed through filling and folding, and arrived at the question of memory.

I ended up somewhere that appeared in no plan.

One thing I realized: the act of looking at the context consumes context. When you say "show me your context," that sentence is added to the context. Observation changes the thing observed.

When talking to an AI, what you see is 1.2%. The other 98.8% is invisible. The invisible part decides more than the visible part. When it fills up, you must decide what to keep and what to fold.

That decision is memory.

Is this only about AI? When we talk to each other, the other person sees only a fraction of our words. Why we said them, what experiences shaped them — the other person cannot know. We are all speaking on top of an invisible 98.8%.

The difference is that the AI's 98.8% can be opened. Ours cannot. Not yet.

Footnotes

  1. Leviathan, Y., Kalman, M., & Matias, Y. (2025). Prompt Repetition Improves Non-Reasoning LLMs. arXiv:2512.14982. Together with Stanford's Lost in the Middle (Liu et al., 2023, arXiv:2307.03172), this shows that the position and frequency of information within the context shape model behavior.

  2. MemGPT (Packer et al., 2023), MemOS (MemTensor, 2025), LongMemEval (Wu et al., 2024), MemoryAgentBench (Hu et al., 2025), MemoryArena (Zhang et al., 2026). Current memory benchmarks mostly measure retrieval accuracy. No evaluation framework yet measures whether history boundaries are preserved.

  3. State compression under partial observability — POMDP belief states, predictive state representations (Littman, Sutton, Singh, 2001), bisimulation metrics (Ferns et al., 2004) — has addressed this problem for decades. The gap is that this perspective has not been reflected as an explicit evaluation criterion in LLM agent memory.
