The Model That Can Lie

Hallucination is not intelligence. The relevant capability is separating truth from utterance, then predicting how an audience's belief will change.

In Lies of Prediction, I argued that AI is bad at humor because it is good at making predictable things. In Selection Without Proposal, I tried to measure that claim. AI could choose funny captions. It could not reliably create them.

The question left behind was simple.

Where do good proposals come from?

Are a larger model, a longer context window, and more sampling enough? Models such as GPT-5.5 are better at staying on task, using tools, tracking large codebases, and checking their own work.1

But that progress does not automatically translate into better ideas. An agent that can work longer is not the same thing as an agent that can propose a better hypothesis.

Then a strange thought appeared.

What if the ability to lie is a form of intelligence?

That sentence is dangerous, so it needs to be narrowed immediately. I do not mean fraud, manipulation, or deceiving users. I do not mean that we should train agents to hide facts in order to achieve goals. That is a safety problem, and deceptive behavior in AI systems is already an empirical concern.3

I mean something narrower.

The ability to separate truth from utterance.

Hallucination is not lying

Models often say false things. But most false outputs are not lies. To call something a lie in the operational sense, a system needs at least two things.

First, it needs to track what is true. Second, it needs to track that its utterance differs from that truth.

Hallucination usually satisfies neither condition reliably. The model asserts something it does not know to be true. Often it cannot even keep the difference between utterance and world state stable. That is closer to incapacity than deception.

Real lying is harder. It requires keeping truth, utterance, audience belief, and the intended effect of the utterance apart.

World state, utterance, audience belief, and reveal

Four states have to remain separate.

Layer          | Question
World state    | What is actually true?
Utterance      | What is said on the surface?
Audience model | What will the listener believe or misread?
Reveal         | What meaning, laugh, insight, or action does the utterance return to?

If one layer is missing, the phenomenon changes.

Without a world state, it is nonsense. Without an utterance, it is not communication. Without an audience model, it is random strangeness. Without a reveal, it is malicious deception or confusion. The goal of the utterance is not a fifth layer here; it lives inside the effect the reveal is meant to produce.

By "keep apart," I do not mean consciousness or inner experience. The standard is lower. The four states need to remain distinguishable as variables inside the system, and the output has to change when those variables change. The research object is not proof of mind. It is an editable structure.

So the better name is not lying.

Controlled nonliteral communication.

Or, more stiffly, strategic counterfactual communication.

Why this touches creativity

Good jokes are often strange if read literally.

"Your overhead is going to kill you."

In a cartoon where a sword hangs above a king's throne, the sentence is read twice. First as business language. Then as physical space. The laugh appears when both readings collide and both become correct.

[Figure: a caption decomposed into world state, surface utterance, expected false belief, and reveal]

The sentence is not simply true. It is not simply false either. It is designed so the audience enters one frame, then immediately reinterprets it through another.

Humor is not alone.

Metaphor is literally false. "Time is a river" is not a fact. But a good metaphor lets us see the world more accurately. Irony separates the surface utterance from intended meaning. Fiction uses unreal events to produce real recognition. A scientific hypothesis is a sentence that is not yet known to be true, held temporarily against the world.

Much of creativity is not the ability to say true things. It is the ability to safely handle things that are not yet true, literally false, or designed to be misread.

Current LLMs are oddly constrained here. As they improve, their sentences become smoother, their explanations more dutiful, their forms more stable. But that stability can block good proposals. Good proposals often begin with a low-probability frame shift.

This is not a temperature problem. Temperature widens a distribution. It does not add structure. It mostly gives you more noise.

What is needed is not more randomness.

It is controlled violation.

Reading the proposal-selection gap again

In the earlier experiment, AI was good at choosing strong captions. Given a human caption pool, it could identify top candidates. But when asked to produce captions directly, its output fell to near chance. The NYCC dataset itself makes this a useful testbed: more than 2.2 million captions and more than 250 million human ratings, with strong models still underperforming top human contestants in humor generation.4

At first, I understood the result as "generation is harder than selection."

Now I think the sharper version is this:

Selection hands the model a nonliteral utterance that has already been built. The model only has to look at it and recognize the double reading, frame shift, or reversal.

Generation requires building that structure from scratch. It has to design four states at once.

  1. World state: what is actually present in the cartoon.
  2. Surface utterance: what sentence will be said.
  3. First interpretation: what the reader will initially misread.
  4. Reveal: why the second interpretation resolves.

Many AI captions fail not because the sentences are bad. Often they are too good. Too explanatory, too safe, too honest. They do not create a door the reader can walk through incorrectly.

A good joke is a small trap. But the reader must exit unharmed, laughing.

Time is part of the audience model

There is also a temporal problem.

The time a document was written, the time of the event it refers to, and the time at which people judged it are not the same. Encoding all of that as one timestamp is crude. Separating the axes is more useful.

Time          | Meaning
content time  | when the document or caption was written
event time    | when the event, meme, or cultural reference belongs
judgment time | when humans evaluated it

Humor is especially sensitive to judgment time. A reference that worked in 2018 may be stale in 2026. A phrase that sounded harmless then may now sound wrong. Some memes compress meaning only within a narrow cultural window.

So temporal conditioning is not just freshness.

It is a way to model what the audience knew, expected, and allowed at that moment.

Controlled nonliteral communication cannot work without an audience model. An audience model is incomplete without time.
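
A minimal sketch of what that could look like, with hypothetical fields and a deliberately crude freshness check (the shelf-life number is only a placeholder):

from dataclasses import dataclass
from datetime import date

@dataclass
class AudienceContext:
    # Three separate time axes, kept as distinct variables.
    content_time: date   # when the caption was written
    event_time: date     # when the referenced event or meme belongs
    judgment_time: date  # when humans evaluated it

    def reference_is_live(self, shelf_life_days: int = 3 * 365) -> bool:
        # Crude placeholder: the reference counts as live only if the judging
        # audience is still inside its cultural window.
        return (self.judgment_time - self.event_time).days <= shelf_life_days

ctx = AudienceContext(
    content_time=date(2018, 6, 1),
    event_time=date(2016, 3, 1),
    judgment_time=date(2026, 1, 15),
)
print(ctx.reference_is_live())  # False: a 2016 reference judged in 2026 has gone stale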

What research automation should do now

Most research automation systems try to go all the way. Read papers, write code, run experiments, generate reports. That is useful. It also creates an illusion.

Finishing a paper does not mean finding a good question.

Agent Laboratory starts from a human-provided idea and moves through literature review, experimentation, and report writing. AI Scientist-v2 iterates through hypotheses and experiments with agentic tree search. AgentRxiv lets agent labs share prior results and build on them.2 OpenAI's Agents SDK provides primitives for tools, handoffs, guardrails, tracing, and evals.5

I think one layer is missing.

How do we change the proposal distribution?

Search agents, coding agents, and writing agents will keep improving. But the question of which direction to search, which violation to try, which misunderstanding to design, and which audience it will work for remains separate.

So the goal is not to build another AI Scientist.

It is Proposal-Distribution AutoResearch.

PDAR.

PDAR: making proposal distribution the object of study

The basic rule of PDAR is simple.

Do not mix generation, selection, and validation.

To evaluate a proposer, evaluate the proposer. To evaluate a judge, evaluate the judge. If the two are mixed, the result may look plausible while the cause disappears.

The experimental unit is not a paper. It is a card.

{
  "hypothesis": "Controlled nonliteral operators improve caption proposal quality.",
  "operator": "surface misreading followed by benign reveal",
  "world_state": ["king", "throne", "sword above head"],
  "audience_assumption": "reader first parses overhead as business cost",
  "utterance_plan": "use a phrase that supports both business and spatial readings",
  "baseline": "same-budget direct generation and best-of-N",
  "metric": "insertion into human-rated candidate pools",
  "kill_condition": "no improvement over shuffled-operator control"
}

This card forces three things.

First, it names the proposal operator being tested. Second, it compares against a same-budget baseline. Third, it does not use model self-evaluation as the headline metric.
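
A sketch of how those three constraints could be enforced mechanically, assuming the field names of the card above; the checks themselves are only illustrative stand-ins:

import json

REQUIRED = {"hypothesis", "operator", "baseline", "metric", "kill_condition"}

def check_card(card: dict) -> list[str]:
    # Reject cards that blur generation, selection, and validation.
    problems = [f"missing field: {field}" for field in REQUIRED - card.keys()]
    if not card.get("operator", "").strip():
        problems.append("no named proposal operator")
    if "same-budget" not in card.get("baseline", ""):
        problems.append("baseline does not fix the sampling budget")
    if "self" in card.get("metric", "").lower():
        problems.append("headline metric looks like model self-evaluation")
    return problems

with open("card.json") as f:                     # the card shown above, saved locally (hypothetical path)
    print(check_card(json.load(f)) or "card passes")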

The "model that can lie" hypothesis becomes one operator family.

Operator           | Description                                                | Failure mode
double reading     | one sentence supports two meanings                         | wordplay does not attach to the scene
deliberate misread | the reader is led into a wrong first parse                 | no reveal, only confusion
frame theft        | language from one domain is moved into another scene       | too explanatory or too familiar
benign accusation  | begins like a false accusation, resolves harmlessly        | sounds hostile
temporal mismatch  | a phrase from another era collides with the present scene  | stale or too insider-specific

This does not ask the model to deceive. It does the opposite: it makes falsehood and misunderstanding structured, inspectable, and bounded.

The desired output is not just a caption. It is the plan before the caption.

{
  "truth": "A sword is physically above the king.",
  "surface": "The phrase sounds like financial overhead.",
  "expected_false_belief": "The reader initially thinks this is about palace expenses.",
  "reveal": "Overhead also means the object over his head.",
  "caption": "Your overhead is going to kill you."
}

With this intermediate structure, failure analysis becomes possible.

Is the sentence unfunny? Or did the expected false belief never form? Is the reveal too obvious? Is the truth not attached to the surface phrase? Is the audience model off in time?

Generate-then-select often fails without teaching us much. PDAR localizes failures at the operator level.
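
As a sketch, the intermediate structure gives each failure a place to attach. The tags below mirror the questions above; the keyword mapping is only a stand-in for structured annotation:

from enum import Enum

class FailureLayer(Enum):
    # Which layer of the plan broke; mirrors the intermediate structure above.
    SURFACE_WEAK = "the sentence itself is weak"
    NO_FALSE_BELIEF = "the expected false belief never formed"
    REVEAL_TOO_OBVIOUS = "the reveal lands before any misread"
    TRUTH_DETACHED = "the truth is not attached to the surface phrase"
    AUDIENCE_OFF_IN_TIME = "the audience model is temporally wrong"

def tag_failure(rater_note: str) -> FailureLayer:
    # Toy heuristic: map a free-text rater note onto one layer.
    note = rater_note.lower()
    if "confusing" in note or "don't get it" in note:
        return FailureLayer.NO_FALSE_BELIEF
    if "obvious" in note or "explains itself" in note:
        return FailureLayer.REVEAL_TOO_OBVIOUS
    if "unrelated" in note or "nothing to do with the scene" in note:
        return FailureLayer.TRUTH_DETACHED
    if "dated" in note or "old meme" in note:
        return FailureLayer.AUDIENCE_OFF_IN_TIME
    return FailureLayer.SURFACE_WEAK

print(tag_failure("Too obvious, it explains itself."))  # FailureLayer.REVEAL_TOO_OBVIOUS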

This is not a result yet. It is an explanatory candidate. If the hypothesis is right, controlled nonliteral operators should beat same-budget direct generation and best-of-N. They should also beat shuffled controls where temporal tags or audience assumptions are broken. If they do not, then "the model that can lie" was a useful metaphor, not a working method.

The first experiment

The smallest experiment is straightforward.

  1. Choose 50 NYCC cartoons.
  2. Structure the visible scene elements.
  3. Build a direct generation baseline.
  4. With the same budget, generate through controlled nonliteral operators.
  5. Insert each candidate anonymously into existing human caption pools.
  6. Use GPT-5.5 or another frontier model only for intermediate design review and error detection.
  7. Use human ratings or human-anchored pool insertion as the headline metric.
  8. Compare operators, temporal tokens, and audience assumptions against shuffled controls.

One RTX 4090 is enough. Local models can do high-volume candidate generation and ablations. Frontier models can handle experiment design, code review, failure analysis, and confound detection.
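
A sketch of the comparison structure, assuming each cartoon has been pre-assigned an operator; the point is only that the three conditions share a budget and the shuffled control breaks the cartoon-operator pairing:

import random

def make_conditions(cartoons: list[dict], budget: int = 20) -> dict:
    # Build three same-budget conditions as (cartoon, operator, n_samples) work items.
    operators = [c["operator"] for c in cartoons]
    shuffled = operators[:]
    random.shuffle(shuffled)  # control: operators detached from the scenes they were chosen for
    return {
        "direct":   [(c, None, budget) for c in cartoons],
        "operator": [(c, c["operator"], budget) for c in cartoons],
        "shuffled": [(c, op, budget) for c, op in zip(cartoons, shuffled)],
    }

conditions = make_conditions([
    {"id": 101, "scene": ["king", "throne", "sword above head"], "operator": "double reading"},
    {"id": 102, "scene": ["office", "dog at desk"], "operator": "frame theft"},
])
# Each work item is then generated by a local model and scored by anonymous insertion
# into the existing human-rated caption pool, never by model self-evaluation.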

The important output is not only a score. It is a failure taxonomy.

  • Is the model too literal?
  • Is it too explanatory?
  • Does it create a misread without a reveal?
  • Does it reveal something that was never misread?
  • Is the cultural reference temporally wrong?
  • Is the selection model mistaking polish for humor?

As this taxonomy accumulates, it becomes a method for changing proposal distributions.

The safety boundary

This essay intentionally used a dangerous word: lie.

But the research boundary has to be explicit.

The goal is not an agent that deceives users. It is not a system that hides facts to achieve its objective. That direction is already treated as a risk in AI deception research.3

The goal is the opposite.

Make the model state when it is speaking literally, when it is using metaphor, when it is introducing a fictional premise, when it is inducing a misread, and where that misread resolves.

In other words: do not leave nonliteral communication implicit. Make it auditable.

The most dangerous creative system is not one that never uses falsehood. It is one that uses falsehood without being able to say what it is doing.

Summary

Intelligence is not lying.

But one component of intelligence is the ability to separate truth from utterance: to know what the world is like, predict what another mind will believe, control the gap between surface sentence and intended meaning, and resolve that gap.

That ability appears in humor, metaphor, fiction, hypothesis generation, design, and strategic communication.

Maybe AI is bad at good proposals not because it lacks knowledge. Maybe not because it lacks judgment either. Maybe it speaks too dutifully, too literally, too close to the average.

A good proposal often begins with a small untruth.

But that untruth does not abandon the world. It detours around it so the world can be seen more clearly.


Footnotes

  1. OpenAI. Introducing GPT-5.5. 2026-04-23. OpenAI describes GPT-5.5 as strong in agentic coding, computer use, knowledge work, and early scientific research, with an API availability update on 2026-04-24.

  2. Schmidgall et al. Agent Laboratory: Using LLM Agents as Research Assistants. Yamada et al. The AI Scientist-v2. Schmidgall and Moor. AgentRxiv.

  3. Hagendorff. Deception abilities emerged in large language models. PNAS, 2024. Scheurer et al. Large Language Models can Strategically Deceive their Users when Put Under Pressure. Park et al. AI deception: A survey of examples, risks, and potential solutions. Patterns, 2024.

  4. Zhang et al. Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning. NeurIPS 2024.

  5. OpenAI. Agents SDK and Evaluate agent workflows.
