Chapter 9: Context, History, and Compaction
Why Context Management Exists
The model does not see the whole repository or the whole past conversation. It sees the prompt assembled for the current request. Context management decides what survives into that prompt.
The job has four parts:
- Preserve enough history for the model to reason.
- Keep tool-call and tool-result pairs valid.
- Summarize or remove old content before the context window is exceeded.
- Avoid sending irrelevant output that distracts the model.
The Basic History Problem
Tool use creates structured history, not just chat messages:
history = [
    {"role": "user", "content": "Fix the tests"},
    {"role": "assistant", "tool_call": {"id": "1", "name": "shell"}},
    {"role": "tool", "tool_call_id": "1", "content": "test failure..."},
    {"role": "assistant", "content": "I found the failing module."},
]
If compaction removes the assistant tool call but keeps the tool result, the result is orphaned: the provider may reject the next request as malformed, or the model may see an observation with no record of what produced it. Good context managers preserve these relationships.
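One way to catch this failure early is a pairing check run before each request. The sketch below assumes the dict message shape used in this chapter; the function name `find_broken_tool_pairs` is hypothetical and is not taken from Codex or Claw.

```python
def find_broken_tool_pairs(history):
    """Return (orphan_result_ids, unanswered_call_ids) for a message list.

    Assumes the illustrative shape from this chapter: assistant messages may
    carry a "tool_call" dict with an "id", and tool messages carry a
    "tool_call_id" that points back at it.
    """
    seen_calls = set()
    answered = set()
    orphan_results = []
    for message in history:
        role = message.get("role")
        call = message.get("tool_call")
        if role == "assistant" and call:
            seen_calls.add(call["id"])
        elif role == "tool":
            call_id = message.get("tool_call_id")
            if call_id in seen_calls:
                answered.add(call_id)
            else:
                # A result with no earlier matching call is orphaned.
                orphan_results.append(call_id)
    unanswered = sorted(seen_calls - answered)
    return orphan_results, unanswered
```

A non-empty first list means a result lost its call; a non-empty second list means a call never received a result. Either condition is worth fixing before the history is sent.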
Codex: Normalized Prompt History
Codex keeps a structured history of response items. Before a model request, the history is converted into prompt-ready items. That conversion can remove unsupported items, normalize tool-call pairs, filter images depending on model capabilities, and include compacted summaries.
Codex Prompt Preparation
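def codex_history_for_prompt(history, model_capabilities):
    items = []
    for item in history.items:
        # Internal bookkeeping events are never sent to the model.
        if item.is_internal_event:
            continue
        # Drop images when the target model cannot accept them.
        if item.is_image and not model_capabilities.supports_images:
            continue
        items.append(normalize_item(item))
    # Fix call/result pairing after filtering, then apply compaction markers.
    items = repair_tool_call_pairs(items)
    items = apply_compaction_markers(items)
    return items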
Codex also tracks token estimates and can compact before sampling. The key design point is that prompt preparation is a controlled projection of session history, not a blind dump.
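The `repair_tool_call_pairs` step above is left abstract. A minimal sketch of one possible policy follows, assuming the dict message shape from earlier in this chapter rather than Codex's real item types: results with no matching call are dropped, and calls with no result receive a placeholder result. Whether Codex applies exactly this policy is not confirmed here, and the placeholder text is an assumption.

```python
def repair_tool_call_pairs(items, placeholder="[tool result unavailable]"):
    """Return a copy of `items` in which every tool call has exactly one side missing repaired.

    Illustrative policy only: orphan results are dropped, unanswered calls
    get a synthetic result. Pairing is checked by id across the whole list,
    so this sketch does not detect a result that appears before its call.
    """
    call_ids = {
        m["tool_call"]["id"]
        for m in items
        if m.get("role") == "assistant" and m.get("tool_call")
    }
    result_ids = {m.get("tool_call_id") for m in items if m.get("role") == "tool"}

    repaired = []
    for m in items:
        if m.get("role") == "tool" and m.get("tool_call_id") not in call_ids:
            continue  # Orphan result: its call was filtered or compacted away.
        repaired.append(m)
        if m.get("role") == "assistant" and m.get("tool_call"):
            call_id = m["tool_call"]["id"]
            if call_id not in result_ids:
                # Unanswered call: add a stand-in so the pair is complete.
                repaired.append(
                    {"role": "tool", "tool_call_id": call_id, "content": placeholder}
                )
    return repaired
```

Dropping the unanswered call instead of adding a placeholder is an equally defensible choice; the placeholder keeps the record that a tool was invoked.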
Claw: Session Messages and Heuristic Compaction
Claw stores session messages with roles and content blocks such as text, thinking, tool use, and tool result. Sessions are persisted as JSONL, which makes resume and audit straightforward.
When input token pressure crosses the configured threshold, Claw runs local compaction. The compactor preserves recent messages, avoids splitting tool pairs, and inserts a continuation summary.
Claw Compaction Shape
def compact_claw_session(session, keep_recent=12):
    messages = session.messages
    recent = take_recent_without_splitting_tool_pairs(messages, keep_recent)
    # Everything before the preserved tail is eligible for summarization.
    old = messages[: len(messages) - len(recent)]
    if not old:
        return  # Nothing to compact yet.
    summary = summarize_old_messages_heuristically(old)
    session.messages = [
        system_message("Earlier conversation summary:\n" + summary),
        *recent,
    ]
    session.compaction_count += 1
This is different from relying entirely on a remote model summarizer. The visible Claw implementation favors an in-process, deterministic compaction path that keeps the loop moving.
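The helper `take_recent_without_splitting_tool_pairs` carries the pair-preserving logic. A minimal sketch follows, assuming flat dict messages in which tool results directly follow the assistant message that issued the call. Claw's real messages use content blocks, so this only illustrates the idea and is not its implementation.

```python
def take_recent_without_splitting_tool_pairs(messages, keep_recent):
    """Return the tail of `messages`, widened so it never begins on a tool result."""
    if keep_recent <= 0:
        return []
    start = max(0, len(messages) - keep_recent)
    # If the cut lands on a tool result, walk back until the tail begins on a
    # non-tool message, which under the stated assumption is the issuing call.
    while start > 0 and messages[start].get("role") == "tool":
        start -= 1
    return messages[start:]
```

The tail may therefore hold slightly more than `keep_recent` messages. Keeping a few extra messages is cheaper than sending an orphaned result.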
Token Budgeting
A context manager needs rough accounting even when exact tokenization is provider-specific.
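def estimate_prompt_budget(base_prompt, tools, history, model_limit):
    # Fixed cost: system prompt plus tool schemas, paid on every request.
    fixed = estimate_tokens(base_prompt) + estimate_tokens(tools.schemas())
    # Variable cost: conversation history, the part compaction can shrink.
    variable = estimate_tokens(history)
    remaining = model_limit - fixed - variable
    return {
        "fixed": fixed,
        "history": variable,
        "remaining": remaining,
        # Illustrative headroom threshold; tune per model and output length.
        "needs_compaction": remaining < 8_000,
    }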
Token accounting does not need to be perfect to be useful. It needs to be conservative enough to trigger compaction before the provider rejects the request.
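A conservative estimator can be very simple. The sketch below is an assumption-laden stand-in for `estimate_tokens`: it divides character count by a fixed ratio and rounds up. A commonly quoted rule of thumb for English text is roughly four characters per token, so the default divisor of three deliberately overestimates. The parameter names and the `needs_compaction` helper are hypothetical.

```python
import json
import math


def estimate_tokens(value, chars_per_token=3.0):
    """Rough, deliberately pessimistic token estimate for a string or JSON-like value."""
    # Non-string values (message lists, tool schemas) are measured by their JSON form.
    text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
    return math.ceil(len(text) / chars_per_token)


def needs_compaction(prompt_parts, model_limit, reserve_for_output=8_000):
    """True when the estimated prompt leaves less than the reserved headroom."""
    used = sum(estimate_tokens(part) for part in prompt_parts)
    return model_limit - used < reserve_for_output
```

An estimate that runs high triggers compaction slightly early; one that runs low lets the provider reject the request. The first failure is the cheaper one.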
What to Keep
When space is tight, not all context has equal value.
Keep:
- The user's current task.
- Recent tool results.
- Current plan or todo list.
- Important file paths and decisions.
- Error messages that still need fixing.
- Project instructions and safety constraints.
Summarize or drop:
- Large command outputs after extracting the useful lines.
- Old reasoning that no longer affects the task.
- Repeated search results.
- Stale plans.
- Earlier failed attempts once the final strategy is clear.
def score_message_for_retention(message):
    score = 0
    if message.is_recent:
        score += 5
    if message.contains_current_error:
        score += 4
    if message.contains_file_path:
        score += 2
    if message.is_large_raw_output:
        score -= 3
    return score
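A score is only useful once it drives a selection. The sketch below shows one way to combine the retention score above with a token budget: take messages from highest score to lowest while they fit, then restore chronological order. The `ScoredMessage` type and `select_for_retention` function are hypothetical, and the sketch ignores tool pairing, which a real selector would need to treat as a single unit.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    text: str
    score: int   # e.g. the output of a retention scoring function
    tokens: int  # estimated token cost of keeping this message


def select_for_retention(messages, budget):
    """Keep the highest-scoring messages that fit in `budget`, in original order."""
    # Rank by score descending; on ties, prefer the newer (higher index) message.
    ranked = sorted(range(len(messages)), key=lambda i: (-messages[i].score, -i))
    kept = set()
    used = 0
    for i in ranked:
        if used + messages[i].tokens <= budget:
            kept.add(i)
            used += messages[i].tokens
    # Re-emit in chronological order so the conversation still reads forward.
    return [messages[i] for i in sorted(kept)]
```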
Tool Output Compaction
Large tool outputs are a common source of context pressure. A good agent summarizes them as soon as possible.
def summarize_tool_output(output):
    if output.kind == "test_failure":
        return extract_failing_tests_and_stack_traces(output.text)
    if output.kind == "search_results":
        return keep_top_matches_by_relevance(output.matches)
    if output.kind == "build_log":
        return extract_errors_and_warnings(output.text)
    return truncate_with_notice(output.text)
This is not only token optimization. It changes how well the model can reason. A short, relevant observation beats a huge raw log.
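The fallback `truncate_with_notice` is worth spelling out, because silent truncation can mislead the model into treating a partial log as complete. A minimal sketch follows, assuming character-based limits and a head-plus-tail policy; the parameter names and the wording of the notice are illustrative choices.

```python
def truncate_with_notice(text, max_chars=4_000, head_fraction=0.5):
    """Keep the start and end of `text` and state explicitly how much was cut."""
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head_fraction)
    tail_len = max_chars - head_len
    omitted = len(text) - head_len - tail_len
    notice = f"\n[... {omitted} characters omitted ...]\n"
    # Guard the tail slice: text[-0:] would return the whole string.
    tail = text[-tail_len:] if tail_len > 0 else ""
    # The result exceeds max_chars by the length of the notice.
    return text[:head_len] + notice + tail
```

Keeping both ends reflects a common pattern in command output: the invocation and early context appear at the top, and the final error appears at the bottom.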
Session Persistence and Resume
Long-running agents need durable state. A user may close the terminal, restart the app, fork a previous conversation, or inspect what happened after the fact. That requires more than a transcript. The stored session should preserve history, workspace identity, model/config snapshots, compaction metadata, and enough tool state to continue safely.
def persist_session(session):
    record = {
        "session_id": session.id,
        "workspace": str(session.workspace),
        "config_snapshot": session.config_snapshot,
        "messages": session.history.items,
        "usage": session.usage.to_dict(),
        "compactions": session.compaction_metadata,
    }
    append_jsonl(session.store_path, record)
Resume is the inverse operation, but it should validate and normalize the stored history before using it again:
def resume_session(session_id):
    stored = read_session_store(session_id)
    history = repair_tool_call_pairs(stored["messages"])
    config = restore_config_snapshot(stored["config_snapshot"])
    return RuntimeSession(
        id=session_id,
        workspace=stored["workspace"],
        history=history,
        config=config,
        usage=stored.get("usage", {}),
    )
Codex has a thread/session model that supports resume, fork, app-server state, and multi-agent relationships. Claw persists sessions as JSONL and supports resume-oriented CLI flows. The shared design lesson is that persisted history must remain prompt-valid after reload.
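The JSONL helpers used in the persistence sketch can be written with the standard library alone. The version below assumes, as `persist_session` does, that each line is a complete snapshot, so the last parseable line is the current state. It also tolerates a truncated final line, which can occur if the process is killed mid-write. The name `read_latest_record` is hypothetical, and this is not a description of Claw's actual on-disk format.

```python
import json
from pathlib import Path


def append_jsonl(path, record):
    """Append one JSON object as a single line."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_latest_record(path):
    """Return the last line that parses as JSON, or None if there is none."""
    latest = None
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                latest = json.loads(line)
            except json.JSONDecodeError:
                # Probably a partially written line; keep the last good record.
                continue
    return latest
```

Appending keeps earlier snapshots available for audit, at the cost of a file that grows with every save.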
Output Normalization Before Storage
The session store should not be a dump of arbitrary process output. Tool results need stable, bounded representations.
def store_tool_result(history, call, result):
    normalized = normalize_tool_result(result)
    history.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": normalized.model_content,
        "metadata": normalized.metadata,
    })
This makes compaction easier later because the history has consistent shapes.
Compaction Must Preserve Causality
The model should still understand why the agent is in its current state after compaction.
def make_continuation_summary(old_history):
    return {
        "user_goal": extract_user_goal(old_history),
        "files_inspected": extract_files(old_history),
        "changes_made": extract_edits(old_history),
        "tests_run": extract_test_results(old_history),
        "open_issues": extract_unresolved_failures(old_history),
    }
The summary should not be a literary recap. It should be a working state snapshot.
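A structured snapshot still has to become prompt text. The sketch below renders the summary dictionary as a short labeled list and skips empty sections. The function name, the labels, and the header line are illustrative assumptions; the keys match the summary structure above.

```python
def render_continuation_summary(summary):
    """Render a continuation-summary dict as compact, labeled prompt text."""
    labels = [
        ("user_goal", "Goal"),
        ("files_inspected", "Files inspected"),
        ("changes_made", "Changes made"),
        ("tests_run", "Tests run"),
        ("open_issues", "Open issues"),
    ]
    lines = ["Earlier conversation summary:"]
    for key, label in labels:
        value = summary.get(key)
        if not value:
            continue  # Omit empty sections instead of printing blank labels.
        if isinstance(value, (list, tuple)):
            lines.append(f"{label}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Fixed labels in a fixed order make successive compactions easy to compare and keep the state readable for both the model and a human auditor.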
Comparison
| Aspect | Codex | Claw |
|---|---|---|
| History unit | Structured response items | Session messages with content blocks |
| Prompt projection | Normalize history for model capabilities | Use session messages with prompt builder/runtime rules |
| Compaction timing | Pre-sampling and token-pressure driven | Threshold-driven auto-compaction |
| Tool pair handling | Normalization preserves valid pairs | Compactor avoids splitting pairs |
| Persistence | Thread/session rollout and state storage | JSONL session files |
| Summary style | Can use richer compaction flows | Local heuristic continuation summary |
Practical Lessons
- Store richer history than you send to the model.
- Convert history to prompt items deliberately.
- Preserve tool-call and tool-result pairs.
- Summarize tool output early.
- Make compaction summaries operational, not narrative.
- Track token pressure before the provider rejects the request.
Source Anchors
For Codex, useful filenames are history.rs, turn.rs, and the compaction
helpers. For Claw, useful filenames are session.rs, compact.rs, and
conversation.rs.