Agentic Task Execution: Deep Dive
This deep dive expands the Agent Loop overview. It focuses on how an agent actually moves from a user request to model events, tool calls, tool results, retries, and final output.
The implementation details here use Codex and Claw as the reference sources. Claw is the local Claude-Code-like Rust implementation in this repository, not Anthropic's closed-source codebase.
The ReAct Pattern In Runtime Form
The classic pattern is ReAct: reason, act, observe, repeat.
def react_loop(task):
observation = {"user_task": task}
while True:
thought_and_action = model(observation)
if thought_and_action.final_answer:
return thought_and_action.final_answer
result = execute_tool(thought_and_action.tool_call)
observation = {
"previous": observation,
"tool_result": result,
}
Real agent runtimes add several constraints:
- Model output arrives as a stream.
- Tool calls may arrive before the stream is finished.
- Some tools can run in parallel; mutating tools must serialize.
- Permission checks can pause or deny execution.
- Context can exceed the model window.
- The user can cancel.
- The UI needs progress events before the final answer.
Codex Execution Trace
Codex uses a turn-oriented runtime. A turn is not just one model call; it can include a model stream, several tool calls, tool result recording, compaction, and a decision to sample again.
1. Prepare The Turn
async def prepare_codex_turn(session, user_input):
turn = TurnContext(
model=session.config.model,
cwd=session.cwd,
sandbox_policy=session.config.sandbox,
approval_policy=session.config.approval,
cancellation=session.new_cancellation_token(),
)
await session.record_user_input(user_input)
await maybe_run_pre_sampling_compaction(session, turn)
return turn
The turn context freezes important runtime settings so the model call and tool calls share the same view of policy.
2. Build Prompt Input
Codex does not send raw stored history directly. It asks the history manager for prompt-ready items.
def build_prompt_input(session, turn):
history = session.history.clone()
prompt_items = history.for_prompt(
model=turn.model,
supported_modalities=turn.model.capabilities,
)
return {
"instructions": session.base_instructions,
"input": prompt_items,
"tools": turn.tool_router.model_visible_tools(),
}
This is where unsupported items can be removed, images can be filtered, and tool-call/tool-result pairs can be normalized.
3. Stream The Model Response
During sampling, Codex handles text deltas, reasoning deltas, completed output items, token usage, and final completion events.
async def try_run_sampling_request(client_session, request, runtime):
in_flight_tools = []
needs_follow_up = False
async for event in client_session.stream(request):
if event.kind == "text_delta":
runtime.emit_text_delta(event.text)
elif event.kind == "reasoning_delta":
runtime.emit_reasoning_delta(event.text)
elif event.kind == "output_item_done":
tool_call = runtime.router.build_tool_call(event.item)
if tool_call:
future = runtime.handle_tool_call(tool_call)
in_flight_tools.append(future)
needs_follow_up = True
else:
runtime.record_assistant_item(event.item)
elif event.kind == "completed":
runtime.record_usage(event.usage)
break
await drain_in_flight(in_flight_tools)
return SamplingResult(needs_follow_up=needs_follow_up)
The key detail is that a completed output item can start a tool future while the model stream is still active. The turn still waits for all tool futures before it finishes.
4. Control Parallel Tool Execution
Codex uses a runtime gate so parallel-safe tools can overlap and mutating tools run exclusively.
class ToolParallelGate:
def __init__(self):
self.lock = ReaderWriterLock()
async def run(self, tool_call, handler):
if handler.supports_parallel(tool_call):
async with self.lock.read():
return await handler.run(tool_call)
async with self.lock.write():
return await handler.run(tool_call)
This lets file reads, image views, or other safe operations overlap while shell commands and patch operations avoid stepping on each other.
5. Route Execution Through The Orchestrator
For execution-sensitive tools, Codex routes through an orchestrator:
async def execute_codex_tool(tool, request, turn):
approval = compute_approval_requirement(request, turn.policy)
if approval.requires_user:
decision = await turn.session.ask_for_approval(request)
if not decision.allowed:
return denied(decision.reason)
sandbox = select_sandbox(turn.sandbox_policy, request)
result = await tool.run(request, sandbox=sandbox)
if result.sandbox_denied and turn.policy.can_retry_without_sandbox:
retry = await turn.session.ask_for_approval("retry unsandboxed")
if retry.allowed:
return await tool.run(request, sandbox=None)
return result
The orchestrator is where permission policy, sandbox policy, network approval, hooks, and retry behavior converge.
6. Decide Whether To Continue
async def finish_codex_sampling(session, result):
if result.needs_follow_up:
return "continue"
if result.hit_token_limit:
await compact_history(session)
return "continue"
stop_hook = await run_stop_hooks(session)
if stop_hook.injected_message:
session.record_user_input(stop_hook.message)
return "continue"
return "stop"
The final answer is only one possible exit. Tool calls, compaction, and hooks can all send the loop back for another model sample.
Claw Execution Trace
Claw's ConversationRuntime has a more direct loop. It sends the session
messages to the provider, receives assistant events, builds an assistant message,
executes tool uses, appends results, and repeats.
1. Build Runtime Dependencies
def build_claw_runtime(config, workspace):
session = Session.new(workspace)
prompt = SystemPromptBuilder(config, workspace).render()
api_client = ProviderClient.from_config(config.model)
tools = GlobalToolRegistry.from_config(config)
policy = PermissionPolicy.from_config(config.permission_mode)
return ConversationRuntime(
session=session,
system_prompt=prompt,
api_client=api_client,
tools=tools,
permission_policy=policy,
)
The runtime owns the main moving parts, which makes the control flow easy to trace from a single object.
2. Run One Conversation Turn
async def run_claw_turn(runtime, user_text):
runtime.session.push_user_text(user_text)
while runtime.iteration_count < runtime.max_iterations:
request = {
"system_prompt": runtime.system_prompt,
"messages": runtime.session.messages,
"tools": runtime.tools.specs(),
}
events = await runtime.api_client.stream(request)
assistant = build_assistant_message(events)
runtime.session.push_assistant(assistant)
tool_uses = assistant.tool_uses()
if not tool_uses:
break
for tool_use in tool_uses:
result = await execute_claw_tool_use(runtime, tool_use)
runtime.session.push_tool_result(tool_use.id, result)
if runtime.should_auto_compact():
runtime.session.compact()
Unlike Codex, the visible Claw runtime does not center on in-flight futures inside the model stream. It handles tool uses after assistant events are collected into a message.
3. Authorize And Execute A Tool
async def execute_claw_tool_use(runtime, tool_use):
await runtime.hooks.pre_tool(tool_use)
required_mode = classify_required_mode(tool_use)
decision = await runtime.permission_policy.authorize(
tool_use,
required_mode=required_mode,
)
if decision.allowed:
result = await runtime.tools.execute(tool_use)
else:
result = denied_tool_result(decision.reason)
await runtime.hooks.post_tool(tool_use, result)
return result
Permission denial is returned as a tool result so the model can adjust rather than losing the conversation state.
4. Compact When Needed
def claw_should_compact(runtime):
threshold = runtime.compaction.input_token_threshold
return runtime.usage.cumulative_input_tokens > threshold
def claw_compact(session):
recent = keep_recent_messages_without_breaking_tool_pairs(session.messages)
summary = summarize_older_messages(session.messages, excluding=recent)
session.messages = [system_summary(summary), *recent]
The goal is not perfect summarization. The goal is to preserve enough operational state for the next model call.
Timing Comparison
| Step | Codex | Claw |
|---|---|---|
| User input | Recorded into session/thread history | Pushed into session messages |
| Prompt preparation | History projected into prompt-ready response items | Session messages sent with built system prompt |
| Stream processing | Handles text, reasoning, output items, completion, usage | Provider events normalized into assistant events |
| Tool start time | Tool future can start when output item completes | Tool use runs after assistant message is built |
| Tool parallelism | Parallel-safe tools overlap through a lock gate | Core runtime handles tool uses sequentially |
| Tool result sync | Drain in-flight tools before finishing sampling | Append each result after execution |
| Continue condition | Follow-up flag, compaction, hooks, API state | More tool uses or max iteration/compaction logic |
Cancellation
Cancellation must stop both model streaming and tool execution.
async def cancellable_tool_run(tool, args, cancellation):
tool_task = create_task(tool.run(args))
cancel_task = create_task(cancellation.wait())
done = await wait_first(tool_task, cancel_task)
if done is cancel_task:
tool_task.cancel()
return aborted_tool_result()
return tool_task.result()
Codex uses cancellation tokens throughout the turn and tool path. Claw's simpler runtime can still propagate cancellation through its API client and tool executor interfaces when wired by the caller.
Recovery Patterns
Transport Recovery
async def recover_transport(client, request):
for attempt in range(client.retry_budget):
try:
return await client.stream(request)
except RetryableTransportError:
await sleep(backoff(attempt))
if client.can_switch_transport():
client.switch_transport()
return await client.stream(request)
raise RuntimeError("transport failed")
Codex has explicit stream retry and transport fallback behavior. Claw's provider abstraction hides provider-specific streaming details from the conversation loop.
Context Recovery
async def recover_context(session, error):
if error.kind == "context_too_large":
compact_history(session)
return "retry"
if session.token_pressure_high():
compact_history(session)
return "continue"
return "fail"
Both systems need to compact before the conversation becomes unusable.
Permission Recovery
async def recover_permission_denial(history, tool_call, denial):
history.append_tool_result(tool_call.id, {
"ok": False,
"error": "permission denied",
"reason": denial.reason,
})
return "let_model_choose_alternative"
Denied tools should become observations. The model may choose a read-only alternative, ask the user, or explain the blocker.
Why Tool Pair Integrity Matters
Provider APIs expect tool results to correspond to earlier tool calls. If a compactor drops one side of the pair, the next request can be invalid.
def preserve_tool_pairs(items):
kept = []
pending_tool_ids = set()
for item in items:
if item.is_tool_call:
kept.append(item)
pending_tool_ids.add(item.id)
elif item.is_tool_result:
if item.tool_call_id in pending_tool_ids:
kept.append(item)
pending_tool_ids.remove(item.tool_call_id)
else:
kept.append(item)
return kept
Codex handles this during history normalization. Claw's compaction avoids splitting tool-use/tool-result pairs.
Practical Takeaways
- Treat model streaming, tool execution, and history recording as separate phases even when they overlap.
- Start safe tools early when the runtime can preserve ordering.
- Serialize mutating tools unless the tool contract proves they cannot conflict.
- Make permission denials model-visible.
- Compact before provider errors when possible.
- Keep cancellation paths explicit.
- Preserve tool-call/tool-result pairs through every history transformation.
Source Anchors
For Codex, read turn.rs, stream_events_utils.rs, parallel.rs,
orchestrator.rs, and history.rs. For Claw, read conversation.rs,
client.rs, tools/lib.rs, permissions.rs, and compact.rs.