Compaction
Agent conversations accumulate context quickly. A long debugging session with verbose tool results can easily exceed 400K characters in a single sitting. Left unchecked, this would overflow the model’s context window and degrade response quality long before that. AZUREAL’s compaction system addresses this by automatically summarizing older conversation history while preserving recent exchanges verbatim.
Character Threshold
A live character counter – chars_since_compaction – tracks the total
characters accumulated in the current session since the last compaction (or
since session start, if no compaction has occurred). The threshold is
400,000 characters, which corresponds to roughly 100K tokens.
This counter updates in real-time during streaming. It feeds the context meter displayed on the session pane border (see Context Meter below).
What Happens When the Threshold Is Crossed
When chars_since_compaction exceeds 400K characters mid-turn, the following
sequence triggers:
1. Partial Turn Storage
The current turn’s events are stored to SQLite immediately, even though the agent has not finished. This ensures no data is lost in the next steps.
2. Flag Set
The auto_continue_after_compaction flag is set. This tells the system to
automatically resume the conversation after compaction completes, without
requiring user intervention.
3. Active Process Killed
The running agent process is terminated immediately. This prevents it from piling more content onto an already-overflowing context window. The partial response is preserved via step 1.
4. Compaction Agent Spawned
A background compaction agent is spawned using the currently selected model. This agent receives the conversation history up to the compaction boundary (see Boundary Selection below) and produces a 2,000–4,000 character summary that captures the key decisions, file changes, and current state of the work.
5. Auto-Continue
Once the compaction agent finishes, AZUREAL automatically sends a hidden “Continue.” prompt. This prompt:
- Uses the new compacted context (summary + preserved recent exchanges).
- Does not create a user bubble in the session pane – it is invisible to the user.
- Resumes the agent’s work seamlessly from where it left off.
From the user’s perspective, there may be a brief pause while compaction runs, but the conversation continues without any manual intervention.
Boundary Selection
Compaction does not summarize the entire conversation. It preserves recent exchanges verbatim to maintain conversational coherence.
spawn_compaction_agent() tries compaction_boundary(session_id, from_seq, keep)
with progressively smaller keep values (3 → 2 → 1). keep=3 is ideal —
it preserves the last 3 user prompts along with all interleaved agent
responses, tool calls, and tool results. However, sessions that cross the
threshold with ≤3 user messages would never find a boundary at keep=3,
leaving compaction stuck. Falling back to keep=2 then keep=1 ensures
compaction can always run as long as at least one user message boundary exists.
Everything before the boundary is summarized by the compaction agent. Everything after it is kept verbatim and included in the next context injection as-is.
This means the agent always sees:
- The compaction summary (covering all older history).
- The last 1–3 user-agent exchanges in full detail (depending on how many exist since the last compaction).
- The new prompt.
Guard Rails
Several mechanisms prevent compaction from misbehaving:
Double-Compaction Prevention
A guard prevents a second compaction from being triggered while one is already in flight. If the threshold is crossed again during the compaction agent’s own execution, the system waits for the current compaction to finish before evaluating whether another is needed.
Deferred Spawn
If compaction_boundary() cannot find enough user messages to establish a
boundary (e.g., the session has fewer than 3 user prompts),
compaction_spawn_deferred is set. This suppresses compaction retries until a
new user message arrives, at which point the boundary calculation is
re-attempted.
Cross-Backend Fallback
If the compaction agent fails on the primary backend (e.g., Claude returns an error), the system retries with the alternate backend (e.g., Codex). This ensures compaction is not blocked by a single backend’s transient failure.
Empty Output Retry
If the compaction agent returns an empty summary, compaction_retry_needed is
set, triggering a re-spawn of the compaction agent. An empty summary would
leave the context without any historical record, so this case is always
retried.
Context Meter
The session pane border displays a color-coded percentage badge showing how close the current session is to the compaction threshold:
| Range | Color | Meaning |
|---|---|---|
| 0–59% | Green | Plenty of headroom |
| 60–79% | Yellow | Approaching threshold |
| 80–100% | Red | Compaction imminent or in progress |
The percentage is calculated as:
chars_since_compaction / 400,000 * 100
The meter updates in real-time during streaming, giving continuous visibility into context consumption.
Inactivity Watcher
When the meter reaches 90% or higher and no new events arrive for 20 seconds, a yellow banner appears in the session pane:
Session may be compacting…
This alerts the user that the pause they are experiencing is likely due to an active compaction cycle, not a stalled agent. The banner disappears when events resume.