How compaction works
No matter how large an LLM's context window is, it is finite. Compaction is how Buffaly automatically reduces active timeline weight during long-running sessions, preventing context collapse and saving token costs.
The Reality of Compaction
Let's be honest: despite what AI marketing materials claim, no compaction algorithm is perfect. When you aggressively compress context, the model will inevitably drop nuances or temporarily lose the immediate thread of its work.
Buffaly expects this. The system compacts aggressively to save tokens and relies on architectural fallbacks, such as the System 2 Watcher, the Semantic Database (SemDB), and Task Artifacts, to recover state cleanly when the primary agent gets confused.
What compaction actually does
A session accumulates user messages, tool calls, logs, and answers. When the token count reaches the configured threshold (TriggerTokens), compaction reduces the active context that must be replayed into the LLM on the next turn. It does not delete your data: everything is permanently backed by SQL Server.
What happens in active context:
- Older middle details are summarized or dropped from the immediate prompt.
- The most recent boundaries and steps are preserved.
What remains completely safe:
- The full SQL Server timeline history.
- Durable Task Artifacts, Plan.md, and Scratch.md.
- Committed codebase and wiki changes.
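The active-context behavior described above can be illustrated with a minimal sketch. This is not Buffaly's implementation: the `Entry` type, the `compact` function, the `keep_recent` parameter, and the fixed summary cost are all hypothetical names and values chosen for illustration. It only shows the general shape: keep the opening boundary and the most recent steps, collapse the older middle into a summary, and leave the full history untouched.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One timeline item: a user message, tool call, log line, or answer."""
    role: str
    text: str
    tokens: int


def compact(timeline, trigger_tokens, keep_recent=4):
    """Return the active context to replay on the next turn.

    The input list is never mutated; it stands in for the full
    history that remains in durable storage.
    """
    total = sum(e.tokens for e in timeline)
    if total < trigger_tokens or len(timeline) <= keep_recent + 1:
        return list(timeline)  # under threshold: replay everything

    split = len(timeline) - keep_recent
    head = timeline[:1]        # opening boundary (for example, the task statement)
    middle = timeline[1:split] # older middle details
    tail = timeline[split:]    # most recent steps, kept verbatim

    # Placeholder summary: a real engine would call an LLM or a local
    # summarizer here. The cost of 20 tokens is an arbitrary stand-in.
    summary = Entry("summary", f"[{len(middle)} earlier entries summarized]", 20)
    return head + [summary] + tail
```

With ten entries and `keep_recent=4`, the result holds six items: the first entry, one summary, and the last four entries. The original list still has all ten.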
The 3 Compaction Methods
Buffaly supports three distinct provider methods for compressing context. You configure this via the CompactionEngine setting.
Codex Compaction (Recommended)
Configured as CodexApi. This is generally the best-performing and most reliable compaction engine for long-running enterprise tasks. It intelligently preserves the operational narrative.
OpenAI Responses API
Configured as ResponsesApi. Uses provider-backed native summarization tools. Strong, but subject to provider-side logic updates.
Local Advanced Compactor
Configured as LocalAutomatic or LocalManual. A deterministic, local algorithm that forcefully archives and truncates state without relying on an external LLM call.
Settings and thresholds
| Setting | Meaning |
|---|---|
| MaxConversationTokens | Absolute maximum token budget before forced failure (defaults to 100,000). |
| TriggerTokens | The token count where compaction is automatically triggered. |
| TargetFreeTokens | The desired amount of headroom to clear out during compaction. |
| TargetConversationTokens | The computed target size of the active conversation after compaction finishes. |
| CompactionEngine | Must exactly match the enum: CodexApi, ResponsesApi, LocalAutomatic. |
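As a rough sketch of how these settings might relate, the Python model below mirrors the table. The class and method names are hypothetical, and the formula for the computed target is an assumption (budget minus desired headroom); the source only says that TargetConversationTokens is computed. The enum lists the three values given in the table; LocalManual is mentioned under the Local Advanced Compactor but is not listed in the table, so it is left out here.

```python
from dataclasses import dataclass
from enum import Enum


class CompactionEngine(Enum):
    # Values copied from the settings table; matching is case-sensitive.
    CODEX_API = "CodexApi"
    RESPONSES_API = "ResponsesApi"
    LOCAL_AUTOMATIC = "LocalAutomatic"


def parse_engine(value: str) -> CompactionEngine:
    """Look up an engine by its exact configured name.

    Raises ValueError for anything that is not an exact match.
    """
    return CompactionEngine(value)


@dataclass
class CompactionSettings:
    trigger_tokens: int
    target_free_tokens: int
    engine: CompactionEngine
    max_conversation_tokens: int = 100_000  # default stated in the table

    def __post_init__(self):
        if not 0 < self.trigger_tokens <= self.max_conversation_tokens:
            raise ValueError("trigger_tokens must be within the token budget")
        if not 0 <= self.target_free_tokens < self.max_conversation_tokens:
            raise ValueError("target_free_tokens must be below the token budget")

    @property
    def target_conversation_tokens(self) -> int:
        # ASSUMPTION: target size = budget minus desired headroom.
        # The real computation is not documented in this section.
        return self.max_conversation_tokens - self.target_free_tokens

    def should_compact(self, current_tokens: int) -> bool:
        """True once the conversation reaches the trigger threshold."""
        return current_tokens >= self.trigger_tokens
```

Under these assumptions, `CompactionSettings(80_000, 30_000, parse_engine("CodexApi"))` triggers at 80,000 tokens and aims for a 70,000-token conversation afterwards, while `parse_engine("codexapi")` raises `ValueError` because the casing does not match.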
Recovery: What to do when the agent forgets
Because compaction removes immediate context, agents will sometimes stall or repeat actions after a heavy compaction cycle. Here is the strict escalation path for recovery:
1. Rely on the System 2 Watcher
Ideally, you don't do anything. If the primary agent gets off track post-compaction, the System 2 Watcher (the supervisory agent) will catch the deviation, validate the Plan, and automatically instruct the primary agent to correct its course.
2. Prompt a Semantic History Search
If the agent is truly lost, do not try to re-explain the entire task. Because all history is in SQL and the Semantic Database (SemDB), simply ask the agent to self-retrieve its past context.
"You just underwent compaction. Query your semantic session history to remember exactly how we configured the database connection, then resume the plan."
3. Fall back to Durable Artifacts
If the timeline is severely truncated, instruct the agent to re-anchor itself against the durable files that compaction never touches.
"Read Plan.md, Scratch.md, and task-01.md to regain your context, then tell me the next safe step."
How to verify compaction ran
Ask the agent for verified evidence rather than a status guess. A good diagnostic prompt asks the agent to:
- Check lifecycle rows or logs mentioning "compaction start" or "success".
- Look for an archive snapshot path for the pre-compaction epoch.
- Verify that Plan, Scratch, and task artifacts are intact.
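If you export the relevant logs yourself, the same checks can be scripted. The sketch below is hypothetical: Buffaly's actual log format and artifact layout are not documented in this section, so it assumes plain-text log lines containing the phrases quoted above and artifact files stored in a single directory. The function names are illustrative only.

```python
from pathlib import Path


def compaction_evidence(log_lines):
    """Scan log lines for the start and success markers.

    ASSUMPTION: logs are plain text and contain the phrases
    "compaction start" and "success" as quoted in the checklist.
    """
    lines = [line.lower() for line in log_lines]
    started = any("compaction start" in line for line in lines)
    succeeded = any("compaction" in line and "success" in line for line in lines)
    return {
        "started": started,
        "succeeded": succeeded,
        "verified": started and succeeded,
    }


def missing_artifacts(root, names=("Plan.md", "Scratch.md")):
    """Return the artifact file names that are not present under root."""
    base = Path(root)
    return [name for name in names if not (base / name).is_file()]
```

A run counts as verified only when both markers appear; a start marker without a success marker is worth investigating rather than assuming the cycle completed.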