Debugging Behavior Trees

Every behavior-tree run in AgentLoop is recorded tick by tick, which turns debugging into something closer to a flight recorder than a print statement. You can rewind a run, inspect any node’s state and the data that drove it at any moment, set breakpoints, replay a finished run from a saved fixture, and fork from any point to test a fix — all from the desktop app.

The debugger lives in the Agents view: drill into an agent, open its Behavior Tree tab, and pick a running or recorded run. The same surface opens for sandbox runs you launch yourself.

The debugger surface

Opening a run gives you one integrated view:

Element	What it shows
Tree graph	The behavior tree rendered top to bottom, each node colored by state (`READY` / `RUNNING` / `SUCCEEDED` / `FAILED`). The running node pulses; the active path’s edges light up.
Tick scrubber	A transport bar at the bottom for stepping through recorded history (see below).
Node inspector	Click any node for its type and state — and, for LLM nodes, the exact prompt used and the decision history (decision, reasoning, confidence) for each call.
Blackboard inspector	Toggle with `B` to see the shared state at the current tick; keys that changed flash.
Execution log	Toggle with `L` for a per-tick, ordered feed of node state transitions — the clearest way to confirm exactly which nodes ticked and in what order.
Search	`Cmd/Ctrl+F` filters the tree by node name and state, with next/previous navigation.

Time-travel debugging

Because every tick is a full snapshot — node states, the active path, the blackboard, the log, and any LLM decisions — you can move freely through a run’s timeline.

Scrubbing through ticks

The scrubber gives you transport controls — go to start, previous, next, go to end, and a draggable slider — plus a counter (“tick 3 of 60”) and the name of the node entered at that tick. As you scrub, the entire view snaps to that moment in time: the graph re-colors to each node’s state at that tick, the blackboard shows its values as they were, and the log trims to only what had happened so far.

For a live run, the scrubber auto-follows the newest tick and shows a green Live badge. Grab the handle or step back and it pins to that historical tick; a jump-to-latest button returns you to the live tail. For a completed run, it opens on the final tick, so you see the end state first and can then rewind.
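Because each tick is a complete snapshot, moving the scrubber can be modeled as a lookup rather than a re-execution. The Python sketch below assumes each recorded tick holds the entered node, every node's state, a copy of the blackboard, and the log lines emitted during that tick. `Tick`, `view_at`, and `live_tail` are hypothetical names for illustration, not AgentLoop's recording format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Tick:
    """One recorded tick (hypothetical shape): a full snapshot, not a delta."""
    entered_node: str   # node entered at this tick
    node_states: dict   # node name -> READY / RUNNING / SUCCEEDED / FAILED
    blackboard: dict    # blackboard values as they were at this tick
    log: list = field(default_factory=list)  # transitions emitted during this tick


def view_at(ticks, n):
    """Rebuild the whole debugger view for tick n (1-based) from recorded ticks."""
    if not 1 <= n <= len(ticks):
        raise IndexError(f"tick {n} is outside 1..{len(ticks)}")
    current = ticks[n - 1]
    return {
        "counter": f"tick {n} of {len(ticks)}",
        "entered": current.entered_node,
        # Graph colors and blackboard come straight from this tick's snapshot.
        "states": current.node_states,
        "blackboard": current.blackboard,
        # The log is trimmed to everything that had happened up to tick n.
        "log": [entry for t in ticks[:n] for entry in t.log],
    }


def live_tail(ticks):
    """Following the live tail is just viewing the newest recorded tick."""
    return view_at(ticks, len(ticks))
```

In this model, pinning to a historical tick and returning to the live tail differ only in which tick number is passed to `view_at`.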

Pausing and stepping

For runs executing in-process (a sandbox run or a daemon-hosted continuous agent), a transport bar lets you halt and advance execution itself:

Control	Shortcut	Effect
Pause	—	Halt the tree at the next node boundary
Step	`F10`	Advance exactly one node
Resume	`F5`	Continue running
Stop	`Shift+F5`	End the run

A “Paused at: <node>” label shows where execution is held.
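What “halt at the next node boundary” means can be sketched with a small control loop. This is a hypothetical Python model, not AgentLoop's executor: it flattens the tree into an ordered list of `(name, action)` pairs, and `StepController` with its `pause`, `step`, `resume`, and `stop` methods is invented for illustration. Pause only sets a flag, so a node that is already running finishes before execution halts.

```python
class StepController:
    """Hypothetical in-process control loop; Pause is honored only between nodes.

    Simplification: the tree is flattened to (name, action) pairs in tick order.
    """

    def __init__(self, nodes):
        self.pending = list(nodes)
        self.executed = []
        self.paused = False

    @property
    def paused_at(self):
        """Node the 'Paused at:' label would show, or None when not paused."""
        return self.pending[0][0] if self.paused and self.pending else None

    def pause(self):
        # Only sets a flag: a node that is already running finishes first.
        self.paused = True

    def _run_one(self):
        name, action = self.pending.pop(0)
        action(self)  # the node's work; Pause may be requested while it runs
        self.executed.append(name)

    def resume(self):
        """F5: clear the pause and run until paused again or out of nodes."""
        self.paused = False
        while self.pending and not self.paused:
            self._run_one()

    def step(self):
        """F10: while paused, advance exactly one node."""
        if self.paused and self.pending:
            self._run_one()

    def stop(self):
        """Shift+F5: discard the remaining nodes and end the run."""
        self.pending.clear()
        self.paused = False
```

For example, if Pause is requested while the second of four nodes is running, that node completes, `paused_at` names the third node, and each `step()` then advances exactly one more.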

Live pause and step work only for in-process runs. A run executing in a separate worker process can be inspected and scrubbed, but not paused live — the UI tells you when that’s the case.

Steering from chat

You don’t need the desktop UI to time-travel. Chat reads the same recorded ticks, so you can ask “what’s the engineer doing on task 12?”, “show me the blackboard at the failure tick”, or “pause it and step to the next node” and get answers and actions backed by the identical data the visual scrubber uses.

Fixtures

A fixture is a saved run you can replay deterministically — the foundation for reproducing bugs and regression-testing tree changes.

Recorded-run fixtures. Any run can be captured as a self-contained bundle of all its ticks plus metadata (agent type, tree, task, final state, timestamps) and written to a portable file. Save it, share it, and reopen it later to scrub through the run without re-executing the agent. Loading one puts the debugger into replay mode — the scrubber, blackboard, log, and node inspector all drive off the fixture.
Scripted fixtures. A fixture can also define an ordered sequence of canned agent responses (including simulated errors and timeouts), so a tree runs against fixed inputs with no model or network variability. This is how you make a run fully repeatable for testing.
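A scripted fixture can be pictured as a response queue consumed in call order. In the hypothetical Python sketch below (`Canned`, `ScriptedAgent`, and `ScriptExhausted` are illustrative names, not AgentLoop's fixture format), each call takes the next canned entry and either returns it or raises a simulated error or timeout, so nothing depends on a live model or the network.

```python
from dataclasses import dataclass


class ScriptExhausted(Exception):
    """The run asked for more responses than the fixture scripts."""


@dataclass(frozen=True)
class Canned:
    kind: str          # "ok", "error", or "timeout" (hypothetical labels)
    payload: str = ""  # response text, or the message for a simulated failure


class ScriptedAgent:
    """Hypothetical model stand-in that replays canned responses strictly in order."""

    def __init__(self, script):
        self._script = list(script)
        self._cursor = 0

    def respond(self, prompt):
        # The prompt is deliberately ignored: call order alone picks the response,
        # so the same tree produces the same run every time.
        if self._cursor >= len(self._script):
            raise ScriptExhausted(f"no scripted response for call {self._cursor + 1}")
        step = self._script[self._cursor]
        self._cursor += 1
        if step.kind == "timeout":
            raise TimeoutError(step.payload or "simulated timeout")
        if step.kind == "error":
            raise RuntimeError(step.payload or "simulated error")
        if step.kind != "ok":
            raise ValueError(f"unknown scripted kind: {step.kind}")
        return step.payload
```

Failing loudly when the script runs out, rather than returning an empty answer, is one way to surface a tree that makes more model calls than the fixture expected.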

When replaying a fixture, the transport bar gains two kinds of breakpoint: a node-name breakpoint halts whenever the named node executes, and a blackboard-key breakpoint halts whenever a watched key changes.
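Both breakpoint kinds reduce to a scan over recorded ticks. This Python sketch assumes ticks are stored as `(entered_node, blackboard)` pairs; `next_break` is a hypothetical helper, not part of AgentLoop. It also treats a watched key being added or removed as a change.

```python
_MISSING = object()  # lets "key added" and "key removed" count as changes


def next_break(ticks, after, node_breaks=(), key_watches=()):
    """Return the 1-based number of the first tick after `after` that hits a
    breakpoint, or None if the replay runs to the end.

    ticks: list of (entered_node, blackboard) pairs, tick 1 first (hypothetical shape).
    node_breaks: node names that halt when they execute.
    key_watches: blackboard keys that halt when their value changes.
    """
    for n in range(after + 1, len(ticks) + 1):
        node, board = ticks[n - 1]
        if node in node_breaks:
            return n
        if n > 1:
            prev = ticks[n - 2][1]
            # Compare each watched key against the previous tick's value.
            if any(prev.get(k, _MISSING) != board.get(k, _MISSING) for k in key_watches):
                return n
    return None
```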

Fixtures plug straight into the run launcher. When you start a new run, the From fixture preset seeds the starting blackboard from a saved run — see Authoring Behavior Trees.

Forking

Forking lets you ask “what if?” from any point in a run without disturbing the original.

Pick a tick and fork

Scrub to the tick you want to branch from and click Fork. The Fork pill is disabled, with a tooltip explaining why, in two cases: while the scrubber is following the live tail (pause on a specific tick first), and on ticks recorded without a fork snapshot.

Edit what the fork does differently

The fork dialog opens on the chosen tick with tabs for your changes:

Blackboard — a per-key editor with a parent → fork diff: override values, add keys, or remove them (for example, set an error count to simulate a different failure mode).
LLM mock — provide canned answers for specific LLM nodes, for fully reproducible runs.
Tool stub — provide canned results for specific tool calls.
Tree config — optionally swap the tree topology for the fork (advanced).
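The Blackboard tab's parent → fork diff can be modeled as a copy-then-edit step plus a per-key comparison. The Python below is a hypothetical sketch: `apply_fork_edits`, `blackboard_diff`, the `REMOVE` sentinel, and the example keys are illustrative, not AgentLoop's API. Edits go into a copy, matching the guarantee that the original run is never modified.

```python
REMOVE = object()  # hypothetical sentinel: "delete this key in the fork"


def apply_fork_edits(parent, edits):
    """Return the fork's starting blackboard; `parent` is never mutated."""
    fork = dict(parent)  # shallow copy: top-level edits never touch the parent
    for key, value in edits.items():
        if value is REMOVE:
            fork.pop(key, None)
        else:
            fork[key] = value  # override an existing key or add a new one
    return fork


def blackboard_diff(parent, fork):
    """Per-key parent -> fork diff: key -> (change, parent value, fork value)."""
    changes = {}
    for key in parent.keys() | fork.keys():
        if key not in fork:
            changes[key] = ("removed", parent[key], None)
        elif key not in parent:
            changes[key] = ("added", None, fork[key])
        elif parent[key] != fork[key]:
            changes[key] = ("changed", parent[key], fork[key])
    return changes
```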

Run the fork and compare

Submit the dialog, then run the fork. Running it spawns a worker that executes the fork, and the fork's ticks fill in live on the same canvas. A lineage sidebar appears above the tree, showing the ancestry as a rooted tree; click any row to switch the canvas to that run. You can fork a fork and build a whole tree of experiments. The original run is never modified.
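The lineage sidebar's rooted tree follows naturally if each run records the run it was forked from. In this hypothetical Python sketch, `parents` maps a run id to its parent id (`None` for the original run); `ancestry` and `lineage_rows` are illustrative helpers, not AgentLoop's data model.

```python
def ancestry(parents, run_id):
    """Chain from the root run down to `run_id`, following parent links."""
    chain = [run_id]
    while parents.get(chain[-1]) is not None:
        chain.append(parents[chain[-1]])
    return chain[::-1]


def lineage_rows(parents, root, depth=0):
    """Indented rows for the lineage tree rooted at `root` (children sorted by id)."""
    rows = ["  " * depth + root]
    for child in sorted(r for r, p in parents.items() if p == root):
        rows.extend(lineage_rows(parents, child, depth + 1))
    return rows
```

Because a fork only adds a new entry pointing at its parent, forking a fork extends the tree without changing any existing run.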

For example, suppose a run hits a validation failure at tick 5. Fork from tick 4, clear the applied-changes key, and run the fork. If validation now passes, the failure comes from how the changes were applied, not from the validation step itself.

Restart from a node

For a quicker “re-run from here,” the node inspector offers Restart from this node — a fresh run that begins at the selected node (everything before it auto-succeeds), optionally merging blackboard edits. Use it to skip past expensive early steps and focus on the part that failed; use full forking when you want to branch and keep both paths.
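The semantics of Restart from this node can be sketched as a planning step. This hypothetical Python model simplifies the tree to a flat execution order; `restart_plan` is an illustrative name, not an AgentLoop function. Nodes before the chosen one start as `SUCCEEDED` without running, and in this model any blackboard edits override the recorded values.

```python
def restart_plan(order, start_node, blackboard, edits=None):
    """Plan a restart at `start_node` (hypothetical model, flat execution order).

    Returns (initial node states, nodes that will actually run, starting blackboard).
    """
    if start_node not in order:
        raise ValueError(f"unknown node: {start_node}")
    cut = order.index(start_node)
    # Everything before the restart point auto-succeeds without executing.
    states = {name: "SUCCEEDED" for name in order[:cut]}
    # The restart point and everything after it run fresh.
    states.update({name: "READY" for name in order[cut:]})
    # Optional edits are merged over the recorded blackboard (edits win).
    board = {**blackboard, **(edits or {})}
    return states, order[cut:], board
```

The contrast with forking is visible here: restart produces one fresh run with a trimmed schedule, while a fork keeps the parent and child side by side in the lineage.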

Maturity. Tick recording, scrubbing, recorded-run replay, the node inspector, live pause/step/resume/stop, fork-from-tick with the fork dialog and lineage sidebar, and restart-from-node are all available in the desktop app today. Two limits worth knowing: live pause/step is in-process only, and forking requires a tick that was recorded with a fork snapshot (older recordings prompt you to re-record on that branch).
