Five Modes of Execution
The orchestrator selects the optimal execution strategy for every task — from simple routing to complex DAG-based parallel workflows with dependency management.
The Right Mode for Every Task
The task planner analyzes your request and automatically selects the execution mode that maximizes speed without sacrificing quality.
Router
Routes the entire request to exactly one sub-agent. Used when the task maps cleanly to a single domain expert — no decomposition needed.
Sequential
Each task depends on the output of the previous step. The pipeline flows linearly — research feeds analysis, analysis feeds reporting.
Parallel
All tasks are independent and run simultaneously. The orchestrator spawns up to 5 concurrent sub-agents and waits for all to complete.
DAG (Directed Acyclic Graph)
A mix of parallel and sequential — tasks form a dependency graph. Independent branches run concurrently while dependent tasks wait for upstream completion.
depends_on: [t1, t2]
depends_on: [t3]
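The dependency-driven scheduling described here can be sketched with asyncio. This is a minimal illustration under stated assumptions, not the orchestrator's actual scheduler: `run_dag`, `run_task`, and the task ids are hypothetical, and only the `depends_on` field comes from the text above.

```python
import asyncio

# Hypothetical plan: t1 and t2 are independent, t3 needs both, t4 needs t3.
PLAN = {
    "t1": {"depends_on": []},
    "t2": {"depends_on": []},
    "t3": {"depends_on": ["t1", "t2"]},
    "t4": {"depends_on": ["t3"]},
}

async def run_task(task_id: str, inputs: dict) -> str:
    # Stand-in for a real sub-agent call (assumption for illustration).
    await asyncio.sleep(0.01)
    return f"{task_id}({','.join(sorted(inputs))})"

async def run_dag(plan: dict) -> dict:
    results: dict = {}
    done = {tid: asyncio.Event() for tid in plan}

    async def worker(tid: str) -> None:
        # Wait for every upstream task before starting this one.
        for dep in plan[tid]["depends_on"]:
            await done[dep].wait()
        inputs = {dep: results[dep] for dep in plan[tid]["depends_on"]}
        results[tid] = await run_task(tid, inputs)
        done[tid].set()

    # All workers start at once; independent branches overlap naturally.
    await asyncio.gather(*(worker(tid) for tid in plan))
    return results
```

Calling `asyncio.run(run_dag(PLAN))` runs t1 and t2 concurrently, then t3, then t4. The sketch does no cycle detection or failure handling; a cyclic plan would wait forever, so a real scheduler would validate the graph first.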
Iterative
When the full scope is unknown upfront, the orchestrator executes one step at a time and decides the next step based on results. Exploration-driven workflows.
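A rough sketch of the step-at-a-time loop, assuming a planner callback that sees all prior results and returns either the next step or `None` when it considers the goal met. The names `run_iterative`, `decide_next`, `execute`, and the `max_steps` cap are illustrative assumptions, not part of the product's API.

```python
from typing import Callable, Optional

def run_iterative(
    goal: str,
    decide_next: Callable[[str, list], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,  # assumed safety cap so exploration always ends
) -> list:
    """Run one step at a time; the planner sees every earlier result."""
    history: list = []
    for _ in range(max_steps):
        step = decide_next(goal, history)
        if step is None:  # planner decides the goal is met
            break
        history.append((step, execute(step)))
    return history
```

The cap matters because an exploration-driven loop has no fixed length; without it, a planner that never returns `None` would run indefinitely.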
Production-Grade Execution
Fine-grained controls for concurrency, rate limiting, timeouts, and budget management — configurable per deployment.
Concurrency Limiter
An asyncio.Semaphore caps how many sub-agents can run simultaneously. This prevents resource exhaustion and reduces the chance of hitting API rate limits.
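The semaphore pattern might look like the sketch below. The helper name `run_limited` and the peak counter are illustrative assumptions; the default limit of 5 mirrors the concurrency figure quoted for Parallel mode.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed default, matching the "up to 5" figure for Parallel mode

async def run_limited(jobs, limit: int = MAX_CONCURRENT):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(job):
        nonlocal active, peak
        async with semaphore:  # waits while `limit` jobs are already running
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak
```

The returned `peak` is only there to make the limit observable; with 12 jobs and `limit=5`, it never exceeds 5.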
Rate Limiter
A token bucket rate limiter throttles LLM API calls to stay within the provider's requests-per-minute (RPM) limits. Configurable per deployment.
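For readers unfamiliar with the technique, a token bucket can be sketched in a few lines. This is a generic illustration, not the product's implementation; the class name, the `capacity` parameter, and the injectable `clock` are assumptions made so the example is easy to test.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rpm / 60` tokens per second up to `capacity`."""

    def __init__(self, rpm: int, capacity: int, clock=time.monotonic):
        self.rate = rpm / 60.0
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Add tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check `try_acquire()` before each LLM request and wait or queue when it returns `False`. The capacity sets how large a burst is allowed; the rate sets the sustained throughput.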
Token Budget
Set a token budget for the entire plan. When 70% of the budget is consumed (30% remaining), the model router auto-downgrades to cheaper models.
"budget_downgrade_threshold": 0.3
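One reading of the 0.3 setting is that it is the fraction of the budget still remaining, which lines up with the 70% consumed figure. Under that assumption, the downgrade decision could be sketched as follows; `pick_model` and the model names are hypothetical placeholders.

```python
def pick_model(
    used_tokens: int,
    budget: int,
    threshold: float = 0.3,        # assumed meaning: fraction of budget remaining
    primary: str = "large-model",  # hypothetical model names
    fallback: str = "small-model",
) -> str:
    """Return the cheaper model once the remaining budget hits the threshold."""
    remaining = max(budget - used_tokens, 0) / budget
    return fallback if remaining <= threshold else primary
```

With a 100,000-token budget, `pick_model(69_000, 100_000)` still returns the primary model, while `pick_model(70_000, 100_000)` switches to the fallback.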
Per-Task Timeouts
Each sub-agent has a configurable timeout via asyncio.wait_for(). Prevents runaway tasks from blocking the pipeline.
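A minimal sketch of the timeout wrapper, using the standard `asyncio.wait_for()` call. The `run_with_timeout` helper and the shape of the returned dict are assumptions for illustration.

```python
import asyncio

async def run_with_timeout(coro, timeout_s: float) -> dict:
    """Await `coro`, giving up after `timeout_s` seconds."""
    try:
        result = await asyncio.wait_for(coro, timeout=timeout_s)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return {"status": "timeout", "result": None}
```

Returning a status instead of re-raising lets the rest of the pipeline continue and decide whether to retry or skip the timed-out task.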
Retry Policies
Failed tasks can be retried with evaluator feedback injected into the retry prompt. Configurable max_retries per sub-agent.
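The retry-with-feedback idea can be sketched as below. Only `max_retries` comes from the text; the function name, the evaluator's `(ok, feedback)` return shape, and the wording appended to the prompt are assumptions.

```python
from typing import Callable, Optional, Tuple

def run_with_retries(
    prompt: str,
    agent: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_retries: int = 2,
) -> Tuple[Optional[str], int]:
    """Return (accepted output or None, number of retries used)."""
    current = prompt
    for attempt in range(max_retries + 1):
        output = agent(current)
        ok, feedback = evaluate(output)
        if ok:
            return output, attempt
        # Inject the evaluator's feedback into the next attempt's prompt.
        current = f"{prompt}\n\nPrevious attempt was rejected: {feedback}"
    return None, max_retries
```

Each retry starts from the original prompt plus the latest feedback, so prompts do not grow without bound across attempts.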
MCP Sandboxing
Each sub-agent accesses only the MCP servers listed in its YAML config. No unauthorized tool access across agents.
- "python-executor"
- "filesystem"
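An allowlist check of this kind could be enforced as sketched below. The config is shown as an already-parsed dict; the `mcp_servers` key, the agent name, and the `resolve_server` helper are assumptions, since the actual YAML schema is not shown here.

```python
# Hypothetical parsed agent config; only the two server names come from the text.
AGENT_CONFIG = {
    "name": "analyst",
    "mcp_servers": ["python-executor", "filesystem"],
}

class ToolAccessError(PermissionError):
    """Raised when an agent requests a server outside its allowlist."""

def resolve_server(agent_config: dict, server: str) -> str:
    allowed = set(agent_config.get("mcp_servers", []))
    if server not in allowed:
        raise ToolAccessError(f"{agent_config['name']} may not use {server!r}")
    return server
```

An agent with no `mcp_servers` entry gets an empty allowlist, so access is denied by default.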
45% Token Savings
Result truncation reduces token consumption without losing critical information. Successful tool results are trimmed to 500 chars, while error results keep far more context (up to 8000 chars).
- ✓ Success results: truncated to 500 chars
- ✓ Error results: context kept up to 8000 chars
- ✓ Tool loop steering prevents infinite loops
- ✓ Configurable via TOOL_RESULT_TRUNCATION_ENABLED
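The truncation rule can be sketched as follows. The two limits and the `TOOL_RESULT_TRUNCATION_ENABLED` name come from the list above; the helper names, the truncation marker, and the way the flag is parsed from the environment are assumptions.

```python
import os

SUCCESS_LIMIT = 500   # chars kept for successful tool results
ERROR_LIMIT = 8000    # chars kept for error results

def truncation_enabled() -> bool:
    # Assumed parsing: anything other than "false" (case-insensitive) enables it.
    return os.environ.get("TOOL_RESULT_TRUNCATION_ENABLED", "true").lower() != "false"

def truncate_tool_result(text: str, is_error: bool, enabled: bool = True) -> str:
    """Trim a tool result to the limit for its outcome type."""
    if not enabled:
        return text
    limit = ERROR_LIMIT if is_error else SUCCESS_LIMIT
    if len(text) <= limit:
        return text
    return text[:limit] + " ...[truncated]"
```

A caller would pass `enabled=truncation_enabled()` so the behavior follows the deployment setting. Giving errors the larger limit keeps stack traces and diagnostics available for the retry step.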
[Chart: Token Usage per Multi-Step Task. Average across multi-step tasks with 5+ tool calls.]
Execute with Confidence.
Optimize with Data.
Five execution modes, production-grade safeguards, and automatic token optimization — all configurable from a single JSON file.