NvAgent — Execution Methods

Flexible Task Execution

Five Modes of Execution

The orchestrator selects the optimal execution strategy for every task — from simple routing to complex DAG-based parallel workflows with dependency management.

Execution Strategies

The Right Mode for Every Task

The task planner analyzes your request and automatically selects the execution mode that maximizes speed without sacrificing quality.

Simplest

Router

Routes the entire request to exactly one sub-agent. Used when the task maps cleanly to a single domain expert — no decomposition needed.

Best for: "Review this code for security issues" → routes to code-scanner. Single-domain tasks with clear agent mapping.

User Request

↓

Router — Agent Selection

↓

Code Scanner

↓

Result
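As a rough illustration of single-agent routing, the sketch below maps a request to exactly one sub-agent using a keyword table. The table, the `route` function, and every agent name other than code-scanner are hypothetical; the actual orchestrator presumably relies on its task planner rather than keyword matching.

```python
from typing import Dict

# Hypothetical keyword-to-agent table, for illustration only.
ROUTES: Dict[str, str] = {
    "security": "code-scanner",
    "pricing": "researcher",
    "sales": "data-analyst",
}


def route(request: str, default: str = "researcher") -> str:
    """Return the name of the one sub-agent that should handle the request."""
    lowered = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            return agent
    # No keyword matched: fall back to a general-purpose agent (assumption).
    return default
```

Calling `route("Review this code for security issues")` selects code-scanner, matching the example above.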

Step-by-Step

Sequential

Each task depends on the output of the previous step. The pipeline flows linearly — research feeds analysis, analysis feeds reporting.

Best for: "Research competitor pricing, analyze the data, then write a report" — each step builds on the last.

t1: Researcher

↓ output feeds into

t2: Data Analyst

↓ output feeds into

t3: Writer

↓

Final Report
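The linear hand-off can be sketched as a loop in which each step receives the previous step's output. This is a minimal illustration, not NvAgent's implementation: `run_sequential` and the three stand-in agents are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List

Step = Callable[[str], Awaitable[str]]


async def run_sequential(steps: List[Step], request: str) -> str:
    """Feed each step's output into the next step, in order."""
    data = request
    for step in steps:
        data = await step(data)
    return data


# Stand-in sub-agents (hypothetical); real agents would call an LLM.
async def researcher(text: str) -> str:
    return f"research({text})"


async def data_analyst(text: str) -> str:
    return f"analysis({text})"


async def writer(text: str) -> str:
    return f"report({text})"
```

Running `asyncio.run(run_sequential([researcher, data_analyst, writer], "pricing"))` nests the outputs in the order t1, t2, t3.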

Fastest

Parallel

All tasks are independent and run simultaneously. The orchestrator spawns up to 5 concurrent sub-agents and waits for all to complete.

Best for: "Analyze sales data AND research market trends AND review our codebase" — three independent investigations.

User Request

↓ ↓ ↓

Data Analyst

Researcher

Code Reviewer

↓ ↓ ↓

Merged Results
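A fan-out and merge of this kind is commonly written with `asyncio.gather`. The sketch below assumes each sub-agent is an async callable; `run_parallel` and `make_agent` are hypothetical helpers, and the concurrency cap is left out here for brevity.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Agent = Callable[[str], Awaitable[str]]


async def run_parallel(agents: Dict[str, Agent], request: str) -> Dict[str, str]:
    """Start every sub-agent at once and wait for all of them to finish."""
    names = list(agents)
    outputs = await asyncio.gather(*(agents[name](request) for name in names))
    # gather preserves input order, so names and outputs line up.
    return dict(zip(names, outputs))


def make_agent(label: str) -> Agent:
    """Build a stand-in sub-agent (hypothetical) that just echoes its label."""
    async def agent(request: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{label} finished: {request}"
    return agent
```

The merged result is a dictionary keyed by agent name, which keeps each independent investigation separate until a later step combines them.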

Most Flexible

DAG (Directed Acyclic Graph)

A mix of parallel and sequential — tasks form a dependency graph. Independent branches run concurrently while dependent tasks wait for upstream completion.

Best for: "Research market + analyze data (parallel), then competitive analysis (depends on both), then executive presentation (depends on all)."

t1: Researcher

t2: Data Analyst

↘ ↙

t3: Competitive Analyst
depends_on: [t1, t2]

↓

t4: Report Writer
depends_on: [t3]

↓

Executive Deck
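One way to schedule such a graph is to give every task an `asyncio.Event` and have each task wait on the events of its `depends_on` entries. The sketch below is an assumption about how this could work, not the orchestrator's actual scheduler: `run_dag` and the `run_task` callback are hypothetical, every dependency is assumed to be a task in the plan, and the standard-library `graphlib` module (Python 3.9+) is used only to reject cyclic plans.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Any, Awaitable, Callable, Dict, List

RunTask = Callable[[str, Dict[str, Any]], Awaitable[Any]]


async def run_dag(plan: Dict[str, List[str]], run_task: RunTask) -> Dict[str, Any]:
    """Run tasks concurrently while honoring each task's depends_on list."""
    # Raises graphlib.CycleError if the plan is not acyclic.
    TopologicalSorter(plan).prepare()
    results: Dict[str, Any] = {}
    finished = {task_id: asyncio.Event() for task_id in plan}

    async def worker(task_id: str) -> None:
        # Block until every upstream task has signalled completion.
        for dep in plan[task_id]:
            await finished[dep].wait()
        upstream = {dep: results[dep] for dep in plan[task_id]}
        results[task_id] = await run_task(task_id, upstream)
        finished[task_id].set()

    await asyncio.gather(*(worker(task_id) for task_id in plan))
    return results
```

For the plan above, `{"t1": [], "t2": [], "t3": ["t1", "t2"], "t4": ["t3"]}` lets t1 and t2 start immediately, while t3 and t4 wait for their upstream results.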

Adaptive

Iterative

When the full scope is unknown upfront, the orchestrator executes one step at a time and decides the next step based on results. This mode suits exploration-driven workflows.

Best for: "Investigate why our API latency spiked" — start with data analysis, then drill into specific services based on findings.

Step 1: Investigate

↓ analyze results

Step 2: Drill deeper

↓ analyze results

Step 3: Root cause

↓ done?

✓ Complete
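The execute-then-decide loop can be sketched as follows. All names are hypothetical, and the `max_steps` cap is an added assumption to keep the loop bounded; in practice the `decide_next` callback would likely be an LLM call that inspects the history.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]


def run_iterative(
    first_step: str,
    execute: Callable[[str], str],
    decide_next: Callable[[History], Optional[str]],
    max_steps: int = 5,
) -> History:
    """Run one step at a time; stop when decide_next returns None."""
    history: History = []
    step: Optional[str] = first_step
    while step is not None and len(history) < max_steps:
        result = execute(step)
        history.append((step, result))
        # The next step depends on everything learned so far.
        step = decide_next(history)
    return history
```

Returning the full history keeps each intermediate finding available for the final root-cause summary.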

Controls & Safeguards

Production-Grade Execution

Fine-grained controls for concurrency, rate limiting, timeouts, and budget management — configurable per deployment.

🔒

Concurrency Limiter

asyncio.Semaphore controls how many sub-agents can run simultaneously. This prevents resource exhaustion and helps avoid hitting API rate limits.
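A minimal sketch of the semaphore pattern, assuming each sub-agent run is a zero-argument async callable. `run_limited` is a hypothetical helper; the parameter name mirrors the max_concurrent_subagents setting.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

Job = Callable[[], Awaitable[Any]]


async def run_limited(jobs: List[Job], max_concurrent_subagents: int = 5) -> List[Any]:
    """Run all jobs, but never more than the configured number at once."""
    semaphore = asyncio.Semaphore(max_concurrent_subagents)

    async def guarded(job: Job) -> Any:
        # A job only starts once it holds one of the semaphore's slots.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Jobs beyond the limit are queued on the semaphore rather than rejected, so every job still completes.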

"max_concurrent_subagents": 5

⏱️

Rate Limiter

Token bucket rate limiter throttles LLM API calls to stay within provider RPM limits. Configurable per-deployment.
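For readers unfamiliar with the technique, here is a small token bucket sketch. It is an illustration under assumptions, not NvAgent's limiter: the class name, the `capacity` and `clock` parameters, and the non-blocking `try_acquire` method are all hypothetical.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_limit_rpm` per minute."""

    def __init__(
        self,
        rate_limit_rpm: int,
        capacity: Optional[int] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_limit_rpm / 60.0  # tokens added per second
        self.capacity = float(capacity if capacity is not None else rate_limit_rpm)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a rate of 60 requests per minute the bucket refills one token per second; an async caller could sleep briefly and retry whenever `try_acquire` returns False.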

"rate_limit_rpm": 60

💰

Token Budget

Set a token budget for the entire plan. When 70% of the budget is consumed (that is, when the remaining share falls to the 0.3 threshold), the model router auto-downgrades to cheaper models.
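The downgrade rule can be expressed in a few lines. This sketch assumes budget_downgrade_threshold is the remaining fraction of the budget at which the switch happens, which is one reading that reconciles 0.3 with the 70% figure; `pick_model` and the two model names are hypothetical.

```python
def pick_model(
    tokens_used: int,
    token_budget: int = 100_000,
    budget_downgrade_threshold: float = 0.3,
    primary: str = "large-model",   # hypothetical model name
    fallback: str = "small-model",  # hypothetical cheaper model
) -> str:
    """Switch to the cheaper model once the remaining budget share is small."""
    remaining = max(0, token_budget - tokens_used)
    # Assumption: the threshold is the remaining fraction that triggers a downgrade.
    if remaining <= budget_downgrade_threshold * token_budget:
        return fallback
    return primary
```

With the values shown, a plan that has used 50,000 tokens stays on the primary model, while one that has used 80,000 is routed to the cheaper model.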

"token_budget": 100000
"budget_downgrade_threshold": 0.3

⏳

Per-Task Timeouts

Each sub-agent has a configurable timeout via asyncio.wait_for(). Prevents runaway tasks from blocking the pipeline.

timeout_seconds: 600
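A minimal sketch of a per-task timeout built on `asyncio.wait_for`. The wrapper name and the shape of the returned dictionary are assumptions made for illustration.

```python
import asyncio
from typing import Any, Awaitable, Dict


async def run_with_timeout(task: Awaitable[Any], timeout_seconds: float = 600.0) -> Dict[str, Any]:
    """Await a sub-agent task, giving up after timeout_seconds."""
    try:
        result = await asyncio.wait_for(task, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return {"status": "timeout"}
    return {"status": "ok", "result": result}
```

Because `asyncio.wait_for` cancels the task on timeout, a runaway sub-agent stops consuming resources instead of merely being ignored.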

🔄

Retry Policies

Failed tasks can be retried with evaluator feedback injected into the retry prompt. Configurable max_retries per sub-agent.
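The retry-with-feedback idea might look like the sketch below. The function name, the `(passed, feedback)` evaluator contract, and the prompt wording are all hypothetical; only the max_retries setting comes from the configuration.

```python
from typing import Callable, Tuple


def run_with_retries(
    run_agent: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    prompt: str,
    max_retries: int = 2,
) -> Tuple[str, int]:
    """Return (output, attempts_made); raise if every attempt fails evaluation."""
    attempt_prompt = prompt
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        output = run_agent(attempt_prompt)
        passed, feedback = evaluate(output)
        if passed:
            return output, attempt + 1
        # Inject the evaluator's feedback into the next attempt's prompt.
        attempt_prompt = f"{prompt}\n\nEvaluator feedback on previous attempt: {feedback}"
    raise RuntimeError(f"task failed after {max_retries + 1} attempts")
```

Each retry starts from the original prompt plus the latest feedback, so the prompt does not grow without bound across attempts.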

"max_retries": 2

🛡️

MCP Sandboxing

Each sub-agent accesses only the MCP servers listed in its YAML config. No unauthorized tool access across agents.

mcp_servers:
- "python-executor"
- "filesystem"
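Conceptually, this kind of sandboxing is an allowlist check performed before any tool call. The sketch below assumes the YAML config has already been parsed into a dictionary; `check_tool_access` and `ToolAccessError` are hypothetical names.

```python
from typing import Any, Dict


class ToolAccessError(PermissionError):
    """Raised when a sub-agent requests an MCP server it is not configured for."""


def check_tool_access(agent_config: Dict[str, Any], server: str) -> None:
    """Allow the call only if the server appears in the agent's mcp_servers list."""
    allowed = agent_config.get("mcp_servers", [])
    if server not in allowed:
        raise ToolAccessError(f"MCP server {server!r} is not allowed for this agent")


# Parsed form of the config shown above.
analyst_config = {"mcp_servers": ["python-executor", "filesystem"]}
```

An agent with no mcp_servers entry gets an empty allowlist in this sketch, so access is denied by default.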

Optimization

45% Token Savings

Intelligent result truncation reduces token consumption without losing critical information. Successful tool results are trimmed to 500 chars, while error results keep far more context (up to 8000 chars).

✓ Success results: truncated to 500 chars
✓ Error results: full context (8000 chars)
✓ Tool loop steering prevents infinite loops
✓ Configurable via TOOL_RESULT_TRUNCATION_ENABLED
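The truncation rule above can be sketched as a single function. The function and parameter names are hypothetical, and how TOOL_RESULT_TRUNCATION_ENABLED is read is not specified here, so the flag is passed in as a plain boolean.

```python
def truncate_tool_result(
    text: str,
    is_error: bool,
    enabled: bool = True,        # stands in for TOOL_RESULT_TRUNCATION_ENABLED
    success_limit: int = 500,
    error_limit: int = 8000,
) -> str:
    """Trim successful results hard; keep much more context for errors."""
    if not enabled:
        return text
    limit = error_limit if is_error else success_limit
    return text[:limit]
```

Keeping the larger limit for errors reflects the idea that failures need their full context to be diagnosed, while successful output is usually summarized by its first few hundred characters.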

Token Usage per Multi-Step Task

Without Optimization: ~45,000 tokens
With Truncation: ~24,750 tokens

Average across multi-step tasks with 5+ tool calls

Execute with Confidence.
Optimize with Data.

Five execution modes, production-grade safeguards, and automatic token optimization — all configurable from a single JSON file.

