04 - Multi-Agent Coding: LangGraph, CrewAI, and AutoGen
Picture this: you need to implement a complete authentication system for an enterprise application from scratch - OAuth2, RBAC, audit logging, integration tests, and documentation. A single AI agent, however capable, would quickly hit its limits: an overcrowded context window, conflicting responsibilities, and a high risk of cascading errors. The solution is not a more powerful agent - it is a team of specialized agents that collaborate, cross-check each other's work, and parallelize the workload.
This is the Multi-Agent Coding paradigm: systems where multiple autonomous AI agents cooperate to complete software development tasks that exceed the capacity of any single model. In 2025, three frameworks dominated this space: LangGraph with its stateful graph architecture, CrewAI with its intuitive role-based model, and AutoGen (now AG2) with its conversational approach. Claude Code's native sub-agent system rounds out the competitive landscape.
This article is an advanced deep dive into all four systems: architectures, working code examples, a detailed comparison, and a practical guide to choosing the right tool for your use case. If you are already familiar with vibe coding basics and want to take your agentic workflows to the next level, you are in the right place.
What You Will Learn
- Why a single AI agent falls short for complex development tasks
- LangGraph: graph architecture, StateGraph, Nodes, Edges, and Checkpointing
- CrewAI: role-based agents, Tasks, sequential and parallel Process, Tools
- AutoGen/AG2: conversational agents, GroupChat, code execution sandbox
- Claude Code: sub-agents, Task tool, and parallel execution patterns
- Detailed comparison: when to use which framework
- Production-ready architecture for real engineering teams
- Critical challenges: context pollution, error propagation, cost management
- Best practices for agent specialization and fallback strategies
Why a Single Agent Is Not Enough
Before diving into frameworks, it is worth understanding the fundamental problem that multi-agent coding solves. A single AI agent, however advanced, has structural limitations that become apparent with complex development tasks.
The first constraint is the context window. Even with models supporting 200K tokens, a typical enterprise task requires holding in mind simultaneously: the existing codebase, functional specifications, architectural patterns, tests to write, documentation to update, and security constraints. This quickly exceeds the coherent capacity of a single context.
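The arithmetic is sobering. Using the common rule of thumb of roughly 4 characters per token, even a modest slice of an enterprise codebase overwhelms a 200K-token window. The figures below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope context budget, assuming ~4 chars per token
# (a common rule of thumb; real tokenizers vary by language and code style).
CHARS_PER_TOKEN = 4

def estimate_tokens(chars: int) -> int:
    """Rough token estimate for a blob of text or code."""
    return chars // CHARS_PER_TOKEN

# Illustrative sizes for a mid-sized enterprise task (hypothetical numbers)
sources = {
    "relevant codebase slice": 2_000_000,   # ~2 MB of source files
    "functional spec": 80_000,
    "architecture docs": 120_000,
    "existing tests": 400_000,
    "security guidelines": 40_000,
}

total_tokens = sum(estimate_tokens(c) for c in sources.values())
print(f"~{total_tokens:,} tokens needed vs. 200,000 available")
# The 2 MB codebase slice alone is ~500K tokens - 2.5x the whole window.
```

Even before adding the model's own output, the working set is several times larger than the window, which is exactly the pressure that pushes toward splitting the work across agents with separate contexts.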
The second constraint is specialization. A generalist agent tends to do everything adequately rather than specific things excellently. A security-specialized agent knows exactly which patterns to look for and which standards to apply, while a testing specialist knows the right test patterns for every component type.
The third constraint is cross-verification. A single agent that writes code and then "tests" it is essentially reviewing its own work with the same cognitive bias that produced it in the first place. Two separate agents - one implementing and one reviewing - bring genuinely different perspectives.
The Principle of Role Separation
In human development teams, role separation (developer, code reviewer, QA, security engineer, tech writer) is not bureaucracy: it is a cognitive safeguard. Multi-agent systems apply the same principle to AI-generated code, systematically reducing each individual agent's blind spots.
2025 research confirms this intuition: AI-generated code has significantly higher vulnerability rates when produced by unsupervised single agents (Veracode 2025 reports 2.74x more vulnerabilities compared to human-written code). Multi-agent systems with dedicated review agents substantially close this gap.
LangGraph: Stateful Graph Orchestration
LangGraph, developed by the LangChain team, represents the natural evolution from linear chains toward stateful cyclic graphs. The fundamental insight is that complex agentic workflows are not linear: they require loops, conditional branching, parallelism, and state persistence between steps.
The LangChain team itself communicated clearly in 2025: "Use LangGraph for agents, not LangChain." This reflects an important architectural truth: modern agentic systems are fundamentally state machines, not sequential pipelines.
Core LangGraph Concepts
LangGraph is built on four key concepts that must be mastered before building effective multi-agent systems:
- StateGraph: the main graph defining the workflow structure and the type of shared state between nodes
- State: a TypedDict (or Pydantic model) representing shared information accessible to all graph nodes
- Nodes: Python functions that receive state, perform an operation (often an LLM call) and return state updates
- Edges: connections between nodes, which can be fixed or conditional based on current state
from typing import TypedDict, List
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
# ============================================================
# 1. SHARED STATE DEFINITION
# ============================================================
class CodingState(TypedDict):
"""State shared across all agents in the system."""
task_description: str
requirements: List[str]
generated_code: str
test_code: str
review_comments: List[str]
security_issues: List[str]
final_code: str
iteration_count: int
status: str # "planning", "coding", "testing", "reviewing", "done"
# ============================================================
# 2. MODEL INITIALIZATION
# ============================================================
# Planner: Opus for complex reasoning
planner_model = ChatAnthropic(model="claude-opus-4-6")
# Developer: Sonnet for fast code generation
developer_model = ChatAnthropic(model="claude-sonnet-4-6")
# Reviewer: Sonnet for critical analysis
reviewer_model = ChatAnthropic(model="claude-sonnet-4-6")
# ============================================================
# 3. NODE DEFINITIONS (AGENTS)
# ============================================================
def planner_agent(state: CodingState) -> dict:
"""Planning agent: decomposes task into requirements."""
prompt = f"""You are a senior software architect.
Task: {state['task_description']}
Analyze the task and produce a list of specific technical requirements.
Format: a bullet list of clear, implementable requirements.
"""
response = planner_model.invoke([HumanMessage(content=prompt)])
requirements = [
line.strip().lstrip("- ")
for line in response.content.split("\n")
if line.strip().startswith("-")
]
return {
"requirements": requirements,
"status": "coding"
}
def developer_agent(state: CodingState) -> dict:
"""Developer agent: implements code from requirements."""
requirements_text = "\n".join(f"- {r}" for r in state["requirements"])
prompt = f"""You are a senior Python developer.
Implement the code satisfying all requirements.
Original task: {state['task_description']}
Requirements: {requirements_text}
Reviewer notes: {chr(10).join(state.get('review_comments', []))}
Produce ONLY the Python code, no explanations.
"""
response = developer_model.invoke([HumanMessage(content=prompt)])
return {
"generated_code": response.content,
"status": "testing"
}
def test_writer_agent(state: CodingState) -> dict:
"""Test writer agent: writes unit tests for generated code."""
prompt = f"""Write comprehensive pytest tests for:
{state['generated_code']}
Requirements: 80% coverage, happy paths, edge cases, error scenarios.
Produce ONLY test code.
"""
response = developer_model.invoke([HumanMessage(content=prompt)])
return {
"test_code": response.content,
"status": "reviewing"
}
def code_reviewer_agent(state: CodingState) -> dict:
"""Code reviewer agent: analyzes code and tests for quality."""
prompt = f"""Review this code and tests for quality (SOLID, DRY, KISS),
correctness, performance, and test completeness.
CODE: {state['generated_code']}
TESTS: {state['test_code']}
Reply "NEEDS_REVISION: [issues]" or "APPROVED: [minor suggestions]".
"""
response = reviewer_model.invoke([HumanMessage(content=prompt)])
needs_revision = response.content.startswith("NEEDS_REVISION:")
return {
"review_comments": [response.content],
"iteration_count": state.get("iteration_count", 0) + 1,
"status": "coding" if needs_revision else "security_check"
}
def security_agent(state: CodingState) -> dict:
"""Security agent: checks code for vulnerabilities."""
prompt = f"""Analyze for security vulnerabilities (injection, hardcoded
secrets, insecure random, race conditions, input validation gaps):
{state['generated_code']}
List ONLY issues found. If none, write "SECURITY_OK".
"""
response = reviewer_model.invoke([HumanMessage(content=prompt)])
issues = [] if response.content.strip() == "SECURITY_OK" else [response.content]
return {
"security_issues": issues,
"final_code": state['generated_code'] if not issues else "",
"status": "done" if not issues else "coding"
}
# ============================================================
# 4. CONDITIONAL ROUTING
# ============================================================
def route_after_review(state: CodingState) -> str:
if state["status"] == "coding" and state.get("iteration_count", 0) < 3:
return "developer"
return "security"
def route_after_security(state: CodingState) -> str:
if state["security_issues"] and state.get("iteration_count", 0) < 3:
return "developer"
return END
# ============================================================
# 5. GRAPH CONSTRUCTION AND EXECUTION
# ============================================================
def build_coding_graph() -> StateGraph:
graph = StateGraph(CodingState)
graph.add_node("planner", planner_agent)
graph.add_node("developer", developer_agent)
graph.add_node("test_writer", test_writer_agent)
graph.add_node("reviewer", code_reviewer_agent)
graph.add_node("security", security_agent)
graph.set_entry_point("planner")
graph.add_edge("planner", "developer")
graph.add_edge("developer", "test_writer")
graph.add_edge("test_writer", "reviewer")
graph.add_conditional_edges("reviewer", route_after_review,
{"developer": "developer", "security": "security"})
graph.add_conditional_edges("security", route_after_security,
{"developer": "developer", END: END})
return graph
checkpointer = MemorySaver()
app = build_coding_graph().compile(checkpointer=checkpointer)
config = {"configurable": {"thread_id": "project-auth-001"}}
initial_state = {
"task_description": "Implement JWT auth with login, access/refresh tokens, middleware",
"requirements": [], "generated_code": "", "test_code": "",
"review_comments": [], "security_issues": [], "final_code": "",
"iteration_count": 0, "status": "planning"
}
result = app.invoke(initial_state, config=config)
print(f"Done after {result['iteration_count']} iterations")
This example shows how LangGraph manages a complex development workflow with automatic revision loops. Checkpointing is a critical production feature: it saves graph state so execution can resume after errors without starting from scratch.
LangGraph: Key Strengths
- Granular control: every aspect of the flow is programmable
- State persistence: built-in checkpointing for long workflows
- Human-in-the-loop: native pause support for human approval
- Parallel execution: nodes can run in parallel
- Streaming: real-time output from each node
- Production tested: used in production by hundreds of companies in 2025
CrewAI: Role-Based Agents for Virtual Teams
CrewAI takes a fundamentally different approach from LangGraph: instead of thinking in terms of graphs and states, it asks you to think in terms of teams and roles. Each agent is a team member with a defined specialization, a clear goal, and a set of tools at their disposal. This mental model is more intuitive for those with experience managing development teams.
In version 1.1.0 released in 2025, CrewAI introduced the separation between Crews (high-level orchestration) and Flows (granular workflow control), offering LangGraph's flexibility with the simplicity of the role-based model.
CrewAI Architecture
The four fundamental elements of CrewAI are:
- Agent: an AI entity with role, goal, and backstory that define its "personality" and area of expertise
- Task: a specific activity with description, expected_output, and assigned to a particular agent
- Tool: additional capabilities for agents (file I/O, web search, code execution, database access)
- Crew: the team that orchestrates agents and tasks with a process (sequential or parallel)
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool, FileReadTool, FileWriterTool
from langchain_anthropic import ChatAnthropic
opus_llm = ChatAnthropic(model="claude-opus-4-6", temperature=0.1)
sonnet_llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0.2)
code_executor = CodeInterpreterTool()
file_reader = FileReadTool()
file_writer = FileWriterTool()
# ============================================================
# AGENT DEFINITIONS - Each agent has a clear specialization
# ============================================================
tech_lead = Agent(
role="Tech Lead and Software Architect",
goal="Analyze requirements, define optimal architecture, break down work into tasks.",
backstory="""15 years of software architecture experience. Led microservices
migrations at 3 unicorn startups. Known for balancing pragmatism and quality.""",
llm=opus_llm,
verbose=True,
allow_delegation=True
)
senior_developer = Agent(
role="Senior Python Developer",
goal="Implement high-quality, clean, testable Python code per SOLID principles.",
backstory="""8 years Python experience. Open source contributor.
Writes code that other developers love to read and maintain.""",
llm=sonnet_llm,
tools=[code_executor, file_writer],
allow_code_execution=True,
verbose=True
)
qa_engineer = Agent(
role="QA Engineer and Test Specialist",
goal="Verify code correctness through exhaustive testing, ensure 90%+ coverage.",
backstory="""Math background, obsessed with software correctness.
Found critical bugs in systems in production for years.""",
llm=sonnet_llm,
tools=[code_executor],
allow_code_execution=True,
verbose=True
)
code_reviewer = Agent(
role="Senior Code Reviewer",
goal="Critically analyze code for quality, maintainability, and security.",
backstory="""Reviewed 10,000+ pull requests. Unerring eye for code smells
and design problems that manifest only in production.""",
llm=sonnet_llm,
tools=[file_reader],
verbose=True
)
# ============================================================
# TASK DEFINITIONS - Ordered pipeline with context dependencies
# ============================================================
architecture_task = Task(
description="""Analyze: {task_description}
Produce: component diagram (ASCII), tech stack, directory structure,
main interfaces, scalability considerations, prioritized task list.""",
expected_output="Detailed markdown architecture document with ASCII diagrams.",
agent=tech_lead
)
implementation_task = Task(
description="""Implement Python code per the architect's document.
Use type hints, Google-style docstrings, custom exceptions, separate modules.""",
expected_output="Complete working Python code with type hints and docstrings.",
agent=senior_developer,
context=[architecture_task]
)
testing_task = Task(
description="""Write pytest tests: 90% coverage, unit + integration tests,
happy paths, edge cases, error scenarios, mocking for external deps.
Run tests to verify they all pass.""",
expected_output="Complete passing pytest files with 90%+ coverage report.",
agent=qa_engineer,
context=[implementation_task]
)
review_task = Task(
description="""Code review: SOLID compliance, code smells, performance,
security (OWASP), documentation quality.
Report: CRITICAL, WARNING, SUGGESTION with file/line and fix for each.""",
expected_output="Structured review report with CRITICAL/WARNING/SUGGESTION categories.",
agent=code_reviewer,
context=[implementation_task, testing_task]
)
# ============================================================
# CREW ASSEMBLY AND EXECUTION
# ============================================================
coding_crew = Crew(
agents=[tech_lead, senior_developer, qa_engineer, code_reviewer],
tasks=[architecture_task, implementation_task, testing_task, review_task],
process=Process.sequential,
verbose=True,
memory=True,
embedder={
    # Anthropic offers no embedding API; memory needs a dedicated embedder
    "provider": "openai",
    "config": {"model": "text-embedding-3-small"}
}
)
result = coding_crew.kickoff(inputs={
"task_description": """
Rate limiting system for REST API:
- Token bucket algorithm, Redis backend for multi-instance distribution
- Per-endpoint and per-user configuration
- Standard X-RateLimit-* headers
- Prometheus metrics dashboard
"""
})
print(result.raw)
A distinctive element of CrewAI 2025 is the allow_code_execution=True
option on agents: it enables code execution in a secure sandbox with automatic error
handling and intelligent retry. If code raises an exception, the agent receives the
error message and attempts to fix it autonomously (up to max_retry_limit,
default 2).
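The retry behavior can be approximated with a framework-free sketch. CrewAI's real sandbox adds process isolation and richer error reporting; `fix_code` below stands in for the LLM call and is purely illustrative:

```python
# Sketch of execute-with-retry: run generated code, and on failure feed the
# exception message back to the "agent" so it can produce a corrected version.
def run_with_retry(code: str, fix_code, max_retry_limit: int = 2) -> str:
    attempt = code
    for _ in range(max_retry_limit + 1):
        try:
            exec(compile(attempt, "<agent>", "exec"), {})
            return attempt  # executed cleanly
        except Exception as err:
            # The agent sees the error text and rewrites its own code
            attempt = fix_code(attempt, str(err))
    raise RuntimeError("code still failing after retries")

# Demo: the first version raises NameError; the "fix" defines the missing name
broken = "print(result)"
fixed = run_with_retry(broken, lambda code, err: "result = 42\n" + code)
print(fixed.splitlines()[0])  # the corrected code now defines `result`
```

The key design point is that the error message itself becomes the next prompt's context, which is what makes the retry "intelligent" rather than a blind re-run.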
CrewAI: Key Strengths
- Low learning curve: the team/roles mental model is intuitive
- Built-in code execution: secure sandbox with automatic retry
- Shared memory: agents remember context from previous conversations
- MCP support: bidirectional integration with MCP servers (2025)
- Enterprise features: observability, audit logs, SaaS control plane
- Flows: granular control for complex workflows when needed
AutoGen/AG2: Conversational Agents for Emergent Tasks
AutoGen (renamed AG2 by the community that continued development after diverging from Microsoft in 2024) adopts a completely different paradigm: conversational multi-agent. Instead of defining fixed workflows or rigid roles, agents communicate through natural language messages, with complex behaviors emerging from simple interactions.
This approach is particularly effective for tasks where the solution path cannot be defined in advance: complex debugging problems, research and implementation of innovative solutions, or tasks requiring negotiation between competing constraints.
AG2 Architecture: AssistantAgent and UserProxyAgent
The fundamental building block of AG2 is the AssistantAgent (AI agent) and UserProxyAgent (execution proxy) pair. The UserProxyAgent can execute code in a local sandbox, providing real feedback to the AssistantAgent, which can then iterate on the solution based on actual execution results.
import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from autogen.coding import LocalCommandLineCodeExecutor
import os
config_list = [{
"model": "claude-sonnet-4-6",
"api_key": os.environ["ANTHROPIC_API_KEY"],
"api_type": "anthropic",
}]
llm_config = {"config_list": config_list, "temperature": 0.1, "timeout": 120}
opus_config = {"config_list": [{
"model": "claude-opus-4-6",
"api_key": os.environ["ANTHROPIC_API_KEY"],
"api_type": "anthropic"
}], "temperature": 0.1}
# Executor confined to a scratch workspace. Note: execution_policies maps
# languages (not individual shell commands) to allow/deny flags, so
# destructive-command filtering must be handled at the sandbox/OS level.
executor = LocalCommandLineCodeExecutor(
    timeout=60,
    work_dir="/tmp/autogen_workspace",
    execution_policies={"python": True, "bash": True},
)
# ============================================================
# AGENT DEFINITIONS
# ============================================================
architect = AssistantAgent(
name="SoftwareArchitect",
system_message="""Senior software architect. Analyze requirements, propose solid
architectures with ASCII diagrams, define component interfaces, identify risks.
Reply TERMINATE when task is complete.""",
llm_config=opus_config,
)
developer = AssistantAgent(
name="PythonDeveloper",
system_message="""Senior Python developer. Implement code per architect specs:
type hints, docstrings, robust error handling, SOLID/DRY principles.
Use valid Python code blocks. Reply TERMINATE only when both approve.""",
llm_config=llm_config,
)
tester = AssistantAgent(
name="QATester",
system_message="""QA engineer. Write exhaustive pytest tests, run them in sandbox,
report results. Identify uncovered edge cases. Target 85%+ coverage.
Always produce runnable tests verifiable in sandbox.""",
llm_config=llm_config,
)
reviewer = AssistantAgent(
name="CodeReviewer",
system_message="""Senior code reviewer. Analyze code smells, OWASP Top 10,
suggest refactoring, approve or request changes.
End each review with APPROVED or NEEDS_WORK: [issue list].""",
llm_config=llm_config,
)
# UserProxy bridges agents and sandbox
code_executor_proxy = UserProxyAgent(
name="CodeExecutor",
human_input_mode="NEVER",
max_consecutive_auto_reply=5,
code_execution_config={"executor": executor},
is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
)
# ============================================================
# GROUP CHAT - STRUCTURED CONVERSATION FLOW
# ============================================================
group_chat = GroupChat(
agents=[architect, developer, tester, reviewer, code_executor_proxy],
messages=[],
max_round=20,
speaker_selection_method="auto",
allowed_or_disallowed_speaker_transitions={
architect: [developer],
developer: [tester, code_executor_proxy],
tester: [code_executor_proxy, reviewer],
code_executor_proxy: [developer, tester, reviewer],
reviewer: [developer, architect],
},
speaker_transitions_type="allowed",
)
manager = GroupChatManager(
groupchat=group_chat,
llm_config=llm_config,
system_message="""Project manager. Orchestrate conversation for efficient task
completion, ensuring each agent contributes at the right moment.""",
)
# ============================================================
# INITIATE CONVERSATION
# ============================================================
task = """
Build an intelligent caching system in Python:
- LRU cache with configurable per-entry TTL
- Multiple backends: in-memory, Redis, or filesystem
- @cache decorator for transparent application to functions
- Real-time hit/miss statistics
- Thread-safe for concurrent applications
Full architecture, implementation, tests, and code review.
"""
chat_result = code_executor_proxy.initiate_chat(
manager,
message=task,
summary_method="reflection_with_llm",
)
print(chat_result.summary)
AG2: Key Strengths
- Real code execution: code is actually run in a sandbox, agents see real outputs
- Emergent flexibility: workflows adapt to the problem, not the other way around
- AG-UI protocol: dynamic frontends with real-time streaming (2025)
- OpenTelemetry: full observability for agentic workflows
- Human-in-the-loop: easily configurable with human_input_mode
- TypeScript support: Microsoft AutoGen 0.4 with native TypeScript
Claude Code: Sub-Agents and Parallel Execution
Claude Code has a native multi-agent system that operates fundamentally differently from the Python frameworks above. Instead of orchestrating LLM models via API, Claude Code launches separate Claude instances (sub-agents) directly from the terminal, each with its own clean context window and the ability to operate on the filesystem in isolation.
This approach has significant advantages: no framework overhead, native integration with terminal tools, and the ability to run tasks in parallel on multi-core systems. The Task tool is the key mechanism: it launches a sub-agent with a specific prompt and waits for the result before proceeding (or, if tasks are independent, launches them all in parallel). Specialized sub-agents are defined as markdown files in .claude/agents/, like this code-reviewer definition:
# Code Reviewer Agent
## Role
Senior code reviewer focused on quality, security, and maintainability.
## Expertise
- SOLID principles and design patterns
- OWASP Top 10 and application security
- Performance optimization
- Clean code and refactoring
## Review Process
1. Read ALL modified files before commenting
2. Identify issues by category: CRITICAL, WARNING, SUGGESTION
3. For each issue: specify file, line, and recommended fix
4. Verify tests cover error cases
5. Check for hardcoded secrets or credentials
## Expected Output
Markdown report with sections:
- Executive Summary
- CRITICAL Issues (blocking for merge)
- WARNINGS (resolve before next release)
- SUGGESTIONS (optional improvements)
- Security Checklist
# Claude Code multi-agent orchestration examples
# (content for CLAUDE.md or interactive prompts)
# --- PATTERN 1: SEQUENTIAL WITH FEEDBACK LOOP ---
Implement the following features for the auth/ module:
1. Use the Task tool to launch the "planner" sub-agent:
Input: "Analyze auth/ and create an implementation plan for
adding 2FA with TOTP (RFC 6238)"
Wait for the complete plan.
2. Use the Task tool to launch the "developer" sub-agent:
Input: "Implement the following plan: [planner output]"
Wait for the complete implementation.
3. Use the Task tool to launch IN PARALLEL:
- Sub-agent "code-reviewer": review code in auth/
- Sub-agent "security-auditor": verify security of auth/
(launch both simultaneously, wait for both)
4. Apply fixes for all CRITICAL issues found.
---
# --- PATTERN 2: PARALLEL WORKTREES ---
Execute IN PARALLEL on separate git worktrees:
Task A (worktree: feature/user-service):
- Implement UserService with complete CRUD
- Write unit tests with 90% coverage
- Commit: "feat: add UserService with full CRUD"
Task B (worktree: feature/email-service):
- Implement EmailService with template system
- Integrate with SendGrid API
- Commit: "feat: add EmailService with SendGrid"
Task C (worktree: feature/notification-service):
- Implement NotificationService (email + push + SMS)
- Use UserService and EmailService as dependencies
- Commit: "feat: add NotificationService"
After all three complete: merge in order A, B, C,
resolve conflicts, run full test suite.
---
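Pattern 2's fan-out/fan-in shape maps naturally onto ordinary concurrency primitives. Claude Code handles this internally via the Task tool; the sketch below only illustrates the orchestration shape, and `run_subagent` is a hypothetical placeholder:

```python
# Sketch of fan-out/fan-in: launch independent "sub-agent" tasks in
# parallel, then merge results in a fixed order once all complete.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(name: str) -> str:
    # Placeholder for launching a sub-agent on its own git worktree
    return f"{name}: done"

tasks = ["feature/user-service", "feature/email-service",
         "feature/notification-service"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves task order, so merging A -> B -> C stays deterministic
    results = list(pool.map(run_subagent, tasks))

for line in results:
    print(line)
```

Preserving submission order at the merge step matters here because Task C depends on A's and B's services, exactly as in the worktree pattern above.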
# --- PATTERN 3: SHARED CONTEXT FILE ---
Before starting, create /tmp/project-context.md with:
- Project tech stack and versions
- Naming conventions and coding style
- Architectural patterns in use
- Key dependencies and constraints
Each sub-agent must:
1. Read /tmp/project-context.md at startup
2. Update /tmp/progress.md with task status
3. Save structured outputs to /tmp/agent-outputs/[name]/
This guarantees consistency across parallel agents
without merge conflicts in shared code.
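Pattern 3 amounts to a small filesystem protocol: one read-only context file, one append-only progress log, one output directory per agent. A sketch with hypothetical paths and keys (a temp directory stands in for the /tmp paths above):

```python
# Sketch of the shared-context-file protocol used by parallel sub-agents.
import json
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for /tmp in the patterns above

# The orchestrator writes the shared context once, before any agent starts
(root / "project-context.md").write_text(
    "# Context\n- Python 3.12, FastAPI\n- snake_case naming\n"
)

def agent_step(name: str, status: str) -> None:
    """Each sub-agent reads context, logs progress, saves its own output."""
    context = (root / "project-context.md").read_text()
    assert context.startswith("# Context")  # agents never modify this file
    with (root / "progress.md").open("a") as f:
        f.write(f"{name}: {status}\n")  # append-only progress log
    out_dir = root / "agent-outputs" / name
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "result.json").write_text(json.dumps({"status": status}))

agent_step("developer", "completed")
agent_step("reviewer", "completed")
print((root / "progress.md").read_text())
```

Because each agent writes only to its own output directory, parallel runs never contend for the same file, which is the property the pattern is built around.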
Lesson from the Replit Incident (2025)
In 2025, Replit documented a case where an autonomous agent deleted a production database during a "cleanup" task. The incident highlighted the need to explicitly configure limits on destructive commands. In Claude Code, this is managed through the permissions deny list in .claude/settings.json: always block rm -rf, DROP TABLE, git push --force, and similar commands. Sub-agents inherit the same restrictions as the main agent.
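A minimal sketch of such a deny list, following Claude Code's permission-rule syntax; the exact patterns you block should reflect your own threat model:

```json
{
  "permissions": {
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)",
      "Bash(sudo:*)"
    ]
  }
}
```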
Detailed Comparison: Which Framework to Choose
The choice of framework depends on the specific context. There is no absolute winner: each tool excels in different scenarios. The following table summarizes key differences to help with the decision.
| Criterion | LangGraph | CrewAI | AG2 (AutoGen) | Claude Code |
|---|---|---|---|---|
| Paradigm | Stateful graph | Role-based team | Conversational | Native sub-agent |
| Learning curve | High (graph theory) | Low (intuitive) | Medium | Low |
| Flow control | Maximum | Medium-high | Emergent | Medium |
| Code execution | Via tool | Built-in (sandbox) | Built-in (local) | Native Bash |
| State persistence | Native checkpointing | Built-in memory | Chat messages | Filesystem |
| Parallelism | Parallel nodes | Process.parallel | Async GroupChat | Native Task tool |
| Observability | LangSmith | CrewAI Enterprise | OpenTelemetry | Native logs |
| Human-in-the-loop | Native interrupt | Callback | human_input_mode | Interactive |
| Ecosystem | LangChain | Independent | Microsoft/AG2 | Anthropic |
| Best for | Complex workflows with conditional logic | Virtual teams with defined roles | Exploratory tasks with code execution | CLI automation on existing projects |
Practical Recommendations
- Choose LangGraph if you have complex workflows with many conditional branches, need reliable state persistence between sessions, and have a team comfortable with state machine and directed graph concepts.
- Choose CrewAI if you want to start quickly with multi-agent, your team thinks in terms of organizational roles, and you need built-in code execution with automatic error handling.
- Choose AG2 if the task is exploratory without a predefined workflow, you need genuine multi-round conversations with real code execution, and you are in the Microsoft ecosystem (Azure, TypeScript).
- Choose Claude Code if you already work with Claude, want multi-agent without additional frameworks, and your primary use case is automating development tasks on existing codebases.
Production-Ready Architecture for Real Teams
A multi-agent system in production requires architectural considerations that go beyond simple agent orchestration. Companies that deployed these systems in 2025 learned important lessons that are worth knowing before you begin.
Three Proven Architectural Patterns
- Supervisor Pattern: an orchestrator agent delegates tasks to specialized agents and aggregates results. Ideal for LangGraph with conditional routing.
- Pipeline Pattern: agents in sequence where one output becomes the next input. Natural in CrewAI with context and dependent tasks.
- Peer-to-Peer Pattern: agents communicating freely without a central orchestrator. Natural in AG2 with GroupChat.
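The supervisor pattern in particular reduces to a small dispatch loop. A framework-free sketch with hypothetical names, to show the shape rather than any specific library's API:

```python
# Sketch of the supervisor pattern: a coordinator routes each subtask to a
# registered specialist agent and aggregates the results.
from typing import Callable

class Supervisor:
    def __init__(self):
        self.specialists: dict[str, Callable[[str], str]] = {}

    def register(self, skill: str, agent: Callable[[str], str]) -> None:
        self.specialists[skill] = agent

    def delegate(self, tasks: list[tuple[str, str]]) -> list[str]:
        """Route each (skill, payload) pair to its specialist."""
        results = []
        for skill, payload in tasks:
            agent = self.specialists.get(skill)
            if agent is None:
                results.append(f"UNROUTABLE: {skill}")  # explicit fallback
                continue
            results.append(agent(payload))
        return results

sup = Supervisor()
# Lambdas stand in for real LLM-backed agents
sup.register("code", lambda t: f"implemented: {t}")
sup.register("review", lambda t: f"reviewed: {t}")

report = sup.delegate([("code", "rate limiter"), ("review", "rate limiter")])
print(report)
```

In a real deployment the fallback branch is where you escalate to a human or a generalist agent instead of silently dropping the task.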
from pydantic import BaseModel, Field
from typing import List, Literal, Optional
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
import logging
import time
logger = logging.getLogger(__name__)
# ============================================================
# COST TRACKER - Essential for production multi-agent systems
# ============================================================
class CostTracker:
# Anthropic pricing (USD per 1M tokens, Feb 2026)
PRICES = {
"claude-opus-4-6": {"input": 15.0, "output": 75.0},
"claude-sonnet-4-6": {"input": 3.0, "output": 15.0},
"claude-haiku-4-5": {"input": 0.25, "output": 1.25},
}
def __init__(self, budget_usd: float):
self.budget = budget_usd
self.spent = 0.0
self.calls = 0
def track(self, model: str, input_tokens: int, output_tokens: int) -> None:
prices = self.PRICES.get(model, {"input": 3.0, "output": 15.0})
cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
self.spent += cost
self.calls += 1
if self.spent > self.budget * 0.8:
logger.warning(f"Cost alert: ${self.spent:.2f} of ${self.budget:.2f} budget used after {self.calls} calls")