Introduction: From AutoGen to AG2
AutoGen was born in 2023 as a Microsoft Research project with an ambitious goal: creating a framework where AI agents could collaborate through conversation. Unlike other frameworks that treat agents as isolated workers executing tasks in sequence, AutoGen introduced a radically different paradigm: agents that talk to each other, negotiate solutions, correct one another, and produce emergent results from their interaction.
AutoGen's evolutionary path has been significant. The initial versions (0.1-0.2) demonstrated the potential of multi-agent conversation. In November 2024, the project's original creators spun it off as AG2 under open community governance, decoupling that line of development from direct Microsoft control, while Microsoft continued work on AutoGen itself. In January 2025, the AutoGen 0.4 release introduced a complete architectural redesign: an asynchronous event-driven core, modular agents, and a topic-based messaging system.
In parallel, Microsoft developed the Microsoft Agent Framework, which entered preview in October 2025 and is expected to reach General Availability in Q1 2026. This framework converges AutoGen's capabilities with Semantic Kernel, offering a unified enterprise experience integrated with Azure AI Foundry. Understanding AutoGen is therefore essential for mastering Microsoft's entire AI agent ecosystem.
What You Will Learn in This Article
- AutoGen's conversational model and why it differs from other frameworks
- Available agent types: AssistantAgent, UserProxyAgent, ConversableAgent, GroupChat
- How to implement Human-in-the-Loop with approval patterns
- The code generation and auto-correction cycle
- Emergent behaviors in multi-agent conversation
- The migration path toward Microsoft Agent Framework
- A complete case study of iterative code review with 3 agents
AutoGen vs CrewAI vs LangGraph
Before diving into AutoGen, it is useful to understand how it positions itself relative to the other multi-agent frameworks we have analyzed in this series. Each framework has a distinct architectural philosophy that determines its ideal use cases.
Multi-Agent Framework Comparison
| Feature | AutoGen/AG2 | CrewAI | LangGraph |
|---|---|---|---|
| Paradigm | Conversation-first | Role-based teams | Graph-based workflows |
| Coordination | Natural conversation | Task delegation | Explicit nodes and edges |
| Human-in-the-Loop | Native, 3 modes | Configurable | Checkpoint-based |
| Code Execution | Built-in (Docker/local) | Via tools | Custom nodes |
| Enterprise Support | Microsoft-backed | Community-driven | LangChain ecosystem |
| Learning Curve | Medium | Low | High |
| Flexibility | High | Medium | Maximum |
| Ideal Use Case | Iterative collaboration | Structured pipelines | Complex workflows |
The choice between these frameworks is not binary. In complex systems, you can combine them: use LangGraph for the global workflow, CrewAI for agent teams, and AutoGen for interactions that require iterative negotiation and human intervention. The key is understanding each framework's strengths and selecting the right tool for the specific problem.
The Conversational Model
At the heart of AutoGen is the concept that conversation is the coordination primitive. While other frameworks use graphs, queues, or pipelines to coordinate agents, AutoGen uses conversation itself. Agents communicate by exchanging messages in a shared thread, exactly like humans collaborate in a group chat.
This model offers profound advantages. Conversation is inherently flexible: there is no need to define all possible paths in advance. Agents can dynamically adapt based on what emerges from the discussion. One agent might ask an unexpected question, another might propose an alternative approach, and the system evolves organically toward a solution.
Chat History as Shared Memory
The chat history serves as shared memory among all agents. Every message sent becomes part of the common context, visible to all participants. This eliminates the need for explicit synchronization mechanisms: context propagates naturally through conversation.
- Cumulative context: each message enriches the context available for subsequent messages
- Total transparency: every agent sees the entire history, including other agents' decisions and reasoning
- Natural debugging: the chat history is itself a detailed log of the decision-making process
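These properties can be illustrated without any framework: a conversation thread is just an append-only list of messages that every participant reads in full before replying. A minimal sketch (the send and visible_context helpers are illustrative, not part of AutoGen's API):

```python
# Minimal illustration of chat history as shared memory.
# Each "agent" reads the full history before contributing,
# so context accumulates without explicit synchronization.

history = []  # the shared conversation thread

def send(sender: str, content: str) -> None:
    """Append a message to the shared history."""
    history.append({"sender": sender, "content": content})

def visible_context(agent: str) -> list:
    """Every agent sees the entire history, including others' messages."""
    return list(history)

send("Coder", "Proposing a merge sort implementation.")
send("Reviewer", "Edge case: empty list input is not handled.")
send("Coder", "Fixed: added an early return for empty input.")

# The Reviewer's feedback is part of the Coder's context on the next turn
assert len(visible_context("Coder")) == 3
assert "empty list" in visible_context("Coder")[1]["content"]
```

Note how the chat log doubles as the audit trail: printing `history` reproduces the decision-making process verbatim.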
Turn-Taking and Termination Criteria
AutoGen manages turn-taking with configurable mechanisms. In a two-agent conversation, turns alternate naturally. In a GroupChat with multiple agents, a GroupChatManager decides who speaks next, based on the conversation context.
Termination criteria define when the conversation concludes. AutoGen supports several termination conditions:
- Keyword-based: the conversation ends when an agent emits a specific keyword (e.g., TERMINATE)
- Max rounds: a maximum turn limit prevents infinite loops
- Function-based: a custom function evaluates whether to continue or terminate
- Human decision: the human user decides when the result is satisfactory
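These conditions are typically supplied to agents as callables (in the style of is_termination_msg). Combining keyword-based and max-rounds termination can be sketched in plain Python; the helper below is illustrative, not an AutoGen API:

```python
# Sketch: combining keyword-based and max-rounds termination,
# in the style of AutoGen's is_termination_msg callable.
# make_termination_check is an illustrative helper, not part of the API.

def make_termination_check(keyword: str = "TERMINATE", max_rounds: int = 10):
    """Return a predicate that ends the chat on a keyword or a round limit."""
    state = {"rounds": 0}

    def should_terminate(msg: dict) -> bool:
        state["rounds"] += 1
        if keyword in msg.get("content", ""):
            return True                        # keyword-based termination
        return state["rounds"] >= max_rounds   # max-rounds safety net

    return should_terminate

check = make_termination_check(max_rounds=3)
assert check({"content": "still working"}) is False
assert check({"content": "done. TERMINATE"}) is True
```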
Agent Types in AutoGen
AutoGen provides a hierarchy of agent classes, each designed for a specific role in multi-agent conversation. Understanding these classes is fundamental for designing effective systems.
ConversableAgent: The Flexible Base
ConversableAgent is the base class from which all other agents derive. It provides
core functionality: sending and receiving messages, managing chat history, LLM integration,
and code execution. Each ConversableAgent can be configured to use or not use a language model,
to execute or not execute code, and to require or not require human approval.
```python
from autogen import ConversableAgent

# Custom base agent with personalized configuration
custom_agent = ConversableAgent(
    name="Analyst",
    system_message="""You are an expert data analyst.
Analyze the provided data and produce actionable insights.
When you have completed the analysis, respond with TERMINATE.""",
    llm_config={
        "config_list": [{
            "model": "gpt-4",
            "api_key": "YOUR_API_KEY"
        }],
        "temperature": 0.1
    },
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False
)
```
AssistantAgent: The Reasoner
AssistantAgent is a ConversableAgent pre-configured for reasoning and response
generation. By default, it has a system prompt that instructs it to solve problems step by step.
It does not execute code directly but can generate it and pass it to other agents for execution.
It is the most common agent for tasks requiring analysis, planning, and content generation.
```python
from autogen import AssistantAgent

assistant = AssistantAgent(
    name="CodingAssistant",
    llm_config={
        "config_list": [{
            "model": "gpt-4",
            "api_key": "YOUR_API_KEY"
        }]
    },
    system_message="""You are an expert Python programmer.
Solve problems by writing clean and well-documented code.
Always suggest unit tests for the code you produce."""
)
```
UserProxyAgent: The Human Representative
UserProxyAgent is the bridge between the multi-agent system and the human user.
It can operate in three modes regarding human input and, crucially, is capable of
executing code generated by other agents. When an AssistantAgent generates
a Python snippet, the UserProxyAgent executes it in a safe environment (local or Docker)
and returns the result.
```python
from autogen import UserProxyAgent

user_proxy = UserProxyAgent(
    name="UserProxy",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=5,
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", ""),
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False  # True for Docker isolation
    }
)
```
GroupChat and GroupChatManager
For conversations with more than two agents, AutoGen provides GroupChat and
GroupChatManager. GroupChat defines the group of participating agents and the
conversation rules. GroupChatManager orchestrates the turns, deciding which agent should
speak next based on context.
```python
from autogen import GroupChat, GroupChatManager

# Group definition (coder, reviewer, tester, user_proxy, and llm_config
# are the agents and configuration defined in the previous examples)
group_chat = GroupChat(
    agents=[coder, reviewer, tester],
    messages=[],
    max_round=20,
    speaker_selection_method="auto"  # The LLM chooses who speaks
)

# Manager that orchestrates the conversation
manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config
)

# Start the conversation
user_proxy.initiate_chat(
    manager,
    message="Implement a function to sort a list using merge sort"
)
```
Speaker Selection Methods
The speaker_selection_method parameter controls how the next speaker is chosen:
- "auto": the LLM analyzes the conversation and decides who should speak (smartest, most expensive)
- "round_robin": agents speak in circular order (predictable, economical)
- "random": random selection among available agents (useful for brainstorming)
- "manual": the user chooses who speaks at each turn (maximum control)
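The round_robin policy is easy to reason about precisely because it ignores content: agents simply take turns in a fixed circular order. A framework-free sketch of that behavior:

```python
# Sketch of the "round_robin" speaker selection policy:
# agents take turns in a fixed circular order, independent of content.
from itertools import cycle

def round_robin(agents: list):
    """Yield the next speaker in circular order."""
    return cycle(agents)

speakers = round_robin(["coder", "reviewer", "tester"])
order = [next(speakers) for _ in range(5)]
assert order == ["coder", "reviewer", "tester", "coder", "reviewer"]
```

The "auto" policy replaces this deterministic iterator with an LLM call per turn, which is why it is both the smartest and the most expensive option.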
Human-in-the-Loop
One of AutoGen's most distinctive features is its native and granular support for
Human-in-the-Loop (HITL). In autonomous systems, it is critical that a human
can supervise, approve, or correct agent decisions. AutoGen implements this through the
human_input_mode parameter, which supports three modes.
Human-in-the-Loop Modes
| Mode | Behavior | Use Case |
|---|---|---|
| ALWAYS | Requests human input at every turn | Critical tasks, learning, debugging; also gives the human a checkpoint before actions with side effects (APIs, databases, file system) |
| NEVER | Never requests human input, fully autonomous | Batch automation, CI/CD pipelines |
| TERMINATE | Requests input only when the agent wants to terminate | Final supervision, output validation |
The TERMINATE mode is often the best compromise between automation and control. Agents work autonomously until they reach a solution, then the user can approve the result, provide feedback for an additional iteration, or end the session. This pattern drastically reduces the cognitive load on the user without sacrificing oversight.
```python
from autogen import UserProxyAgent

# TERMINATE pattern with feedback loop
user_proxy = UserProxyAgent(
    name="Supervisor",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=8,
    is_termination_msg=lambda msg: "APPROVED" in msg.get("content", "").upper(),
    default_auto_reply="Continue working. If you are done, write APPROVED."
)

# The user will see the result and can:
# 1. Press Enter to approve (empty input = auto-reply)
# 2. Type feedback to request changes
# 3. Type "exit" to end the session
```
Code Generation and Auto-Correction
One of AutoGen's most powerful patterns is the code generation and auto-correction cycle. An AssistantAgent generates Python code, the UserProxyAgent executes it, and if the code fails, the error result is passed back to the AssistantAgent, which analyzes the error and produces a corrected version. This cycle continues until success or until the retry limit is reached.
The Generate-Execute-Fix Cycle
```
AutoGen Auto-Correction Cycle:

[1] AssistantAgent generates Python code
        |
        v
[2] UserProxyAgent executes the code
        |
        +--> Success? --> Result to AssistantAgent --> TERMINATE
        |
        +--> Error? --> Traceback to AssistantAgent
                              |
                              v
             [3] AssistantAgent analyzes the error
                              |
                              v
             [4] Generates corrected code --> Back to [2]
                 (max N attempts)
```
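The cycle can be sketched without the framework: execute a code string, capture the traceback on failure, and hand it back to a "fixer" that stands in for the LLM. All names below are illustrative, and fix_code is a deliberately trivial stub:

```python
# Framework-free sketch of the generate-execute-fix cycle.
# fix_code() is a stub standing in for the AssistantAgent's LLM call.
import traceback

def execute(code: str) -> tuple:
    """Run a code string; return (success, output_or_traceback)."""
    scope = {}
    try:
        exec(code, scope)
        return True, str(scope.get("result"))
    except Exception:
        return False, traceback.format_exc()

def fix_code(code: str, error: str) -> str:
    """Stub 'LLM': repair a known bug based on the traceback."""
    if "ZeroDivisionError" in error:
        return code.replace("/ 0", "/ 2")
    return code

def generate_execute_fix(code: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        ok, output = execute(code)
        if ok:
            return output                   # success -> would TERMINATE
        code = fix_code(code, output)       # traceback fed back to the 'agent'
    raise RuntimeError("retry limit reached")

# First attempt fails with ZeroDivisionError, the corrected code succeeds
assert generate_execute_fix("result = 10 / 0") == "5.0"
```

In AutoGen, the real loop is driven by max_consecutive_auto_reply and the executor's error output rather than an explicit for loop, but the control flow is the same.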
This pattern is remarkably effective. The AutoGen research from Microsoft reports that success rates on programming tasks improve markedly when execution feedback is fed back for iterative correction, compared to a single-shot attempt. The agent learns from specific runtime errors and corrects its output in a targeted manner.
```python
from autogen import AssistantAgent, UserProxyAgent

# Setup for code generation with auto-correction
assistant = AssistantAgent(
    name="Coder",
    llm_config=llm_config,
    system_message="""You are an expert Python programmer.
When writing code:
1. Always include necessary imports
2. Handle exceptions appropriately
3. Print results with print()
4. If you receive an error, analyze it and fix the code
5. When the code works correctly, write TERMINATE"""
)

executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    code_execution_config={
        "work_dir": "coding_workspace",
        "use_docker": True,  # Isolation for security
        "timeout": 60
    },
    is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", "")
)

# The cycle starts automatically
executor.initiate_chat(
    assistant,
    message="""Write a Python function that:
1. Reads a CSV file with pandas
2. Calculates mean, median, and standard deviation for each numeric column
3. Generates a formatted report
Test the function with sample data."""
)
```
Code Execution Security
Executing LLM-generated code presents significant risks. AutoGen offers several safeguards to mitigate these risks:
- Docker isolation: code runs in an isolated Docker container, preventing access to the host file system and network
- Timeout: each execution has a configurable timeout to prevent infinite loops or overly long operations
- Dedicated working directory: code executes in a specific directory, limiting file access
- Human approval: with human_input_mode="ALWAYS", the user can inspect and approve each step, including code execution, before it runs
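Even without Docker, the timeout and dedicated-workdir mitigations can be approximated with the standard library by running code in a separate process. This is only a sketch of the idea; in practice, prefer AutoGen's own executors with Docker isolation:

```python
# Sketch of timeout + dedicated-workdir execution using the stdlib.
# Illustrates the mitigation ideas; it is NOT a real sandbox, since the
# child process still shares the host's permissions.
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout_s: int = 10) -> str:
    """Run code in a separate process, confined to a temporary workdir."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,            # dedicated working directory
            capture_output=True,
            text=True,
            timeout=timeout_s,      # kill runaway executions
        )
        return proc.stdout.strip()

assert run_untrusted("print(2 + 2)") == "4"
```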
Emergent Behaviors
When multiple agents converse freely, behaviors emerge that were not explicitly programmed. This is one of the most fascinating and simultaneously most dangerous aspects of conversational multi-agent systems. Emergent behaviors can be both positive and negative.
Positive Emergent Behaviors
- Unexpected creative solutions: agents proposing approaches not anticipated by the system designer. For example, a Coder agent might suggest using a different library than expected, producing a more elegant solution.
- Spontaneous negotiation: agents discussing trade-offs of different solutions before converging on the best one. A Reviewer might challenge an approach and the Coder might propose a compromise that satisfies both.
- Emergent specialization: in groups of agents with similar roles, each tends to specialize on different aspects of the problem, implicitly distributing the workload.
- Self-organization: agents develop informal communication protocols, such as summarizing the current state before proposing the next step.
Negative Emergent Behaviors
- Infinite loops: two agents bouncing the same message back and forth without progress. Common when termination criteria are too vague or when agents cannot reach consensus.
- Task deviation: the conversation drifts away from the original objective. One agent might begin discussing irrelevant details, dragging others along.
- Collaborative hallucinations: one agent produces incorrect information, others accept it as true and build upon it. The error amplifies instead of being corrected.
- Complexity escalation: agents add unrequested requirements, making the solution unnecessarily complex. A Coder might implement unrequested features "for completeness."
Risk Mitigation Strategies
| Risk | Strategy | Implementation |
|---|---|---|
| Infinite loops | Max rounds + timeout | max_round=15 in GroupChat |
| Task deviation | Rigorous system prompt | Specific instructions and explicit constraints |
| Collaborative hallucinations | Dedicated verifier agent | Critic Agent that validates every output |
| Complexity escalation | Explicit scope definition | Closed list of requirements in the initial prompt |
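The infinite-loop risk deserves a concrete example: a function-based termination condition can detect when the same message starts bouncing between agents. A plain-Python sketch (the helper name is illustrative, not an AutoGen API):

```python
# Sketch: function-based termination that detects message repetition,
# a common symptom of two agents stuck in a loop.
from collections import deque

def make_loop_detector(window: int = 4):
    """Terminate when identical content reappears within a sliding window."""
    recent = deque(maxlen=window)

    def is_looping(msg: dict) -> bool:
        content = msg.get("content", "").strip()
        looping = content in recent
        recent.append(content)
        return looping

    return is_looping

detect = make_loop_detector()
assert detect({"content": "please fix the bug"}) is False
assert detect({"content": "I already fixed it"}) is False
assert detect({"content": "please fix the bug"}) is True  # repetition detected
```

A predicate like this can be passed as is_termination_msg, complementing the hard max_round limit.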
Migration to Microsoft Agent Framework
With the release of the Microsoft Agent Framework, the AutoGen ecosystem is converging toward a unified enterprise platform. The Framework combines AutoGen's conversational capabilities with Semantic Kernel's orchestration model, offering a complete solution for developing AI agents in corporate environments.
What Changes with the Microsoft Agent Framework
- Unified APIs: the Chat and Workflow APIs replace the separate AutoGen and Semantic Kernel interfaces, providing a coherent development experience.
- Azure AI Foundry integration: agent deployment, monitoring, and scaling are natively managed through Azure AI Foundry Agent Service, eliminating the need for custom infrastructure.
- Multi-language support: beyond Python, the Framework supports C# and Java, broadening accessibility for enterprise teams.
- Enterprise governance: security policies, audit logging, and compliance are natively integrated, not added as an afterthought.
Migration Path from AutoGen 0.2
For those with existing projects based on AutoGen 0.2, the migration path involves several steps. The main APIs have been redesigned, but the fundamental concepts remain the same.
```python
# AutoGen 0.2 (legacy)
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant", llm_config=config)
proxy = UserProxyAgent("user", code_execution_config=exec_config)
proxy.initiate_chat(assistant, message="Solve this problem")
```

```python
# AutoGen 0.4 (new API)
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination

agent = AssistantAgent(
    name="assistant",
    model_client=model_client,  # an explicit model client instance
    system_message="You are an expert assistant"
)

termination = TextMentionTermination("TERMINATE")
team = RoundRobinGroupChat(
    participants=[agent],
    termination_condition=termination
)

# run() is a coroutine: call it from an async context
result = await team.run(task="Solve this problem")
```
Migration Tips
- Start by migrating the simplest agents to build familiarity with the new APIs
- Use the autogen-agentchat package, which provides a high-level interface
- ConversableAgent becomes AssistantAgent with an explicit model client
- GroupChat becomes teams (RoundRobinGroupChat, SelectorGroupChat)
- Code execution is handled through a dedicated CodeExecutorAgent
- Termination conditions are explicit objects, not lambda functions
Case Study: Iterative Multi-Agent Code Review
To concretely demonstrate the power of multi-agent conversation, we implement an iterative code review system with three specialized agents. The system receives a specification, produces code, reviews it, tests it, and iterates until sufficient quality is achieved.
System Architecture
```
Multi-Agent Code Review System:

User specification
        |
        v
+-------------+        code        +-------------+
|   CODER     | -----------------> |  REVIEWER   |
|  (writes    |                    |  (reviews   |
|   code)     | <----------------- |  quality)   |
+-------------+    feedback/fix    +-------------+
        |                                 |
        | approved code                   |
        v                                 |
+-------------+                           |
|  EXECUTOR   | <--- ok / tests failed ---+
|  (runs and  |
|   tests)    |
+-------------+
        |
        v
  Final result
```
Complete Implementation
```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

# Shared LLM configuration
llm_config = {
    "config_list": [{
        "model": "gpt-4",
        "api_key": "YOUR_API_KEY"
    }],
    "temperature": 0.1,
    "seed": 42
}

# Agent 1: Coder - generates code
coder = AssistantAgent(
    name="Coder",
    llm_config=llm_config,
    system_message="""You are a senior Python programmer.
Your task is to write clean, efficient, and well-documented code.
Rules:
- Always include docstrings for classes and methods
- Follow PEP 8 for style
- Handle exceptions appropriately
- Include type hints
- When you receive Reviewer feedback, fix the code
- Do NOT discuss, implement corrections directly"""
)

# Agent 2: Reviewer - reviews quality
reviewer = AssistantAgent(
    name="Reviewer",
    llm_config=llm_config,
    system_message="""You are an expert code reviewer.
Your task is to analyze code produced by the Coder and provide feedback.
Evaluate:
- Logical correctness and edge cases
- Readability and naming conventions
- Performance and algorithmic complexity
- Error handling and robustness
- Adherence to Python best practices
Feedback format:
- ISSUES: numbered list of problems found
- SUGGESTIONS: optional improvements
- VERDICT: APPROVED or NEEDS_FIX
If the code is good, respond with APPROVED."""
)

# Agent 3: Executor - runs and tests
executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    code_execution_config={
        "work_dir": "review_workspace",
        "use_docker": True,
        "timeout": 30
    },
    is_termination_msg=lambda msg: "APPROVED" in msg.get("content", "")
)

# GroupChat configuration
group_chat = GroupChat(
    agents=[executor, coder, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto"
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config
)

# Start the process
executor.initiate_chat(
    manager,
    message="""Implement an LRUCache class in Python with these requirements:
1. Configurable maximum capacity
2. get(key) and put(key, value) methods with O(1) complexity
3. Eviction of the least recently used entry when cache is full
4. Thread-safety with threading.Lock
5. stats() method that returns hit rate and miss rate
Include comprehensive unit tests."""
)
```
Conversation Flow
Here is how the conversation between the three agents typically unfolds:
- Rounds 1-2: the Coder receives the specification and generates an initial LRUCache implementation with tests
- Rounds 3-4: the Reviewer analyzes the code and provides detailed feedback (e.g., missing edge case handling for None key, lock not used correctly)
- Rounds 5-6: the Coder corrects the code based on the Reviewer's feedback
- Rounds 7-8: the Executor runs the tests and reports any failures
- Rounds 9-10: the Coder fixes the failing tests
- Rounds 11-12: the Reviewer approves the final code with an APPROVED verdict
The result is code that has gone through multiple iterations of review and testing, producing significantly higher quality compared to a single attempt. This pattern mirrors the human code review process but happens in seconds rather than hours or days.
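For reference, here is a representative implementation that such a review loop might converge on: O(1) operations via OrderedDict, a lock for thread-safety, and hit/miss statistics. This is an illustrative sketch, not actual agent output:

```python
# Representative result of the review loop above: an LRU cache with
# O(1) get/put, thread-safety, and hit/miss statistics.
# (Illustrative sketch, not actual agent output.)
import threading
from collections import OrderedDict
from typing import Any, Optional

class LRUCache:
    """LRU cache with configurable capacity and hit/miss stats."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._hits = 0
        self._misses = 0

    def get(self, key: Any) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                self._misses += 1
                return None
            self._hits += 1
            self._data.move_to_end(key)   # mark as most recently used
            return self._data[key]

    def put(self, key: Any, value: Any) -> None:
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)  # evict least recently used

    def stats(self) -> dict:
        with self._lock:
            total = self._hits + self._misses
            hit_rate = self._hits / total if total else 0.0
            return {"hit_rate": hit_rate,
                    "miss_rate": (1 - hit_rate) if total else 0.0}

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
assert cache.get("a") == 1        # hit; "a" becomes most recent
cache.put("c", 3)                 # evicts "b", the least recently used
assert cache.get("b") is None     # miss
assert cache.stats()["hit_rate"] == 0.5
```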
Best Practices for AutoGen in Production
Based on accumulated experience with AutoGen and AG2, here are the fundamental best practices for using the framework in production environments:
- Always limit the number of rounds: set max_round to a reasonable value (10-20) to avoid excessive costs and infinite loops. Prefer failing with a clear message rather than iterating indefinitely.
- Use Docker for code execution: in production, always execute LLM-generated code in isolated Docker containers. Never run LLM-generated code directly on the host.
- Detailed system prompts: define roles, constraints, and output formats explicitly and in detail. Vague prompts produce unpredictable behaviors.
- Monitor costs: every conversation round consumes tokens. Implement budget caps and alerting to avoid excessive spending in loop scenarios.
- Log all conversations: save the entire chat history for debugging, auditing, and continuous prompt improvement.
- Test with cheaper models first: develop and test workflows with less expensive models (GPT-3.5), then switch to GPT-4 for production.
- Implement graceful degradation: if an agent fails, the system should produce a useful partial result, not a generic error.
Conclusions
AutoGen represents a unique approach in the multi-agent framework landscape: conversation as a coordination mechanism. This paradigm offers unparalleled flexibility for tasks that require negotiation, iteration, and creative collaboration between agents.
With the evolution to AG2 and convergence toward the Microsoft Agent Framework, the ecosystem is maturing toward enterprise-ready solutions that combine the power of multi-agent conversation with the robustness of managed cloud platforms. For developers, investing in understanding these patterns is strategically important.
In the next article, we will explore multi-agent orchestration architectures, analyzing standard patterns (Sequential, Concurrent, Handoff, Plan-First), Hub-and-Spoke vs Peer-to-Peer architectures, and how to build production-ready systems with fault tolerance, distributed state management, and observability.