In 2023, we asked "how do I write the perfect prompt?"
In 2024, we asked "how do I build a single agent with tools?"
In 2025, we're asking: "What if I had a TEAM of specialized AI agents working together?"
Welcome to multi-agent systems. Instead of one overloaded AI trying to be a coder, reviewer, AND documentation writer, you now build specialized agents - each an expert in their domain - and let them collaborate.
The results? Better than any single-agent system. Let me show you how.
Before building multi-agent systems, master single-agent development with LangChain - the foundation for understanding agent collaboration.
Why Multi-Agent Systems?
The Problem with Single Agents:
Asking one AI to "write code, test it, document it, and deploy it" is like asking one person to be a developer, QA engineer, DevOps specialist, and technical writer simultaneously. Technically possible, but the quality suffers.
The Multi-Agent Solution:
- Engineer Agent: Writes clean, efficient code
- Reviewer Agent: Critiques for bugs and style
- Tester Agent: Writes unit tests
- Documentation Agent: Creates clear docs
Each agent is specialized. They communicate. They iterate. The final output is better than what any single agent could produce.
The Framework: Microsoft AutoGen
There are several frameworks (CrewAI, LangGraph), but AutoGen remains the most mature for defining conversational patterns between agents.
pip install pyautogen python-dotenv
Building Your First Multi-Agent Team
Let's build a simple software development team: Product Manager (User), Engineer, and Code Reviewer.
Step 1: Configuration
import os
from dotenv import load_dotenv
import autogen

load_dotenv()

config_list = [
    {
        "model": "gpt-4o-mini",
        "api_key": os.getenv("OPENAI_API_KEY")
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0,
    "timeout": 120
}
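The config above reads OPENAI_API_KEY from the environment. With python-dotenv, a .env file in the project root supplies it (the key name is the standard OpenAI one; the value below is a placeholder):

```
OPENAI_API_KEY=sk-your-key-here
```

Keep the .env file out of version control.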
Step 2: Define Your Agents
Each agent has a specific role defined by its system message:
# 1. User Proxy (represents you, executes code)
user_proxy = autogen.UserProxyAgent(
    name="Admin",
    system_message="""A human admin. You interact with the team and execute code.
After the Reviewer approves, execute the code to verify it works.""",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # Set True for production
    },
    human_input_mode="TERMINATE",  # Only ask the human when the task is done
    max_consecutive_auto_reply=10
)
# 2. Engineer (writes code)
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=llm_config,
    system_message="""You are a Senior Python Engineer.
SKILLS:
- Write clean, well-documented Python code
- Follow PEP 8 style guidelines
- Include type hints
- Handle errors gracefully
WORKFLOW:
- When given a task, write the complete solution
- Output full code blocks with explanations
- If code fails, debug and fix it based on error messages
CONSTRAINTS:
- Use only the Python standard library unless told otherwise
- Add docstrings to all functions
- Keep code modular and testable
"""
)
Agent system messages follow advanced prompt engineering patterns like role-based prompting to define specialized agent personas.
# 3. Code Reviewer (reviews code quality)
reviewer = autogen.AssistantAgent(
    name="CodeReviewer",
    llm_config=llm_config,
    system_message="""You are a meticulous Code Reviewer.
YOUR JOB:
- Review code for bugs, security issues, and style violations
- Check for edge cases not handled
- Verify error handling is proper
- Ensure code is readable and maintainable
WORKFLOW:
- If code is good, reply with "APPROVE"
- If issues found, list them specifically
- Suggest concrete improvements
STANDARDS:
- Code must have error handling
- No hardcoded values (use constants/config)
- Type hints required for function signatures
- Docstrings required
"""
)
Step 3: Create a Group Chat
Instead of a sequential A→B→C conversation, use a Group Chat in which a manager decides who speaks next.
# Create the group chat
groupchat = autogen.GroupChat(
    agents=[user_proxy, engineer, reviewer],
    messages=[],
    max_round=12,  # Maximum conversation turns
    speaker_selection_method="auto"  # Manager decides who speaks
)

# Create the manager
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)
Step 4: Start the Collaboration
# Give the team a task
user_proxy.initiate_chat(
    manager,
    message="""Create a Python script that:
1. Reads a CSV file named 'sales.csv'
2. Calculates total sales per product
3. Saves results to 'product_totals.json'
Handle errors gracefully (file not found, invalid data, etc.)."""
)
What Happens Next (The Magic)
- Manager receives the request, selects Engineer
- Engineer writes the Python script with error handling
- Manager sees code, selects Reviewer
- Reviewer checks the code:
- "Good error handling, but you're not closing the file properly. Use context managers."
- Manager selects Engineer again
- Engineer fixes it: uses a with open() context manager
- Manager selects Reviewer
- Reviewer: "APPROVE"
- Manager selects User Proxy
- User Proxy executes the code, confirms it works
- Task complete!
The conversation transcript looks like:
Admin (to chat_manager):
Create a Python script that reads a CSV...
----------------
Engineer (to chat_manager):
I'll create a robust solution:
import csv
import json
from typing import Dict
def calculate_product_totals(input_file: str, output_file: str) -> None:
    """
    Reads sales data and calculates totals per product.
    ...
----------------
CodeReviewer (to chat_manager):
I see a potential issue. You're using csv.DictReader but not validating
that the expected columns exist. Add column validation.
----------------
Engineer (to chat_manager):
You're right. Here's the improved version:
[Fixed code]
----------------
CodeReviewer (to chat_manager):
APPROVE. Code is production-ready.
----------------
Admin (to chat_manager):
Executing code...
SUCCESS: Created product_totals.json
Communication Patterns
Pattern 1: Sequential Handoff (Simple Chain)
Agent A → Agent B → Agent C → Done
# Engineer writes code
user_proxy.initiate_chat(
    engineer,
    message="Write a function to validate email addresses"
)

# Manually pass the engineer's reply to the reviewer
engineer.initiate_chat(
    reviewer,
    message=engineer.last_message()["content"]  # last_message() returns a dict
)
Pros: simple and predictable. Cons: rigid; can't adapt to unexpected situations.
Pattern 2: Group Chat with Manager (Dynamic)
What we just built. Manager decides who speaks based on context.
Pros: flexible; handles complex workflows. Cons: the manager can get confused with too many agents or overly long conversations.
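If the manager's choices get erratic, newer AutoGen releases also accept a callable for speaker_selection_method, so you can pin the routing yourself. A minimal sketch, assuming the Admin/Engineer/CodeReviewer team from above; the (last speaker, group chat) signature is the documented one, but check your installed version:

```python
# Deterministic routing sketch: Engineer speaks after Admin, the reviewer after
# the Engineer, and control returns to Admin once the reviewer says APPROVE.
# Agent names are assumed to match the team defined earlier.
def select_next_speaker(last_speaker, groupchat):
    """Pick the next agent by fixed rules instead of an LLM decision."""
    by_name = {agent.name: agent for agent in groupchat.agents}
    if last_speaker.name == "Admin":
        return by_name["Engineer"]
    if last_speaker.name == "Engineer":
        return by_name["CodeReviewer"]
    # Reviewer just spoke: finish if approved, otherwise send back to Engineer
    last = groupchat.messages[-1].get("content", "") if groupchat.messages else ""
    return by_name["Admin"] if "APPROVE" in last else by_name["Engineer"]

# Then: autogen.GroupChat(..., speaker_selection_method=select_next_speaker)
```

Deterministic routing trades flexibility for predictability and saves the manager's LLM calls.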
Pattern 3: Hierarchical (Boss and Workers)
One "Boss" agent delegates sub-tasks to "Worker" agents.
boss = autogen.AssistantAgent(
    name="ProjectManager",
    llm_config=llm_config,
    system_message="""You are a project manager.
Break down complex tasks into sub-tasks.
Delegate to appropriate team members.
Synthesize results into a final deliverable."""
)

worker1 = autogen.AssistantAgent(name="DataAnalyst", ...)
worker2 = autogen.AssistantAgent(name="Visualizer", ...)

# Boss delegates
groupchat = autogen.GroupChat(
    agents=[boss, worker1, worker2],
    messages=[],
    max_round=20
)
Use case: Complex projects like "analyze sales data, create visualizations, and write a report."
Advanced: Giving Agents Tools
Agents can use tools just like single agents.
# Define a tool function
@user_proxy.register_for_execution()
@engineer.register_for_llm(description="Search the web for current information")
def search_web(query: str) -> str:
    """Search DuckDuckGo for information."""
    from duckduckgo_search import DDGS  # pip install duckduckgo-search
    results = DDGS().text(query, max_results=3)
    return str(results)

# Now Engineer can call search_web during code generation
Integrate external tools using patterns from building with AI APIs.
Real example:
User: "Write a script to get the current Bitcoin price"
Engineer: Calls search_web("bitcoin price API") → Finds API → Writes script using that API
Real-World Use Case: Content Creation Team
# Researcher finds information
researcher = autogen.AssistantAgent(
    name="Researcher",
    system_message="""You research topics using available tools.
Find accurate, recent information.
Cite sources.""",
    llm_config=llm_config
)

# Writer creates content
writer = autogen.AssistantAgent(
    name="Writer",
    system_message="""You write engaging, accurate content.
Use information from Researcher.
Write in a clear, professional style.
Include proper structure (intro, body, conclusion).""",
    llm_config=llm_config
)

# Editor polishes content
editor = autogen.AssistantAgent(
    name="Editor",
    system_message="""You edit content for clarity and correctness.
Check grammar, flow, and factual accuracy.
Suggest improvements.
Reply with 'PUBLISH' when content is ready.""",
    llm_config=llm_config
)

# User manages the process
user_proxy = autogen.UserProxyAgent(
    name="ContentManager",
    human_input_mode="TERMINATE",
    code_execution_config=False  # No code execution needed
)

# Create the workflow
groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, writer, editor],
    messages=[],
    max_round=15
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# Start the workflow
user_proxy.initiate_chat(
    manager,
    message="Write a 500-word blog post about 'The Future of Remote Work in 2025'. Research current trends first."
)
Combine multi-agent systems with RAG for research agents that retrieve and synthesize information.
Result:
- Researcher searches for recent articles/stats
- Writer creates draft based on research
- Editor reviews, suggests changes
- Writer revises
- Editor approves with "PUBLISH"
Quality is significantly higher than a single agent writing alone.
Cost Optimization
Multi-agent systems use MORE API calls. Optimize carefully:
1. Use Cheaper Models for Simple Agents
# Engineer gets the stronger model for complex reasoning
engineer_config = {
    "config_list": [{"model": "gpt-4o", "api_key": api_key}],
    "temperature": 0
}

# Reviewer can use a cheaper model
reviewer_config = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": api_key}],
    "temperature": 0
}
2. Set Strict Termination Conditions
# Terminate when a specific phrase appears
def is_termination_msg(msg):
    return "TASK COMPLETE" in (msg.get("content") or "").upper()

user_proxy = autogen.UserProxyAgent(
    name="Admin",
    is_termination_msg=is_termination_msg,
    max_consecutive_auto_reply=5  # Hard limit
)
3. Limit Conversation Rounds
groupchat = autogen.GroupChat(
    agents=[user_proxy, engineer, reviewer],
    messages=[],
    max_round=10  # Stop after 10 turns even if not done
)
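To see where those calls go, it helps to tally approximate tokens per agent after a run. A dependency-free sketch using the rough four-characters-per-token heuristic (for exact counts you would use a real tokenizer such as tiktoken):

```python
# Rough per-agent token tally from the AutoGen chat history. groupchat.messages
# is a list of dicts with "name" and "content" keys; the 4-chars-per-token
# ratio is only a heuristic, good enough to spot the expensive agent.
def approx_tokens_per_agent(messages):
    """Map each speaker name to an approximate token count."""
    totals = {}
    for msg in messages:
        name = msg.get("name", "unknown")
        content = msg.get("content") or ""
        totals[name] = totals.get(name, 0) + max(1, len(content) // 4)
    return totals

# Usage after a run: print(approx_tokens_per_agent(groupchat.messages))
```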
Error Handling and Recovery
Agents will make mistakes. Build resilience:
# Add fallback logic
user_proxy = autogen.UserProxyAgent(
    name="Admin",
    system_message="""If an agent gets stuck or repeats itself 3 times,
intervene with: 'Let's try a different approach.'""",
    human_input_mode="TERMINATE",
    # Fallback function
    function_map={
        "get_human_input": lambda: input("Human intervention needed: ")
    }
)
Testing Multi-Agent Systems
Unit test individual agents:
def test_engineer_generates_code():
    reply = engineer.generate_reply(
        messages=[{"role": "user", "content": "Write a hello world function"}]
    )
    # generate_reply may return a str or a dict depending on version
    content = reply if isinstance(reply, str) else reply.get("content", "")
    assert "def" in content
    assert "hello" in content.lower()
Integration test the full workflow:
def test_full_workflow():
    user_proxy.initiate_chat(
        manager,
        message="Write a function to add two numbers"
    )
    # groupchat.messages is a list of dicts; check the content fields
    contents = [msg.get("content", "") for msg in groupchat.messages]
    assert any("def add" in c for c in contents)
    assert any("APPROVE" in c for c in contents)
Best Practices for 2025
1. Keep Agent Roles Crystal Clear
Bad: "You are a helpful assistant who can code and review code."
Good: "You are a Code Reviewer. You ONLY review code. You do NOT write code. You do NOT execute code."
2. Limit Agents Per Chat (3-5 Max)
Too many agents = confused conversations. If you need more, use hierarchical structure.
3. Give Agents Escape Hatches
system_message="""...
If you're stuck or unsure, say 'I NEED HUMAN HELP' and explain what you need.
"""
4. Log Everything
# Save conversation for debugging
with open("conversation_log.txt", "w") as f:
    for msg in groupchat.messages:
        f.write(f"{msg['name']}: {msg['content']}\n\n")
5. Human in the Loop for Critical Decisions
user_proxy = autogen.UserProxyAgent(
    name="Admin",
    human_input_mode="ALWAYS",  # Ask the human before every action
)
Use ALWAYS for production systems handling sensitive operations.
Common Pitfalls
1. Agents Argue Forever. With no termination condition, they debate endlessly. Fix: set max_round and explicit termination messages ("APPROVE", "TASK COMPLETE").
2. Manager Picks the Wrong Agent. The manager's selection logic fails with similar agents. Fix: make agent roles and names very distinct. "CodeWriter" vs "CodeReviewer", not "Assistant1" vs "Assistant2".
3. Runaway Costs. A multi-agent chat uses 5-10x the API calls of a single agent. Fix: use cheaper models, strict max_round limits, and cache when possible.
4. Context Window Explosion. Long conversations hit token limits. Fix: summarize periodically, or use a "memory manager" agent that summarizes history.
5. No Clear Output. The conversation ends but no final deliverable is extracted. Fix: have user_proxy explicitly extract and save results:
# After the conversation ends
final_code = extract_code_from_messages(groupchat.messages)
with open("output.py", "w") as f:
    f.write(final_code)
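The extract_code_from_messages helper above is not an AutoGen built-in. One possible implementation pulls fenced Python blocks out of the chat history and returns the last one, which is usually the approved revision:

```python
import re

FENCE = "`" * 3  # literal triple backtick, built here to keep this snippet readable

def extract_code_from_messages(messages):
    """Return the last fenced Python code block found in the chat history."""
    pattern = FENCE + r"python\n(.*?)" + FENCE
    blocks = []
    for msg in messages:
        content = msg.get("content") or ""
        blocks.extend(re.findall(pattern, content, re.DOTALL))
    return blocks[-1] if blocks else ""
```

If your agents emit unlabeled fences or other languages, loosen the pattern accordingly.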
The Bottom Line
Multi-agent systems represent the shift from "using AI" to "managing AI teams."
When to use:
- Complex tasks requiring different expertise
- Quality matters more than speed
- You need built-in review/validation
- Single agent keeps failing on complex workflows
When NOT to use:
- Simple tasks (overkill)
- Tight latency requirements (multi-agent is slower)
- Limited budget (uses more API calls)
Start small:
- Build 2 agents (Creator + Reviewer)
- Get them working in sequence
- Add a manager for dynamic routing
- Add more specialized agents
- Add tools
Within weeks, you can have AI teams that code, test, document, and deploy better than any single AI ever could.
The future isn't one super-intelligent AI. It's specialized AI agents collaborating like a well-oiled team.
AI agents could be the workforce you're missing. What's your excuse?