Framework Deep Dive: OpenAI Agents SDK - Primitives, Handoffs, and Sandbox Execution
An in-depth exploration of the OpenAI Agents SDK's core primitives, handoff-based multi-agent patterns, guardrails, session memory, and the 2026 sandbox execution update
The OpenAI Agents SDK launched as a Python library in March 2025 and has since become one of the most prominent entries in the agent framework landscape. Unlike LangChain’s broad integration ecosystem or CrewAI’s role-based orchestration, the Agents SDK takes a deliberately minimal stance: provide a small set of powerful primitives—Agent, Runner, Handoff, Guardrail, Session—and let developers compose them in code. (OpenAI’s 2025 launch announcement framed it as a shift from the hosted Assistants API to code-first agent construction.)
The April 2026 update added the most significant feature set since launch: native sandbox execution, a model-native harness, and extensible sandbox clients (TechCrunch coverage, April 15 2026). This changed the SDK from “another wrapper around the Completions API” into a full execution harness.
This deep dive examines the SDK’s architecture, walks through its core primitives with complete code, and provides our takes on production patterns we’ve seen teams adopt. For how the SDK compares to OpenAI’s older Assistants API and Anthropic’s MCP approach, see our earlier OpenAI Assistants API vs Claude MCP comparison.
The SDK distills agent construction into five types. Understanding how they compose is more important than memorizing every parameter.
An Agent is an LLM configured with instructions, tools, and optional handoff targets. It’s the fundamental unit of computation.
from agents import Agent, function_tool, ModelSettings

@function_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return "22°C, partly cloudy"

@function_tool
def search_knowledge(query: str) -> str:
    """Search the internal knowledge base."""
    return f"Found 3 results for '{query}'"

support_agent = Agent(
    name="Support Agent",
    instructions="You are a customer support agent. Use tools to answer questions.",
    tools=[get_weather, search_knowledge],
    model="gpt-4o",
    model_settings=ModelSettings(temperature=0.3)
)
Unlike CrewAI where agents carry backstories and role definitions, the Agents SDK treats instructions as system prompts—straightforward, no persona theater. This is a feature, not a limitation. Teams that tried to build elaborate role systems on top of the SDK reported that LLMs responded better to direct, domain-specific instructions than to fictional role descriptions.
The Runner executes an agent against input. It manages the tool-call loop, tracks context, and returns structured results.
from agents import Agent, Runner, function_tool
import asyncio

@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by ID."""
    orders = {
        "ORD-001": "Shipped - arrives Tuesday",
        "ORD-002": "Processing - ships tomorrow"
    }
    return orders.get(order_id, "Order not found")

async def main():
    agent = Agent(
        name="Order Lookup",
        instructions="Help customers check their order status.",
        tools=[lookup_order]
    )
    result = await Runner.run(agent, "Where is order ORD-001?")
    print(result.final_output)

asyncio.run(main())
The Runner.run() result exposes final_output, last_agent, input_guardrail_results, output_guardrail_results, and a new_items list containing every turn in the conversation. For production debugging, that traceability is essential.
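Since new_items can grow long on multi-turn runs, a small tally of item types is often enough to spot a runaway tool loop before digging into individual turns. The sketch below is plain Python; the item shape (a type attribute or "type" key) is an assumption about the result objects, not a documented contract.

```python
# Hypothetical debugging helper: count a run's new_items by type so a
# failing run can be triaged at a glance. The "type" field is assumed.

def summarize_items(items):
    """Return a {item_type: count} map for a run's turn history."""
    counts = {}
    for item in items:
        kind = getattr(item, "type", None) or item.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts

# Usage against a finished run would look like:
# print(summarize_items(result.new_items))
```

A run that shows forty tool_call items against two messages is usually a prompt problem, not a tool problem.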
Handoffs are the SDK’s multi-agent mechanism. Instead of CrewAI’s explicit process types or LangGraph’s graph edges, the SDK lets one agent delegate to another via a tool call the LLM discovers at runtime.
from agents import Agent, Runner, handoff
import asyncio

triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "Route customer inquiries to the appropriate specialist. "
        "Use handoffs when a specialist is needed."
    ),
    handoffs=[
        handoff(
            Agent(
                name="Refund Specialist",
                instructions="Process refund requests. Verify order first."
            ),
            tool_name_override="transfer_to_refunds"
        ),
        handoff(
            Agent(
                name="Technical Support",
                instructions="Debug technical issues with products."
            ),
            tool_name_override="transfer_to_tech_support"
        )
    ]
)

async def main():
    result = await Runner.run(
        triage_agent,
        "I need a refund for order ORD-001, it never arrived."
    )
    print(f"Final agent: {result.last_agent.name}")
    print(f"Output: {result.final_output}")

asyncio.run(main())
Handoffs are represented as tools to the LLM. The triage agent sees transfer_to_refunds and transfer_to_tech_support in its tool list, and the LLM decides which to call. The handoff() function also accepts input_type for structured handoff payloads and on_handoff callbacks for side effects like logging or context filtering.
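An on_handoff callback is a natural place to build an audit trail of routing decisions. The sketch below is plain Python and illustrative only: the callback arguments shown here are assumptions, and in real code the function would be passed via handoff(..., on_handoff=log_handoff) rather than called directly.

```python
# Hedged sketch: a side-effect-only handoff logger. The (target_name,
# payload) signature is hypothetical; adapt it to the SDK's callback shape.

handoff_log = []

def log_handoff(target_name: str, payload=None):
    """Record each delegation so routing decisions can be audited later."""
    handoff_log.append({"target": target_name, "payload": payload})
    return payload  # pass the payload through unchanged

# Simulating what the SDK would invoke on a transfer:
log_handoff("Refund Specialist", {"order_id": "ORD-001"})
```

Persisting these entries alongside trace IDs makes it easy to answer "why did the triage agent route this ticket to refunds?" weeks later.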
Guardrails run validation checks on input, output, or tool invocation. They can operate in blocking mode (halt execution on failure) or parallel mode (run alongside the agent).
from agents import (
    Agent, Runner, InputGuardrail,
    GuardrailFunctionOutput, function_tool
)
import asyncio

@function_tool
def execute_query(sql: str) -> str:
    """Execute a SQL query against the production database."""
    forbidden = ["DROP", "DELETE", "TRUNCATE", "ALTER"]
    for kw in forbidden:
        if kw in sql.upper():
            return f"Query rejected: contains {kw}"
    return "Query executed successfully"

def safety_check(input_data) -> GuardrailFunctionOutput:
    """Block queries containing destructive SQL keywords."""
    text = input_data.input if hasattr(input_data, "input") else str(input_data)
    dangerous = ["DROP TABLE", "DELETE FROM", "TRUNCATE"]
    for pattern in dangerous:
        if pattern in text.upper():
            return GuardrailFunctionOutput(
                output_info={"blocked_pattern": pattern},
                tripwire_triggered=True
            )
    return GuardrailFunctionOutput(
        output_info={"status": "clean"},
        tripwire_triggered=False
    )

sql_agent = Agent(
    name="DB Query Agent",
    instructions="Execute SQL queries against the database.",
    tools=[execute_query],
    input_guardrails=[InputGuardrail(guardrail_function=safety_check)]
)

async def main():
    # This trips the guardrail
    result = await Runner.run(sql_agent, "DROP TABLE users")
    print(f"Guardrail triggered: {result.input_guardrail_results}")

asyncio.run(main())
Our take: the guardrail system is one of the SDK’s strongest features. Input guardrails on a fast/cheap model (like gpt-4o-mini) can filter malicious or off-topic queries before invoking the expensive main model, saving both cost and latency. This pattern—guard-and-gate—is production-ready and we recommend it for any customer-facing agent.
The SDK provides session management for maintaining state across turns. The base Session is abstract; implementations include SQLite, Redis, and encrypted storage.
from agents import Agent, Runner
from agents.extensions.memory import AsyncSQLiteSession
import asyncio

async def main():
    session = AsyncSQLiteSession(
        db_path="agent_memory.db",
        agent_id="support-bot-v1",
        user_id="user-42"
    )

    agent = Agent(
        name="Memory Agent",
        instructions="You recall previous conversations with this user."
    )

    # First interaction
    result1 = await Runner.run(
        agent,
        "My name is Sarah and I prefer Python examples.",
        session=session
    )

    # Second interaction - agent remembers
    result2 = await Runner.run(
        agent,
        "What's my name and preferred language?",
        session=session
    )
    print(result2.final_output)

asyncio.run(main())
For high-scale deployments, the RedisSession extension or custom session implementations are the right choice. The SDK’s session abstraction makes swapping backends trivial.
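To give a feel for what a custom backend involves, here is a minimal in-memory session. It is a plain-Python sketch under assumptions: the method names (get_items, add_items, clear) illustrate the kind of async interface a Session implementation exposes, not the SDK's exact protocol.

```python
# Hypothetical sketch of a custom session backend. Method names are
# assumptions; match them to the SDK's Session protocol before use.
import asyncio

class InMemorySession:
    """Conversation history held in process memory, keyed by session ID."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self._items = []

    async def get_items(self):
        return list(self._items)

    async def add_items(self, items):
        self._items.extend(items)

    async def clear(self):
        self._items.clear()

# Quick round trip:
session = InMemorySession("user-42")
asyncio.run(session.add_items([{"role": "user", "content": "hi"}]))
```

An in-memory backend like this is only useful for tests; the point is that the surface area to implement is small, so wrapping Redis, Postgres, or an encrypted store is a day of work, not a project.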
The April 15, 2026 update introduced sandbox agents—contained execution environments where agents can read files, run shell commands, generate artifacts, and persist state across runs. This was the single biggest capability gap the SDK had, and OpenAI filled it. (Help Net Security coverage of the update.)
Sandbox agents are currently in beta. Expect API changes before general availability.
A sandbox agent wraps a standard Agent with a Manifest (defining workspace contents), capabilities (filesystem, shell, memory, skills), and a sandbox client (Docker or local Unix).
from agents import SandboxAgent, Runner, DockerSandboxClient
from agents.sandbox import (
    Manifest, WorkspaceFile, Capabilities,
    FilesystemCapability, ShellCapability, MemoryCapability
)
import asyncio

async def main():
    # Create a sandbox agent with a pre-loaded codebase
    agent = SandboxAgent(
        name="Code Review Agent",
        default_manifest=Manifest(
            workspace_files=[
                WorkspaceFile(
                    path="src/main.py",
                    content="def hello():\n    print('Hello World')\n"
                )
            ]
        ),
        default_capabilities=Capabilities(
            filesystem=FilesystemCapability(allowed_directories=["/workspace"]),
            shell=ShellCapability(allowed_commands=["python3", "grep", "cat"]),
            memory=MemoryCapability()
        ),
        sandbox_client=DockerSandboxClient()
    )

    result = await Runner.run(
        agent,
        "Read src/main.py and add error handling to the hello function."
    )
    print(result.final_output)

asyncio.run(main())
The significance of sandboxing extends beyond safety. Persistent workspace state means agents can resume interrupted tasks without re-exploring the filesystem—a pattern previously achievable only with Claude Code or Cursor. The sandbox client abstraction (Docker or Unix) also means infrastructure teams can standardize on container-based isolation for production deployments.
Two caveats from our testing: sandbox startup adds 2–4 seconds of latency per initialization, and the Docker sandbox client requires Docker socket access—fine for CI pipelines, problematic for multi-tenant SaaS unless you run a sandbox orchestrator.
One of the SDK’s more powerful patterns is using agents as tools. This isn’t a handoff—the parent agent calls the sub-agent as a tool, receives results back, and continues working. The sub-agent’s instructions and tools are invisible to the parent’s main loop.
from agents import Agent, Runner, function_tool
import asyncio

# Sub-agent that acts as a tool
code_analyzer = Agent(
    name="Code Analyzer",
    instructions="Analyze Python code for bugs and suggest improvements.",
    model="gpt-4o"
)

# Wrap the agent as a tool
@function_tool
async def analyze_code(code: str) -> str:
    """Analyze Python code and return a detailed review."""
    result = await Runner.run(code_analyzer, f"Review this code:\n\n{code}")
    return result.final_output

# Main agent with the sub-agent as a tool
reviewer = Agent(
    name="Code Reviewer",
    instructions="Review submitted code. Use the code analyzer for detailed analysis.",
    tools=[analyze_code]
)

async def main():
    result = await Runner.run(
        reviewer,
        """Review this code:

def divide(a, b):
    return a / b
"""
    )
    print(result.final_output)

asyncio.run(main())
This pattern excels when the sub-agent needs different model settings, specialized tools, or stricter guardrails than the parent. You can also enforce approval gates by wrapping the tool call with a human-in-the-loop checkpoint.
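A human-in-the-loop checkpoint can be as simple as a wrapper that consults an approval predicate before the underlying tool runs. The sketch below is plain Python under stated assumptions: the approve callable stands in for a real approval channel (a Slack prompt, a ticket queue), and the names are illustrative, not SDK API.

```python
# Hypothetical approval gate around a tool function. `approve` is a stand-in
# for a real human-in-the-loop check; auto-deny is used here for the demo.

def with_approval(tool_fn, approve):
    """Wrap a tool so it only executes when the approval check passes."""
    def gated(*args, **kwargs):
        if not approve(tool_fn.__name__, args, kwargs):
            return "Rejected: awaiting human approval"
        return tool_fn(*args, **kwargs)
    gated.__name__ = tool_fn.__name__
    gated.__doc__ = tool_fn.__doc__
    return gated

def issue_refund(order_id: str) -> str:
    """Issue a refund for the given order."""
    return f"Refund issued for {order_id}"

gated_refund = with_approval(issue_refund, lambda name, args, kwargs: False)
print(gated_refund("ORD-001"))
```

Preserving the wrapped function's name and docstring matters because the LLM sees them in the tool schema; the gate should be invisible to the model.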
The SDK ships with built-in tracing. Every Runner.run() call generates a trace with spans for each LLM call, tool execution, and handoff transition.
from agents import Agent, Runner
from agents.tracing import set_trace_processors, ConsoleSpanProcessor
import asyncio

# Enable console tracing
set_trace_processors([ConsoleSpanProcessor()])

agent = Agent(
    name="Traced Agent",
    instructions="Answer questions concisely."
)

async def main():
    # Each run generates a trace ID; spans appear in the console output
    result = await Runner.run(agent, "What is MCP?")
    print(result.final_output)

asyncio.run(main())
For production monitoring, the SDK’s processor interface lets you route traces to LangSmith, Langfuse, or any OpenTelemetry-compatible backend. The trace data includes model name, token counts, tool call arguments/results, and handoff targets—all essential for evaluating agent behavior across versions.
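A custom processor typically buffers spans and flushes them to the backend in batches. The sketch below is plain Python and assumption-laden: the on_span hook is our guess at the shape of the processor interface, and the export function stands in for a LangSmith, Langfuse, or OTel exporter.

```python
# Hedged sketch: a batching trace processor. The on_span hook is an assumed
# interface; the export_fn callback is where a real backend client would go.

class BufferingSpanProcessor:
    """Collect spans in memory and flush them to a backend in batches."""

    def __init__(self, export_fn, batch_size=10):
        self.export_fn = export_fn
        self.batch_size = batch_size
        self.buffer = []

    def on_span(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.export_fn(list(self.buffer))
            self.buffer.clear()

# Demo: capture exported batches in a list instead of a real backend.
exported = []
proc = BufferingSpanProcessor(exported.append, batch_size=2)
proc.on_span({"name": "llm_call"})
proc.on_span({"name": "tool_call"})
```

Batching matters in production: per-span HTTP calls to an observability backend can add more latency than the agent's own tool calls.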
The SDK supports connecting to MCP servers as tool sources. This means your Agents SDK code can consume tools from any MCP-compliant server—databases, file systems, SaaS APIs—without custom integration code. For context on the MCP landscape, see our Model Context Protocol complete guide.
from agents import Agent, Runner, McpServer
from agents.mcp import StdioMcpClient
import asyncio

async def main():
    # Connect to an MCP server
    mcp_client = StdioMcpClient(command="python", args=["my_mcp_server.py"])
    await mcp_client.connect()

    agent = Agent(
        name="MCP Agent",
        instructions="Use the available tools to help the user.",
        mcp_servers=[McpServer(client=mcp_client)]
    )

    result = await Runner.run(agent, "What's in my database?")
    print(result.final_output)

asyncio.run(main())
This is architecturally significant: the Agents SDK sits alongside LangChain, CrewAI, and AutoGen as a consumer of the MCP tool standard. For teams that have already built MCP server infrastructure, the SDK becomes a thin orchestration layer on top of your existing tool ecosystem.
Run input guardrails on a fast model before invoking the primary agent. This pattern saves 60–80% of tokens on queries that are off-topic, malformed, or malicious.
from agents import Agent, Runner, InputGuardrail, GuardrailFunctionOutput
import asyncio

def topic_filter(input_data) -> GuardrailFunctionOutput:
    """Ensure the query relates to our domain."""
    text = input_data.input.lower()
    allowed_topics = ["billing", "shipping", "product", "support", "refund"]
    if not any(topic in text for topic in allowed_topics):
        return GuardrailFunctionOutput(
            output_info={"reason": "out_of_scope"},
            tripwire_triggered=True
        )
    return GuardrailFunctionOutput(
        output_info={"status": "in_scope"},
        tripwire_triggered=False
    )

agent = Agent(
    name="Customer Support",
    instructions="Help with billing and shipping questions.",
    input_guardrails=[
        InputGuardrail(
            guardrail_function=topic_filter,
            model="gpt-4o-mini"  # Cheap model for the gate
        )
    ]
)
Use input_type on handoffs to pass structured data between agents, eliminating the “summarize everything I know” anti-pattern.
from pydantic import BaseModel
from agents import Agent, handoff

class RefundRequest(BaseModel):
    order_id: str
    reason: str
    amount: float

refund_agent = Agent(
    name="Refund Agent",
    instructions="Process the refund based on the provided details."
)

triage = Agent(
    name="Triage",
    instructions="Collect refund details and transfer to the refund agent.",
    handoffs=[
        handoff(
            refund_agent,
            input_type=RefundRequest
        )
    ]
)
Not every agent needs gpt-4o. Route classification tasks to gpt-4o-mini, complex reasoning to gpt-4o, and code tasks to o1 or o3 depending on the SDK version you’re running. The model_settings parameter on each Agent makes this straightforward without global configuration.
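A simple routing table keeps model selection in one place instead of scattering model strings across agent definitions. This is an illustrative plain-Python sketch; the task categories and fallback choice are our conventions, not SDK features.

```python
# Illustrative sketch: pick a model per task class before constructing the
# Agent. Task categories and the cheap-model fallback are our assumptions.

MODEL_FOR_TASK = {
    "classification": "gpt-4o-mini",
    "reasoning": "gpt-4o",
    "code": "o1",
}

def model_for(task_type: str) -> str:
    """Fall back to the cheap model for unknown task types."""
    return MODEL_FOR_TASK.get(task_type, "gpt-4o-mini")

print(model_for("code"))  # o1
```

The agent construction then becomes Agent(..., model=model_for("classification")), and a pricing change is a one-line edit.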
The SDK is the right choice when your stack is OpenAI-native, you want handoff-based delegation with built-in tracing and guardrails, and you prefer composing a few primitives in code over configuring a larger framework.
Consider alternatives when you need LangChain's breadth of integrations, LangGraph's explicit graph-level control over agent flow, or CrewAI's role-based modeling of agent teams.
The OpenAI Agents SDK occupies a distinct space in the agent framework landscape. Where LangChain provides broad integration and LangGraph provides graph-level control, the Agents SDK optimizes for the developer experience of OpenAI-native agent construction with built-in tracing and guardrails. Where CrewAI models organizational teamwork, the SDK models delegation through handoffs—closer to how a real engineering manager assigns tasks than how a project management tool routes tickets.
The April 2026 sandbox update brought the SDK significantly closer to production readiness for autonomous agents. Sandboxed execution with persistent workspaces, combined with the existing guardrail system, means you can deploy an agent that reads files, runs commands, and generates output—all within defined boundaries—without building the execution harness yourself.
For infrastructure teams evaluating how these frameworks layer on top of the broader agent tooling ecosystem, our Multi-Agent Orchestration Infrastructure guide covers the server patterns, load balancing, and observability stacks that apply regardless of which framework you choose.
This post is part of our Framework Deep Dive series, exploring the architectures and patterns of major AI agent frameworks. Read: LangChain, AutoGen, CrewAI.