Updated on July 10, 2025

Multi-Agent Systems with Agents SDK


Multi-agent workflows can be built in two different ways in OpenAI's Agents SDK. The first is agents-as-tools, which follows an orchestrator-subagent pattern. The second is handoffs, which allow agents to pass control over to other agents. In this example, we'll build both types of multi-agent system, exploring agents-as-tools first and handoffs second.

python
!pip install -qU \
    "openai-agents==0.1.0" \
    "linkup-sdk==0.2.4"

First let's set our OPENAI_API_KEY which we'll be using throughout the example. You can get a key from the OpenAI Platform.

python
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or \
getpass("OpenAI API Key: ")

Orchestrator-Subagent

We will build a multi-agent system structured with an orchestrator-subagent pattern. The orchestrator in such a system is an agent that controls which subagents are used and in which order; it also handles all inbound and outbound communication with the users of the system. A subagent is an agent built to handle a particular scenario or task. It is triggered by the orchestrator and responds to the orchestrator when it is finished.

Orchestrator-subagent pattern

Subagents

We'll begin by defining our subagents. We will create three subagents:

  1. Web Search Subagent will have access to the Linkup web search tool.

  2. Internal Docs Subagent will have access to some "internal" company documents.

  3. Code Execution Subagent will be able to write and execute simple Python code for us.

Let's start with our first subagent!

Web Search Subagent

The web search subagent will take a user's query and use it to search the web. The agent will collect information from various sources and then merge that into a single text response that will be passed back to our orchestrator.

OpenAI's built-in web search is not great, so we'll use another web search API called LinkUp. This service does require an account, but you will receive more than enough free credits to follow the course.

We initialize our Linkup client using an API key like so:

python
from linkup import LinkupClient

os.environ["LINKUP_API_KEY"] = os.getenv("LINKUP_API_KEY") or \
getpass("Enter your Linkup API Key: ")

linkup_client = LinkupClient()

We perform an async search like so:

python
response = await linkup_client.async_search(
    query="Latest world news",
    depth="standard",
    output_type="searchResults",
)
response

We can parse out the results like so:

python
for result in response.results[:3]:
    print(f"{result.name}\n{result.url}\n{result.content}\n\n")

Let's put together a @function_tool using Linkup:

python
from agents import function_tool
from datetime import datetime

@function_tool
async def search_web(query: str) -> str:
    """Use this tool to search the web for information."""
    response = await linkup_client.async_search(
        query=query,
        depth="standard",
        output_type="searchResults",
    )
    answer = f"Search results for '{query}' on {datetime.now().strftime('%Y-%m-%d')}\n\n"
    for result in response.results[:3]:
        answer += f"{result.name}\n{result.url}\n{result.content}\n\n"
    return answer
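
As a quick check, we can also inspect the tool object that the @function_tool decorator produced. The attribute names below are based on the FunctionTool dataclass in openai-agents 0.1.0; if they don't match, check your version's FunctionTool fields:

python
# `search_web` is no longer a plain function but a FunctionTool object
print(search_web.name)                # the tool name shown to the LLM
print(search_web.description)         # taken from our docstring
print(search_web.params_json_schema)  # JSON schema generated from the type hints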

Now we define our Web Search Subagent:

python
from agents import Agent

web_search_agent = Agent(
    name="Web Search Agent",
    model="gpt-4.1-mini",
    instructions=(
        "You are a web search agent that can search the web for information. Once "
        "you have the required information, summarize it with cleanly formatted links "
        "sourcing each bit of information. Ensure you answer the question accurately "
        "and use markdown formatting."
    ),
    tools=[search_web],
)

We can talk directly to our subagent to confirm it works:

python
from IPython.display import Markdown, display
from agents import Runner

result = await Runner.run(
    starting_agent=web_search_agent,
    input="How is the weather in Tokyo?"
)
display(Markdown(result.final_output))

Great! Now let's move onto our next subagent.

Internal Docs Subagent

In many corporate environments, we will find that our agents will need access to internal information that cannot be found on the web. To do this we would typically build a Retrieval Augmented Generation (RAG) pipeline, which can often be as simple as adding a vector search tool to our agents.
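
Purely for illustration, such a vector search tool might look something like the sketch below. To keep it self-contained, it swaps a real embedding model and vector database for a toy hashed bag-of-words index; in practice you would replace embed with calls to an embedding model and a vector store:

python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: hashed bag-of-words. A real RAG pipeline would call an
    embedding model here instead."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# in reality these would be chunks of our indexed internal docs
doc_chunks = [
    "Chunk one of our internal docs.",
    "Chunk two of our internal docs.",
]
doc_vectors = np.stack([embed(chunk) for chunk in doc_chunks])

@function_tool
async def vector_search_docs(query: str) -> str:
    """Search internal documents for passages relevant to the query."""
    scores = doc_vectors @ embed(query)  # cosine similarity (vectors are unit norm)
    top = scores.argsort()[::-1][:2]     # indices of the best-matching chunks
    return "\n\n".join(doc_chunks[i] for i in top)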

To support a full vector search tool over internal docs, we would need to work through various data processing and indexing steps. That would add a lot of complexity to this example, so we will instead create a "dummy" search tool over some fake internal docs.

Our docs will discuss revenue figures for our wildly successful AI and robotics company called Skynet - you can find the revenue report here.

python
import requests

res = requests.get(
    "https://raw.githubusercontent.com/aurelio-labs/agents-sdk-course/refs/heads/main/assets/skynet-fy25-q1.md"
)
skynet_docs = res.text

@function_tool
async def search_internal_docs(query: str) -> str:
    """Search the internal company docs for information."""
    # dummy tool: we ignore the query and return the full report
    return skynet_docs

Now we define our Internal Docs Subagent:

python
internal_docs_agent = Agent(
    name="Internal Docs Agent",
    model="gpt-4.1-mini",
    instructions=(
        "You are an agent with access to internal company documents. Users will ask "
        "you questions about the company and you will use the provided internal docs "
        "to answer the question. Ensure you answer the question accurately and use "
        "markdown formatting."
    ),
    tools=[search_internal_docs],
)

Let's confirm it works:

python
result = await Runner.run(
    starting_agent=internal_docs_agent,
    input="What was our revenue in Q1 2025?"
)
display(Markdown(result.final_output))

Perfect! Now onto our final subagent.

Code Execution Subagent

Our code execution subagent will be able to execute code for us. We'll focus on executing code for simple calculations, but it's entirely feasible for State-of-the-Art (SotA) LLMs to write far more complex code, as many of us will be aware with AI code editors becoming increasingly prominent.

To run generated code, we will use Python's built-in exec function, keeping the generated code's variables out of our own globals by passing an empty dictionary as its namespace with namespace={}. Note that this is not true sandboxing, so in production you should only execute trusted code or use a properly isolated environment.
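
To see the namespace mechanics in isolation, here is a minimal standalone example:

python
namespace = {}
exec("result = 2 ** 10", namespace)  # the code writes into `namespace`, not our globals
print(namespace["result"])           # 1024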

python
@function_tool
def execute_code(code: str) -> str:
    """Execute Python code and return the output. The output must
    be assigned to a variable called `result`.
    """
    display(Markdown(f"Code to execute:\n```python\n{code}\n```"))
    try:
        namespace = {}
        exec(code, namespace)
        return namespace['result']
    except Exception as e:
        return f"Error executing code: {e}"

Now let's define our Code Execution Subagent. We will use gpt-4.1 rather than gpt-4.1-mini to maximize performance on code-writing tasks.

python
code_execution_agent = Agent(
    name="Code Execution Agent",
    model="gpt-4.1",
    instructions=(
        "You are an agent with access to a code execution environment. You will be "
        "given a question and you will need to write code to answer the question. "
        "Ensure you write the code in a way that is easy to understand and use."
    ),
    tools=[execute_code],
)

We can test our subagent with a simple math question:

python
result = await Runner.run(
    starting_agent=code_execution_agent,
    input=(
        "If I have four apples and I multiply them by seventy-one and one tenth "
        "bananas, how many do I have?"
    )
)
display(Markdown(result.final_output))

We now have all three subagents - it's time to create our orchestrator.

Orchestrator

Our orchestrator will control the flow of information in and out of our subagents, in the same way that our subagents control the flow of information in and out of their tools. In effect, our subagents become tools in the orchestrator-subagent pattern. To turn agents into tools we call the as_tool method, providing a name and description for each agent-as-tool.

We will first define the instructions for the orchestrator, explaining its role in our multi-agent system.

python
ORCHESTRATOR_PROMPT = (
    "You are the orchestrator of a multi-agent system. Your task is to take "
    "the user's query and pass it to the appropriate agent tool. The agent "
    "tools will see the input you provide and use it to get all of the "
    "information that you need to answer the user's query. You may need to "
    "call multiple agents to get all of the information you need. Do not "
    "mention or draw attention to the fact that this is a multi-agent system "
    "in your conversation with the user. Note that you are an assistant for "
    "the Skynet company. If the user asks about company information or "
    "finances, you should use our internal information rather than public "
    "information."
)

Now we define the orchestrator, including our subagents using the as_tool method — note that we can also add normal tools to our orchestrator.

python
from datetime import datetime

@function_tool
def get_current_date():
    """Use this tool to get the current date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

orchestrator = Agent(
    name="Orchestrator",
    model="gpt-4.1",
    instructions=ORCHESTRATOR_PROMPT,
    tools=[
        web_search_agent.as_tool(
            tool_name="web_search_agent",  # cannot include whitespace in tool name
            tool_description="Search the web for up-to-date information"
        ),
        internal_docs_agent.as_tool(
            tool_name="internal_docs_agent",
            tool_description="Search the internal docs for information"
        ),
        code_execution_agent.as_tool(
            tool_name="code_execution_agent",
            tool_description="Execute code to answer the question"
        ),
        get_current_date,
    ],
)

Let's test our agent with a few queries. Our first query will require our orchestrator to call multiple tools.

python
result = await Runner.run(
    starting_agent=orchestrator,
    input="How long ago from today was it when we got our last revenue report?"
)
display(Markdown(result.final_output))

We should see in our traces dashboard on the OpenAI Platform that our agent used both internal_docs_agent and get_current_date tools to answer the question.

Let's ask another question:

python
result = await Runner.run(
    starting_agent=orchestrator,
    input=(
        "What is our current revenue, and what percentage of revenue comes from the "
        "T-1000 units?"
    )
)
display(Markdown(result.final_output))

Our orchestrator-subagent workflow is working well. Now we can move on to handoffs.

Handoffs

When we use handoffs in the Agents SDK, an agent hands over control of the entire workflow to another agent. Handoffs differ from the orchestrator-subagent pattern: with orchestrator-subagent, the orchestrator retains control, as each subagent must ultimately respond to the orchestrator, and the orchestrator decides the flow of information and generates the final response to the user. With handoffs, once a "subagent" gains control of the workflow, the flow of information and final answer generation are under its control.

Handoff agents

Using the handoff structure, any one of our agents may answer the user directly, and our subagents get to see the entire chat history with the steps taken so far.

A significant positive here is latency. To answer a query that requires a single web search with the orchestrator-subagent pattern, we need three generations:

text
[input] -> orchestrator -> web_search_subagent -> orchestrator -> [output]

The same query with the handoff structure requires just two generations (note, the orchestrator and main_agent are essentially the same):

text
[input] -> main_agent -> web_search_subagent -> [output]

Because we are using fewer LLM generations to produce our answer, we can generate an answer much more quickly, as we skip the return trip through the orchestrator.

The handoff speed improvement is great, but it comes with a downside: our workflow can no longer handle queries that require multiple agents to answer. When deciding which structure to use for a particular use case, the pros and cons of each will need to be weighed.

Let's jump into implementing our handoff agents workflow.

Using Handoffs

There are three key things that we need when defining our main_agent (equivalent to our earlier orchestrator agent):

  • Update our instructions prompt to make it clear what the handoffs are and how they should be used. OpenAI provides a default prompt prefix that we can use.
  • Set the handoffs parameter, which is a list of agents that we can use as handoffs.
  • Set the handoff_description parameter, an additional prompt describing when the handoffs should be used.

First, let's check the preset prompt prefix:

python
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

display(Markdown(RECOMMENDED_PROMPT_PREFIX))

Now let's define our main_agent:

python
main_agent = Agent(
    name="Main Agent",
    model="gpt-4.1",
    instructions=RECOMMENDED_PROMPT_PREFIX,
    handoffs=[web_search_agent, internal_docs_agent, code_execution_agent],
    handoff_description=(
        "Handoff to the relevant agent for queries where we need additional information "
        "from the web or internal docs, or when we need to make calculations."
    ),
    tools=[get_current_date],
)

We'll run the same queries as before and see how the response time differs.

python
result = await Runner.run(
    starting_agent=main_agent,
    input="How long ago from today was it when we got our last revenue report?"
)
display(Markdown(result.final_output))

That's correct, and we got a 6.4s runtime vs. the orchestrator-subagent's 7.5s for the same query. Let's try another:

python
result = await Runner.run(
    starting_agent=main_agent,
    input=(
        "What is our current revenue, and what percentage of revenue comes from the "
        "T-1000 units?"
    )
)
display(Markdown(result.final_output))

The answer is correct again, and we get a runtime of 7.6s vs. the orchestrator-subagent's 8.6s, another notable latency improvement.

Other Handoff Features

There are a few other handoff-specific features that we can use. These serve various purposes but are particularly useful during development and debugging of multi-agent workflows. These features are:

  • on_handoff is a callback executed whenever a handoff occurs. In a production setting it could be used to maintain a record of handoffs in a DB or for telemetry. In development it is a handy place to add print or logger.debug statements.
  • input_type allows us to define a specific structured input format for generated information that will be passed to our handoff agents.
  • input_filter allows us to restrict the information being passed through to our handoff agents.

We can set all of these via a handoff object, which wraps around our handoff agents and which we then provide via the Agent(handoffs=...) parameter. Let's start with the on_handoff parameter:

python
from agents import RunContextWrapper, handoff

# we define a function that will be called when the handoff is made
async def on_handoff(ctx: RunContextWrapper[None]):
    print("Handoff called")

# we then pass this function to the handoff object
web_search_handoff = handoff(agent=web_search_agent, on_handoff=on_handoff)
internal_docs_handoff = handoff(agent=internal_docs_agent, on_handoff=on_handoff)
code_execution_handoff = handoff(agent=code_execution_agent, on_handoff=on_handoff)

# and initialize the main_agent
main_agent = Agent(
    name="Main Agent",
    model="gpt-4.1",
    instructions=RECOMMENDED_PROMPT_PREFIX,
    handoffs=[web_search_handoff, internal_docs_handoff, code_execution_handoff],
    tools=[get_current_date],
)

Now let's see what happens when querying the main_agent:

python
result = await Runner.run(
    starting_agent=main_agent,
    input="How long ago from today was it when we got our last revenue report?"
)
display(Markdown(result.final_output))

Now we can see if and when a handoff occurs. However, we don't get much information beyond the fact that the handoff occurred. Fortunately, we can use the input_type parameter to provide more. We will define a pydantic BaseModel with the information that we'd like to include.

python
from pydantic import BaseModel, Field

class HandoffInfo(BaseModel):
    subagent_name: str = Field(description="The name of the subagent being called.")
    reason: str = Field(description="The reason for the handoff.")

# we redefine the on_handoff callback to receive the HandoffInfo
async def on_handoff(ctx: RunContextWrapper[None], input_data: HandoffInfo):
    print(f"Handoff to '{input_data.subagent_name}' because '{input_data.reason}'")

# now redefine the handoff objects with the input_type parameter
web_search_handoff = handoff(
    agent=web_search_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo
)
internal_docs_handoff = handoff(
    agent=internal_docs_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo
)
code_execution_handoff = handoff(
    agent=code_execution_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo
)

# and initialize the main_agent
main_agent = Agent(
    name="Main Agent",
    model="gpt-4.1",
    instructions=RECOMMENDED_PROMPT_PREFIX,
    handoffs=[web_search_handoff, internal_docs_handoff, code_execution_handoff],
    tools=[get_current_date],
)

Now call the main_agent:

python
result = await Runner.run(
    starting_agent=main_agent,
    input="How long ago from today was it when we got our last revenue report?"
)
display(Markdown(result.final_output))

We're now seeing much more information. The final handoff feature we will test is handoff_filters. Filters work by removing information sent to the handoff agents. By default, the new handoff agent sees everything the previous agent saw: all chat history messages and all tool calls made so far.

In some cases we may want to filter this information. For example, too much information can reduce a weaker LLM's performance, so it is often a good idea to share only the information that is absolutely necessary.

If we have various tool calls in our chat history, these may confuse a smaller LLM. In this scenario, we can filter all tool calls from the history using the handoff_filters.remove_all_tools function:

python
from agents.extensions import handoff_filters

# now redefine the handoff objects with the input_filter parameter
web_search_handoff = handoff(
    agent=web_search_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo,
    input_filter=handoff_filters.remove_all_tools
)
internal_docs_handoff = handoff(
    agent=internal_docs_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo,
    input_filter=handoff_filters.remove_all_tools
)
code_execution_handoff = handoff(
    agent=code_execution_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo,
    input_filter=handoff_filters.remove_all_tools
)

# and initialize the main_agent
main_agent = Agent(
    name="Main Agent",
    model="gpt-4.1",
    instructions=RECOMMENDED_PROMPT_PREFIX,
    handoffs=[web_search_handoff, internal_docs_handoff, code_execution_handoff],
    tools=[get_current_date],
)

Now when asking for the time difference, we will see that our handoff agent is unable to give an accurate answer. This is because the current date is first found by our main_agent via the get_current_date tool, and that information is stored in the chat history as a tool call. When we filter tool calls out of the chat history, this information is lost.

python
result = await Runner.run(
    starting_agent=main_agent,
    input="How long ago from today was it when we got our last revenue report?"
)
display(Markdown(result.final_output))

We should see an incorrect date above.
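
Beyond the prebuilt filters, input_filter accepts any callable that takes the handoff input data and returns a (possibly modified) copy, so we can also write our own. Below is a rough sketch, assuming the HandoffInputData fields (input_history, pre_handoff_items, new_items) exposed by the SDK at the time of writing:

python
from agents import HandoffInputData

def keep_recent_history(data: HandoffInputData) -> HandoffInputData:
    """Illustrative filter: pass through only the last few history items."""
    history = data.input_history
    if isinstance(history, tuple):  # input_history may also be a plain string
        history = history[-5:]
    return HandoffInputData(
        input_history=history,
        pre_handoff_items=data.pre_handoff_items,
        new_items=data.new_items,
    )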


That's it for this deep dive into multi-agent systems with OpenAI's Agents SDK. We've covered a broad range of multi-agent features in the SDK and how we can use them to build orchestrator-subagent workflows or handoff workflows, each with its own pros and cons.