In this chapter, we continue from the introduction to agents and dive deeper, learning how to build a custom agent execution loop for v0.3 of LangChain.
What is the Agent Executor?
When we talk about agents, a significant part of an "agent" is simple code logic, iteratively rerunning LLM calls and processing their output. The exact logic varies significantly, but one well-known example is the ReAct agent.
Reason + Action (ReAct) agents use iterative reasoning and action steps to incorporate chain-of-thought and tool-use into their execution. During the reasoning step, the LLM generates the steps to take to answer the query. Next, the LLM generates the action input, which our code logic parses into a tool call.
Following our action step, we get an observation from the tool call. Then, we feed the observation back into the agent executor logic for a final answer or further reasoning and action steps.
The agent and agent executor we will be building will follow this pattern.
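At its core, this pattern is just a loop. Below is a minimal, framework-free sketch of that loop; the llm, parse_tool_call, and execute_tool callables are placeholders we assume the caller supplies, not LangChain APIs. We will build the real LangChain version step by step throughout this chapter.
def react_loop(llm, parse_tool_call, execute_tool, query: str, max_steps: int = 5):
    """Illustrative ReAct-style loop: reason/act -> observe -> repeat.
    llm, parse_tool_call, and execute_tool are caller-supplied callables
    (placeholders for this sketch, not real LangChain APIs).
    """
    scratchpad = []  # holds (tool call, observation) pairs
    for _ in range(max_steps):
        # reasoning + action: the LLM either picks a tool or answers directly
        response = llm(query, scratchpad)
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response  # no tool requested: treat the response as the final answer
        # observation: run the tool and feed the result into the next step
        observation = execute_tool(tool_call)
        scratchpad.append((tool_call, observation))
    return "max steps reached without a final answer"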
Creating an Agent
We will construct the agent using LangChain Expression Language (LCEL). We cover LCEL in more depth in the LCEL chapter, but as before, all we need to know now is that we construct our agent using syntax and components like so:
agent = (
<input parameters, including chat history and user query>
| <prompt>
| <LLM with tools>
)
We need this agent to remember previous interactions within the conversation. To do
that, we will use the ChatPromptTemplate
with a system message, a placeholder for our
chat history, a placeholder for the user query, and a placeholder for the agent
scratchpad.
The agent scratchpad is where the agent writes its notes as it works through multiple
internal thought and tool-use steps to produce a final output for the user. This scratchpad is a list of messages with alternating roles of ai
(for the tool call) and tool
(for the tool execution output). Both message types require a tool_call_id
field, which is used to link the respective AI and tool messages - this is especially important when many tool calls happen in parallel.
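For illustration, a scratchpad after a single tool call might look something like this (a hypothetical add call; the ID and arguments shown are made up for the example):
from langchain_core.messages import AIMessage, ToolMessage

# hypothetical scratchpad contents after one `add` tool call
example_scratchpad = [
    AIMessage(
        content="",
        tool_calls=[{
            "name": "add",
            "args": {"x": 10, "y": 10},
            "id": "call_abc123",  # links this AI message to the tool message below
            "type": "tool_call",
        }],
    ),
    ToolMessage(
        content="The add tool returned 20",
        tool_call_id="call_abc123",  # must match the id above
    ),
]
With that structure in mind, let's define the prompt template described above: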
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question "
        "you should first use one of the tools provided. After using a "
        "tool the tool output will be provided in the "
        "'scratchpad' below. If you have an answer in the "
        "scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
Next, we must define our LLM. We will use the gpt-4o-mini
model with a temperature of
0.0
.
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0.0,
)
To add tools to our LLM, we use the bind_tools method within the LCEL constructor, which attaches our tools to the LLM. We'll also pass the tool_choice="any" argument to bind_tools, which tells the LLM that it MUST use a tool, i.e., it cannot respond directly with a final answer (and thereby skip tool use):
from langchain_core.runnables.base import RunnableSerializable
tools = [add, subtract, multiply, exponentiate]
# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)
We invoke the agent with the invoke
method, passing in the input and chat history.
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 205,
'total_tokens': 223,
'completion_tokens_details': {
'accepted_prediction_tokens': 0,
'audio_tokens': 0,
'reasoning_tokens': 0,
'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Because we set tool_choice="any" to force tool use, the usual content field will be empty, as LangChain reserves that field for natural language output, i.e. the final answer of the LLM. To find our tool call, we need to look at the tool_calls field:
tool_call.tool_calls
[{'name': 'add',
'args': {'x': 10, 'y': 10},
'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
'type': 'tool_call'}]
From here, we have the tool name
that our LLM wants to use and the args
that it
wants to pass to that tool. We can see that the tool add
is being used with the
arguments x=10
and y=10
. The agent.invoke
method has not executed the tool
function; we need to write that part of the agent code ourselves.
Executing the tool code requires two steps:
1. Map the tool name to the tool function.
2. Execute the tool function with the generated args.
# create tool name to function mapping
name2tool = {tool.name: tool.func for tool in tools}
Now execute to get our answer:
tool_exec_content = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)
tool_exec_content
20
That is our answer and tool execution logic. We feed this back into our LLM via the
agent_scratchpad
placeholder.
from langchain_core.messages import ToolMessage

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_exec_content}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_vIKn0eWVupXsSpJBT1budTHr',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 210,
'total_tokens': 228,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Despite having the answer in our agent_scratchpad
, the LLM still tries to use the tool
again. This behaviour happens because we bound the tools to the LLM with
tool_choice="any"
. When we set tool_choice
to "any"
or "required"
, we tell the
LLM that it MUST use a tool, i.e., it cannot provide a final answer.
There are two options to fix this:
1. Set tool_choice="auto" to tell the LLM that it can choose to use a tool or provide a final answer.
2. Create a final_answer tool - we'll explain this shortly.
First, let's try option 1:
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="auto")
)
We'll start from scratch again, so the agent_scratchpad is empty:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_YOCTOCe2iHyIJhcfaiDVafpA',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 205,
'total_tokens': 223,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Now we execute the tool and pass its output into the agent_scratchpad placeholder:
tool_output = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_output}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='10 + 10 equals 20.',
additional_kwargs={'refusal': None},
response_metadata={
'token_usage': {
'completion_tokens': 10,
'prompt_tokens': 210,
'total_tokens': 220,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
We now have the final answer in the content
field! This method is perfectly
functional; however, we recommend option 2 as it provides more control over the
agent's output.
There are several reasons why option 2 can provide more control:
- It removes the possibility of an agent using the direct content field when it is not appropriate; for example, some LLMs (particularly smaller ones) may try to use the content field when using a tool.
- We can enforce a specific structured output in our answers. Structured outputs are handy when we require particular fields for downstream code or multi-part answers. For example, a RAG agent may return a natural language answer and a list of sources used to generate that answer, as sketched below.
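To make that RAG example concrete, here is a hedged sketch of what such a structured final answer tool might look like; the rag_final_answer name and its answer/sources fields are illustrative assumptions, not something we build in this chapter:
from langchain_core.tools import tool

@tool
def rag_final_answer(answer: str, sources: list[str]) -> dict:
    """Hypothetical structured final answer for a RAG agent: a natural
    language answer plus the list of source documents it was drawn from.
    """
    return {"answer": answer, "sources": sources}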
To implement option 2, we must create a final_answer
tool. We will add a
tools_used
field to give our output some structure—in a real-world use case, we
probably wouldn't want to generate this field, but it's useful for our example here.
@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide a final answer to the user.
    The answer should be in natural language as this will be provided
    to the user directly. The tools_used must include a list of tool
    names that were used within the `scratchpad`.
    """
    return {"answer": answer, "tools_used": tools_used}
Our final_answer
tool doesn't necessarily need to do anything; in this example,
we're using it purely to structure our final response. We can now add this tool to our
agent:
tools = [final_answer, add, subtract, multiply, exponentiate]

# we need to update our name2tool mapping too
name2tool = {tool.name: tool.func for tool in tools}

agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
)
Now we invoke:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call.tool_calls
[{'name': 'add',
'args': {'x': 10, 'y': 10},
'id': 'call_fhhm33BCyJdxlyguAuP9STEK',
'type': 'tool_call'}]
We execute the tool and provide its output to the agent again:
tool_out = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_out}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'name': 'final_answer',
'arguments': '{"answer":"10 + 10 equals 20.","tools_used":["functions.add"]}'
},
'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 28,
'prompt_tokens': 282,
'total_tokens': 310,
'completion_tokens_details': {
'accepted_prediction_tokens': 0,
'audio_tokens': 0,
'reasoning_tokens': 0,
'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
We see that content remains empty because we force tool use. But we now have the final_answer tool call, which the LLM returns via the tool_calls field:
out.tool_calls
[
{
'name': 'final_answer',
'args': {
'answer': '10 + 10 equals 20.',
'tools_used': ['functions.add']
},
'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
'type': 'tool_call'
}
]
Because we see the final_answer tool here, we don't pass this back into our agent; instead, it tells us to stop execution and pass the args output on to our downstream process or the user directly:
out.tool_calls[0]["args"]
{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}
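That stop-or-continue decision is the heart of the execution loop we're about to build. As a minimal sketch, reusing the out and name2tool objects from above (final_output and observation are illustrative names, not part of the chapter's code):
# minimal sketch of the stop-or-continue decision
call = out.tool_calls[0]

if call["name"] == "final_answer":
    # stop: the generated args are our structured final output
    final_output = call["args"]
else:
    # continue: execute the tool, append an AI + tool message pair to the
    # scratchpad, and invoke the agent again on the next iteration
    observation = name2tool[call["name"]](**call["args"])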
Building a Custom Agent Execution Loop
We've worked through each step of our agent code, but nothing runs automatically; so far we have had to execute every step ourselves. To tie it together, we will write a class that handles all the logic we just worked through.
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
class CustomAgentExecutor:
    chat_history: list[BaseMessage]

    def __init__(self, max_iterations: int = 3):
        self.chat_history = []
        self.max_iterations = max_iterations
        self.agent: RunnableSerializable = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: x["chat_history"],
                "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
            }
            | prompt
            | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
        )

    def invoke(self, input: str) -> dict:
        # invoke the agent, but do so iteratively in a loop until
        # reaching a final answer
        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # invoke a step for the agent to generate a tool call
            tool_call = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad
            })
            # add the initial tool call to the scratchpad
            agent_scratchpad.append(tool_call)
            # execute the tool and add its output to the agent scratchpad
            tool_name = tool_call.tool_calls[0]["name"]
            tool_args = tool_call.tool_calls[0]["args"]
            tool_call_id = tool_call.tool_calls[0]["id"]
            tool_out = name2tool[tool_name](**tool_args)
            tool_exec = ToolMessage(
                content=f"{tool_out}",
                tool_call_id=tool_call_id
            )
            agent_scratchpad.append(tool_exec)
            # add a print so we can see intermediate steps
            print(f"{count}: {tool_name}({tool_args})")
            count += 1
            # if the tool call is the final answer tool, we stop
            if tool_name == "final_answer":
                break
        # add the final output to the chat history
        final_answer = tool_out["answer"]
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_answer)
        ])
        # return the final answer in dict form
        return tool_out
Now initialize the agent executor:
agent_executor = CustomAgentExecutor()
And test the invoke
method:
agent_executor.invoke(input="What is 10 + 10")
0: add({'x': 10, 'y': 10})
1: final_answer({'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']})
{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}
We then get our answer and the tools that were used — all through our custom agent executor.
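Because the executor appends each turn to chat_history, a follow-up question that refers back to the previous answer should now be resolvable from memory. For example (a hypothetical follow-up; the output is not shown as it will vary):
# the executor stored "10 + 10 equals 20." in chat_history, so the agent
# can resolve "that result" from the previous turn
agent_executor.invoke(input="Multiply that result by 7")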