In this chapter, we continue from the introduction to agents and dive deeper, learning how to build a custom agent execution loop for v0.3 of LangChain.
What is the Agent Executor?
When we talk about agents, a significant part of an "agent" is simple code logic, iteratively rerunning LLM calls and processing their output. The exact logic varies significantly, but one well-known example is the ReAct agent.
Reason + Action (ReAct) agents use iterative reasoning and action steps to incorporate chain-of-thought and tool-use into their execution. During the reasoning step, the LLM generates the steps to take to answer the query. Next, the LLM generates the action input, which our code logic parses into a tool call.
Following our action step, we get an observation from the tool call. Then, we feed the observation back into the agent executor logic for a final answer or further reasoning and action steps.
The agent and agent executor we will be building will follow this pattern.
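At its core, this pattern is just a loop. Below is a minimal, framework-free sketch of that loop; the llm, parse_tool_call, and execute_tool callables are placeholders we assume the caller supplies, not LangChain APIs. We will build the real LangChain version step by step throughout this chapter.
def react_loop(llm, parse_tool_call, execute_tool, query: str, max_steps: int = 5):
    """Illustrative ReAct-style loop: reason/act -> observe -> repeat.
    llm, parse_tool_call, and execute_tool are caller-supplied callables
    (placeholders for this sketch, not real LangChain APIs).
    """
    scratchpad = []  # holds (tool call, observation) pairs
    for _ in range(max_steps):
        # reasoning + action: the LLM either picks a tool or answers directly
        response = llm(query, scratchpad)
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response  # no tool requested: treat the response as the final answer
        # observation: run the tool and feed the result into the next step
        observation = execute_tool(tool_call)
        scratchpad.append((tool_call, observation))
    return "max steps reached without a final answer"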
Creating an Agent
We will construct the agent using LangChain Expression Language (LCEL). We cover LCEL in more depth in the LCEL chapter, but as before, all we need to know now is that we construct our agent using syntax and components like so:
agent = (
<input parameters, including chat history and user query>
| <prompt>
| <LLM with tools>
)
We need this agent to remember previous interactions within the conversation. To do
that, we will use the ChatPromptTemplate
with a system message, a placeholder for our
chat history, a placeholder for the user query, and a placeholder for the agent
scratchpad.
The agent scratchpad is where the agent writes its notes as it works through multiple
internal thought and tool-use steps to produce a final output for the user. This scratchpad is a list of messages with alternating roles of ai
(for the tool call) and tool
(for the tool execution output). Both message types require a tool_call_id
field, which is used to link the respective AI and tool messages - this is especially important when many tool calls happen in parallel.
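For illustration, a scratchpad after a single tool call might look something like this (a hypothetical add call; the ID and arguments shown are made up for the example):
from langchain_core.messages import AIMessage, ToolMessage

# hypothetical scratchpad contents after one `add` tool call
example_scratchpad = [
    AIMessage(
        content="",
        tool_calls=[{
            "name": "add",
            "args": {"x": 10, "y": 10},
            "id": "call_abc123",  # links this AI message to the tool message below
            "type": "tool_call",
        }],
    ),
    ToolMessage(
        content="The add tool returned 20",
        tool_call_id="call_abc123",  # must match the id above
    ),
]
With that structure in mind, let's define the prompt template described above: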
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question "
        "you should first use one of the tools provided. After using a "
        "tool the tool output will be provided in the "
        "'scratchpad' below. If you have an answer in the "
        "scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
Next, we must define our LLM. We will use the gpt-4o-mini
model with a temperature of
0.0
.
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    temperature=0.0,
)
To add tools to our LLM, we use the bind_tools method within the LCEL constructor, which attaches our tools to the LLM. We'll also pass the tool_choice="any" argument to bind_tools, which tells the LLM that it MUST use a tool, i.e., it cannot respond directly with a final answer (and thereby skip tool use):
from langchain_core.runnables.base import RunnableSerializable
tools = [add, subtract, multiply, exponentiate]
# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)
We invoke the agent with the invoke
method, passing in the input and chat history.
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 205,
'total_tokens': 223,
'completion_tokens_details': {
'accepted_prediction_tokens': 0,
'audio_tokens': 0,
'reasoning_tokens': 0,
'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Because we set tool_choice="any" to force tool use, the usual content field will be empty, as LangChain reserves that field for natural language output, i.e. the final answer of the LLM. To find our tool call, we need to look at the tool_calls field:
tool_call.tool_calls
[{'name': 'add',
'args': {'x': 10, 'y': 10},
'id': 'call_bI8aZpMN1y907LncsX9rhY6y',
'type': 'tool_call'}]
From here, we have the tool name
that our LLM wants to use and the args
that it
wants to pass to that tool. We can see that the tool add
is being used with the
arguments x=10
and y=10
. The agent.invoke
method has not executed the tool
function; we need to write that part of the agent code ourselves.
Executing the tool code requires two steps:
1. Map the tool name to the tool function.
2. Execute the tool function with the generated args.
# create tool name to function mapping
name2tool = {tool.name: tool.func for tool in tools}
Now execute to get our answer:
tool_exec_content = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)
tool_exec_content
20
That is our answer and tool execution logic. We feed this back into our LLM via the
agent_scratchpad
placeholder.
from langchain_core.messages import ToolMessage

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_exec_content}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_vIKn0eWVupXsSpJBT1budTHr',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 210,
'total_tokens': 228,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Despite having the answer in our agent_scratchpad
, the LLM still tries to use the tool
again. This behaviour happens because we bound the tools to the LLM with
tool_choice="any"
. When we set tool_choice
to "any"
or "required"
, we tell the
LLM that it MUST use a tool, i.e., it cannot provide a final answer.
There are two options to fix this:
1. Set tool_choice="auto" to tell the LLM that it can choose to use a tool or provide a final answer.
2. Create a final_answer tool - we'll explain this shortly.
First, let's try option 1:
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="auto")
)
We'll start from scratch again, so the agent_scratchpad is empty:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'arguments': '{"x":10,"y":10}',
'name': 'add'
},
'id': 'call_YOCTOCe2iHyIJhcfaiDVafpA',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 18,
'prompt_tokens': 205,
'total_tokens': 223,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
Now we execute the tool and pass its output into the agent_scratchpad placeholder:
tool_output = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_output}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='10 + 10 equals 20.',
additional_kwargs={'refusal': None},
response_metadata={
'token_usage': {
'completion_tokens': 10,
'prompt_tokens': 210,
'total_tokens': 220,
'completion_tokens_details': {
'accepted_prediction_tokens': 0, 'audio_tokens': 0,
'reasoning_tokens': 0, 'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
We now have the final answer in the content
field! This method is perfectly
functional; however, we recommend option 2 as it provides more control over the
agent's output.
There are several reasons why option 2 can provide more control:
- It removes the possibility of an agent using the direct content field when it is not appropriate; for example, some LLMs (particularly smaller ones) may try to use the content field when using a tool.
- We can enforce a specific structured output in our answers. Structured outputs are handy when we require particular fields for downstream code or multi-part answers. For example, a RAG agent may return a natural language answer and a list of sources used to generate that answer, as sketched below.
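To make that RAG example concrete, here is a hedged sketch of what such a structured final answer tool might look like; the rag_final_answer name and its answer/sources fields are illustrative assumptions, not something we build in this chapter:
from langchain_core.tools import tool

@tool
def rag_final_answer(answer: str, sources: list[str]) -> dict:
    """Hypothetical structured final answer for a RAG agent: a natural
    language answer plus the list of source documents it was drawn from.
    """
    return {"answer": answer, "sources": sources}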
To implement option 2, we must create a final_answer
tool. We will add a
tools_used
field to give our output some structure—in a real-world use case, we
probably wouldn't want to generate this field, but it's useful for our example here.
@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide a final answer to the user.
    The answer should be in natural language as this will be provided
    to the user directly. The tools_used must include a list of tool
    names that were used within the `scratchpad`.
    """
    return {"answer": answer, "tools_used": tools_used}
Our final_answer
tool doesn't necessarily need to do anything; in this example,
we're using it purely to structure our final response. We can now add this tool to our
agent:
tools = [final_answer, add, subtract, multiply, exponentiate]

# we need to update our name2tool mapping too
name2tool = {tool.name: tool.func for tool in tools}

agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
)
Now we invoke:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call.tool_calls
[{'name': 'add',
'args': {'x': 10, 'y': 10},
'id': 'call_fhhm33BCyJdxlyguAuP9STEK',
'type': 'tool_call'}]
We execute the tool and provide its output to the agent again:
tool_out = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_out}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out
AIMessage(
content='',
additional_kwargs={
'tool_calls': [
{
'function': {
'name': 'final_answer',
'arguments': '{"answer":"10 + 10 equals 20.","tools_used":["functions.add"]}'
},
'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
'type': 'function'
}
],
'refusal': None
},
response_metadata={
'token_usage': {
'completion_tokens': 28,
'prompt_tokens': 282,
'total_tokens': 310,
'completion_tokens_details': {
'accepted_prediction_tokens': 0,
'audio_tokens': 0,
'reasoning_tokens': 0,
'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}
}
}
)
We see that content remains empty because we force tool use. But we now have the final_answer tool call, which the LLM returns via the tool_calls field:
out.tool_calls
[
{
'name': 'final_answer',
'args': {
'answer': '10 + 10 equals 20.',
'tools_used': ['functions.add']
},
'id': 'call_reBCXwxUOIePCItSSEuTKGCn',
'type': 'tool_call'
}
]
Because we see the final_answer tool here, we don't pass this back into our agent; instead, it tells us to stop execution and pass the args output on to our downstream process or the user directly:
out.tool_calls[0]["args"]
{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}
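That stop-or-continue decision is the heart of the execution loop we're about to build. As a minimal sketch, reusing the out and name2tool objects from above (final_output and observation are illustrative names, not part of the chapter's code):
# minimal sketch of the stop-or-continue decision
call = out.tool_calls[0]

if call["name"] == "final_answer":
    # stop: the generated args are our structured final output
    final_output = call["args"]
else:
    # continue: execute the tool, append an AI + tool message pair to the
    # scratchpad, and invoke the agent again on the next iteration
    observation = name2tool[call["name"]](**call["args"])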
Building a Custom Agent Execution Loop
We've worked through each step of our agent code, but nothing runs automatically; so far we have had to execute every step ourselves. To tie it together, we will write a class that handles all the logic we just worked through.
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
class CustomAgentExecutor:
    chat_history: list[BaseMessage]

    def __init__(self, max_iterations: int = 3):
        self.chat_history = []
        self.max_iterations = max_iterations
        self.agent: RunnableSerializable = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: x["chat_history"],
                "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
            }
            | prompt
            | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
        )

    def invoke(self, input: str) -> dict:
        # invoke the agent, but do so iteratively in a loop until
        # reaching a final answer
        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # invoke a step for the agent to generate a tool call
            tool_call = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad
            })
            # add the initial tool call to the scratchpad
            agent_scratchpad.append(tool_call)
            # execute the tool and add its output to the agent scratchpad
            tool_name = tool_call.tool_calls[0]["name"]
            tool_args = tool_call.tool_calls[0]["args"]
            tool_call_id = tool_call.tool_calls[0]["id"]
            tool_out = name2tool[tool_name](**tool_args)
            tool_exec = ToolMessage(
                content=f"{tool_out}",
                tool_call_id=tool_call_id
            )
            agent_scratchpad.append(tool_exec)
            # add a print so we can see intermediate steps
            print(f"{count}: {tool_name}({tool_args})")
            count += 1
            # if the tool call is the final answer tool, we stop
            if tool_name == "final_answer":
                break
        # add the final output to the chat history
        final_answer = tool_out["answer"]
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_answer)
        ])
        # return the final answer in dict form
        return tool_out
Now initialize the agent executor:
agent_executor = CustomAgentExecutor()
And test the invoke
method:
agent_executor.invoke(input="What is 10 + 10")
0: add({'x': 10, 'y': 10})
1: final_answer({'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']})
{'answer': '10 + 10 equals 20.', 'tools_used': ['functions.add']}
We then get our answer and the tools that were used — all through our custom agent executor.
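Because the executor appends each turn to chat_history, a follow-up question that refers back to the previous answer should now be resolvable from memory. For example (a hypothetical follow-up; the output is not shown as it will vary):
# the executor stored "10 + 10 equals 20." in chat_history, so the agent
# can resolve "that result" from the previous turn
agent_executor.invoke(input="Multiply that result by 7")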