In this example we will learn how to use fully local LLMs with the Agents SDK, running the latest Llama-4-Scout-17B-16E-Instruct-GGUF: a highly capable LLM, well suited to powering agentic workflows with tool calling.
We will be using LM Studio to host Llama 4 locally; installation instructions can be found here. LM Studio (on Mac) supports all GGUF-quantized models, so we will use lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF, which can be downloaded for LM Studio here.
Once the model is loaded and LM Studio's local server is running, we can confirm it is reachable by listing the available models:
python
!curl http://localhost:1234/v1/models
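If curl is not available, the same check can be done with Python's standard library. This is a minimal sketch; `list_local_models` and `parse_model_ids` are helper names introduced here, and it assumes LM Studio's server is on its default port 1234:

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a /v1/models response body."""
    # the endpoint returns {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]


def list_local_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Return the IDs of the models currently served locally."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return parse_model_ids(json.load(resp))
```

If the model you downloaded appears in the returned list, the server is ready.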
We need to set a dummy API key and modify the base URL used by the Agents SDK to point to LM Studio:
python
import os
os.environ[ "OPENAI_API_KEY" ] = "sk-api-key"
os.environ[ "OPENAI_BASE_URL" ] = "http://localhost:1234/v1"
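Since these are ordinary environment variables, one option is to set them only when they are not already configured, so an existing setup is never clobbered. A small sketch using `os.environ.setdefault` (the key value itself is a placeholder, as LM Studio does not validate it):

```python
import os

# LM Studio ignores the key, but the OpenAI client refuses to start without one
os.environ.setdefault("OPENAI_API_KEY", "sk-api-key")
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:1234/v1")
```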
Now we use the Agents SDK as we usually would.
python
from agents import Agent, OpenAIChatCompletionsModel
from openai import AsyncOpenAI
client = AsyncOpenAI()
chat_model = OpenAIChatCompletionsModel(
    model="llama-4-scout-17b-16e-instruct",  # model ID as served by LM Studio
    openai_client=client
)
agent = Agent(
    name="Agent",  # name of the agent
    instructions="Speak like a pirate.",  # system prompt
    model=chat_model
)
python
from agents import Runner

result = await Runner.run(
    starting_agent=agent,  # agent to start the conversation with
    input="Write a one-sentence poem."  # input to pass to the agent
)
result.final_output
python
from agents import function_tool
from datetime import datetime

# we can override the name of a tool using the name_override parameter
@function_tool(name_override="fetch_current_time")
async def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
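As a quick sanity check of the format string, formatting a fixed timestamp makes the output shape explicit:

```python
from datetime import datetime

# a fixed timestamp so the result is reproducible
stamp = datetime(2025, 1, 2, 13, 4, 5)
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # → 2025-01-02 13:04:05
```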
python
agent = Agent(
    name="AI Assistant",
    instructions=(
        "You are a general purpose AI assistant that can help with a wide range of "
        "tasks using the tools provided to you."
    ),
    tools=[fetch_time],
    model=chat_model
)
python
query = "What time is it?"

result = await Runner.run(
    starting_agent=agent,
    input=query
)
result.final_output