Updated on July 10, 2025

Local Agents with Agents SDK

AI Engineering

In this example we will learn how to use fully local LLMs with Agents SDK, running Llama-4-Scout-17B-16E-Instruct-GGUF, a highly capable LLM that can power agentic workflows with tool calling.

We will use LM Studio to host Llama 4 locally; installation instructions can be found here.

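LM Studio (on Mac) supports all GGUF quantized models, so we use the GGUF build lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF, which can be downloaded for LM Studio here.

Once the model is loaded and LM Studio's local server is running, we can confirm the server is reachable by listing the models it serves: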

python
!curl http://localhost:1234/v1/models
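
The same check can be done from Python. The sketch below assumes the endpoint returns an OpenAI-style list object (a `data` array whose entries carry an `id`); the helper names `fetch_models_payload` and `list_model_ids` and the sample payload are hypothetical and used only for illustration. The ids extracted this way are the values that can be passed as the `model` argument later on.

```python
import json
from urllib.request import urlopen


def fetch_models_payload(base_url="http://localhost:1234/v1"):
    """Query the local server's /models endpoint and return the parsed JSON.

    Requires the LM Studio server to be running; not called in this sketch.
    """
    with urlopen(f"{base_url}/models", timeout=5) as response:
        return json.load(response)


def list_model_ids(payload):
    """Extract model ids, assuming an OpenAI-style {"data": [{"id": ...}]} shape."""
    return [entry["id"] for entry in payload.get("data", [])]


# Hypothetical payload standing in for a real server response.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "example-model-a", "object": "model"},
        {"id": "example-model-b", "object": "model"},
    ],
}

model_ids = list_model_ids(sample_payload)
print(model_ids)
```

With the server running, `list_model_ids(fetch_models_payload())` would list whatever models are actually available locally.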

We need to set a dummy API key and point the base URL used by the Agents SDK at LM Studio:

python
import os

os.environ["OPENAI_API_KEY"] = "sk-api-key"
os.environ["OPENAI_BASE_URL"] = "http://localhost:1234/v1"
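
If the endpoint needs to change (a different port, for example), the two assignments can be wrapped in a small helper. This is only a sketch: `configure_local_endpoint` is a hypothetical name, and the URL check and trailing-slash trimming are assumptions added for convenience rather than anything the SDK requires.

```python
import os
from urllib.parse import urlparse


def configure_local_endpoint(base_url="http://localhost:1234/v1", api_key="sk-api-key"):
    """Set the environment variables used above to point at a local server."""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid HTTP(S) URL: {base_url!r}")

    # Trim a trailing slash so the stored value has a consistent form.
    normalized = base_url.rstrip("/")
    os.environ["OPENAI_API_KEY"] = api_key  # placeholder key, as in the text
    os.environ["OPENAI_BASE_URL"] = normalized
    return normalized


configured_url = configure_local_endpoint("http://localhost:1234/v1/")
print(configured_url)
```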

Now we use Agents SDK as we usually would.

python
from agents import Agent, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

client = AsyncOpenAI()
chat_model = OpenAIChatCompletionsModel(
    model="cogito-v1-preview-qwen-32b",  # model to use; must match an id listed by /v1/models
    openai_client=client
)

agent = Agent(
    name="Agent",  # name of the agent
    instructions="Speak like a pirate.",  # system prompt
    model=chat_model
)

python
from agents import Runner # object class

result = await Runner.run(
    starting_agent=agent,  # agent to start the conversation with
    input="Write a one-sentence poem."  # input to pass to the agent
)
result.final_output

Tool Use

python
from agents import function_tool
from datetime import datetime

# we can override the name of a tool using the name_override parameter
@function_tool(name_override="fetch_current_time")
async def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

python
agent = Agent(
    name="AI Assistant",
    instructions=(
        "You are a general purpose AI assistant that can help with a wide range of "
        "tasks using the tools provided to you."
    ),
    tools=[fetch_time],  # the function tool defined above
    model=chat_model
)

python
query = "What time is it?"

result = await Runner.run(
    starting_agent=agent,
    input=query
)
result.final_output
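
To make the tool-calling step less opaque, here is a conceptual sketch of what happens when the model asks for a tool: it emits a tool name plus JSON-encoded arguments, and the caller looks up the matching function and runs it. This is not how the Agents SDK is implemented; `TOOLS`, `register_tool`, and `dispatch` are hypothetical names, and the tool-call dictionary is a simplified stand-in for a real model response. It does show why `name_override` matters: the model only ever sees the overridden name.

```python
import json
from datetime import datetime

# Hypothetical registry: tool name exposed to the model -> Python callable.
TOOLS = {}


def register_tool(name_override=None):
    """Register a function under name_override, or its own name if none is given."""
    def decorator(func):
        TOOLS[name_override or func.__name__] = func
        return func
    return decorator


@register_tool(name_override="fetch_current_time")
def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def dispatch(tool_call: dict) -> str:
    """Run the tool named in a (simplified) tool call with its JSON arguments."""
    func = TOOLS[tool_call["name"]]
    kwargs = json.loads(tool_call.get("arguments") or "{}")
    return func(**kwargs)


# Simplified stand-in for the tool call a model might emit for "What time is it?"
tool_output = dispatch({"name": "fetch_current_time", "arguments": "{}"})
print(tool_output)
```

In the real workflow the tool output is passed back to the model, which then writes the final answer returned in `result.final_output`.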