In this example we will learn how to use fully local LLMs with the OpenAI Agents SDK, running the latest Llama-4-Scout-17B-16E-Instruct-GGUF, a highly capable LLM fully able to power agentic workflows with tool calling.
We will be using LM Studio to host Llama 4 locally; installation instructions can be found here.
LM Studio (on Mac) supports all GGUF-quantized models, which means we must use lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF, downloadable for LM Studio here.
First, confirm that the LM Studio server is running and see which models it exposes:

```python
!curl http://localhost:1234/v1/models
```
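If you prefer to stay in Python, an equivalent check can be written with only the standard library. This is a sketch: the port assumes LM Studio's default, and the helper name `list_local_models` is our own, not part of any SDK.

```python
import json
import urllib.request
import urllib.error

def list_local_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Return model IDs from an OpenAI-compatible /v1/models endpoint,
    or an empty list if the server is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            payload = json.load(resp)
        # the endpoint returns {"data": [{"id": ...}, ...]}
        return [m["id"] for m in payload.get("data", [])]
    except (urllib.error.URLError, OSError):
        return []

print(list_local_models())
```

The ID you see printed here is the exact string to pass as the `model` parameter later on.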
We need to set a dummy API key (LM Studio does not validate it) and modify the base URL used by the Agents SDK to point to LM Studio:

```python
import os

os.environ["OPENAI_API_KEY"] = "sk-api-key"  # dummy key; LM Studio ignores it
os.environ["OPENAI_BASE_URL"] = "http://localhost:1234/v1"
```
Now we use the Agents SDK as we usually would.

```python
from agents import Agent, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

client = AsyncOpenAI()  # picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment

chat_model = OpenAIChatCompletionsModel(
    model="llama-4-scout-17b-16e-instruct",  # use the model ID returned by /v1/models
    openai_client=client
)

agent = Agent(
    name="Agent",  # name of the agent
    instructions="Speak like a pirate.",  # system prompt
    model=chat_model
)
```
```python
from agents import Runner

result = await Runner.run(
    starting_agent=agent,  # agent to start the conversation with
    input="Write a one-sentence poem."  # input to pass to the agent
)
result.final_output
```
Tool Use
```python
from agents import function_tool
from datetime import datetime

# we can override the name of a tool using the name_override parameter
@function_tool(name_override="fetch_current_time")
async def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
```
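Since the tool returns a plain string, its format can be sanity-checked directly with a fixed datetime, no agent loop involved (a stdlib-only sketch using the same format string as the tool):

```python
from datetime import datetime

# the same strftime format used by the tool above, applied to a fixed datetime
stamp = datetime(2025, 1, 1, 12, 0, 0).strftime("%Y-%m-%d %H:%M:%S")
print(stamp)  # → 2025-01-01 12:00:00
```

This is the string the model will see when it calls the tool, so keeping it unambiguous (zero-padded, ISO-like ordering) helps the model interpret it reliably.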
```python
agent = Agent(
    name="AI Assistant",
    instructions=(
        "You are a general purpose AI assistant that can help with a wide range of "
        "tasks using the tools provided to you."
    ),
    tools=[fetch_time],
    model=chat_model
)
```
```python
query = "What time is it?"

result = await Runner.run(
    starting_agent=agent,
    input=query
)
result.final_output
```