In this example we will learn how to use fully local LLMs with the Agents SDK, running the latest Llama-4-Scout-17B-16E-Instruct-GGUF: a highly capable LLM, well suited to powering agentic workflows with tool calling.
We will be using LM Studio to host Llama 4 locally; installation instructions can be found here. LM Studio (on Mac) supports all GGUF-quantized models, so we will use lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF, which can be downloaded for LM Studio here.
Once the model is loaded and LM Studio's local server is running, we can confirm it is reachable by listing the available models:
python
!curl http://localhost:1234/v1/models
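If curl is not available, the same check can be done with Python's standard library. This is a minimal sketch; `list_local_models` and `parse_model_ids` are helper names introduced here, and it assumes LM Studio's server is on its default port 1234:

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a /v1/models response body."""
    # the endpoint returns {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]


def list_local_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Return the IDs of the models currently served locally."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return parse_model_ids(json.load(resp))
```

If the model you downloaded appears in the returned list, the server is ready.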
We need to set a dummy API key and modify the base URL used by the Agents SDK to point to LM Studio:
python
import os
os.environ[ "OPENAI_API_KEY" ] = "sk-api-key"
os.environ[ "OPENAI_BASE_URL" ] = "http://localhost:1234/v1"
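Since these are ordinary environment variables, one option is to set them only when they are not already configured, so an existing setup is never clobbered. A small sketch using `os.environ.setdefault` (the key value itself is a placeholder, as LM Studio does not validate it):

```python
import os

# LM Studio ignores the key, but the OpenAI client refuses to start without one
os.environ.setdefault("OPENAI_API_KEY", "sk-api-key")
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:1234/v1")
```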
Now we use the Agents SDK as we usually would.
python
from agents import Agent, OpenAIChatCompletionsModel
from openai import AsyncOpenAI
client = AsyncOpenAI()
chat_model = OpenAIChatCompletionsModel(
    model="llama-4-scout-17b-16e-instruct",  # model ID as served by LM Studio
    openai_client=client
)
agent = Agent(
    name="Agent",  # name of the agent
    instructions="Speak like a pirate.",  # system prompt
    model=chat_model
)
python
from agents import Runner

result = await Runner.run(
    starting_agent=agent,  # agent to start the conversation with
    input="Write a one-sentence poem."  # input to pass to the agent
)
result.final_output
python
from agents import function_tool
from datetime import datetime

# we can override the name of a tool using the name_override parameter
@function_tool(name_override="fetch_current_time")
async def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
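As a quick sanity check of the format string, formatting a fixed timestamp makes the output shape explicit:

```python
from datetime import datetime

# a fixed timestamp so the result is reproducible
stamp = datetime(2025, 1, 2, 13, 4, 5)
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # → 2025-01-02 13:04:05
```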
python
agent = Agent(
    name="AI Assistant",
    instructions=(
        "You are a general purpose AI assistant that can help with a wide range of "
        "tasks using the tools provided to you."
    ),
    tools=[fetch_time],
    model=chat_model
)
python
query = "What time is it?"

result = await Runner.run(
    starting_agent=agent,
    input=query
)
result.final_output