In this example we will learn how to use fully local LLMs with the OpenAI Agents SDK, running the latest Llama-4-Scout-17B-16E-Instruct-GGUF: a highly capable LLM, fully able to power agentic workflows with tool calling.
We will be using LM Studio to host Llama 4 locally; installation instructions can be found here.
LM Studio (on Mac) supports all GGUF quantized models, so for this example we use lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF, which can be downloaded for LM Studio here.
With the model downloaded and loaded in LM Studio, we can confirm that the local server is running and see which models it exposes:

```python
!curl http://localhost:1234/v1/models
```
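The same check can also be made from Python with just the standard library. The snippet below is a minimal sketch (the helper name is an assumption, and the default port matches LM Studio's default of 1234); it returns None rather than raising when the server is not reachable:

```python
import json
import urllib.request
from typing import Optional

def list_local_models(base_url: str = "http://localhost:1234/v1") -> Optional[list]:
    """Return model IDs from an OpenAI-compatible server, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=2) as resp:
            payload = json.load(resp)
    except OSError:  # connection refused, timeout, DNS failure, ...
        return None
    return [m["id"] for m in payload.get("data", [])]

print(list_local_models())
```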
We need to set a dummy API key and modify the base URL used by the agents SDK to point to LM Studio:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-api-key"  # dummy key; LM Studio ignores it
os.environ["OPENAI_BASE_URL"] = "http://localhost:1234/v1"
```
Now we use the Agents SDK as we usually would.
```python
from agents import Agent, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

client = AsyncOpenAI()

chat_model = OpenAIChatCompletionsModel(
    model="llama-4-scout-17b-16e-instruct",  # model ID as reported by /v1/models
    openai_client=client
)

agent = Agent(
    name="Agent",  # name of the agent
    instructions="Speak like a pirate.",  # system prompt
    model=chat_model
)
```
```python
from agents import Runner

result = await Runner.run(
    starting_agent=agent,  # agent to start the conversation with
    input="Write a one-sentence poem."  # input to pass to the agent
)
result.final_output
```
Tool Use
```python
from agents import function_tool
from datetime import datetime

# we can override the name of a tool using the name_override parameter
@function_tool(name_override="fetch_current_time")
async def fetch_time() -> str:
    """Fetch the current time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
```
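Under the hood, function_tool derives an OpenAI-style tool definition from the function's signature and docstring. The dictionary below is a hand-written sketch of what that definition conceptually looks like for fetch_time, not the SDK's exact output:

```python
# Hand-written sketch of an OpenAI-style tool definition; the field values
# mirror fetch_time above but are not generated by the SDK itself.
tool_definition = {
    "type": "function",
    "function": {
        "name": "fetch_current_time",  # the name_override value
        "description": "Fetch the current time.",  # taken from the docstring
        "parameters": {  # fetch_time accepts no arguments
            "type": "object",
            "properties": {},
            "required": [],
        },
    },
}
```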
```python
agent = Agent(
    name="AI Assistant",
    instructions=(
        "You are a general purpose AI assistant that can help with a wide range of "
        "tasks using the tools provided to you."
    ),
    tools=[fetch_time],
    model=chat_model
)
```
```python
query = "What time is it?"

result = await Runner.run(
    starting_agent=agent,
    input=query
)
result.final_output
```
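Conceptually, the runner's tool loop works like this: the model replies with a tool call, the runner executes the matching Python function, and the tool's output is appended to the conversation so the model can produce its final answer. Below is a simplified stand-alone sketch of that loop, with the model's tool-call message hard-coded rather than generated:

```python
import json
from datetime import datetime

def fetch_current_time() -> str:
    """Plain-function stand-in for the fetch_time tool above."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

TOOLS = {"fetch_current_time": fetch_current_time}

# Hard-coded stand-in for a model response that requests a tool call.
mock_tool_call = {"name": "fetch_current_time", "arguments": "{}"}

# The runner dispatches the call to the registered function...
args = json.loads(mock_tool_call["arguments"])
tool_output = TOOLS[mock_tool_call["name"]](**args)

# ...then feeds the result back to the model as a tool message.
tool_message = {"role": "tool", "content": tool_output}
print(tool_message)
```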