Updated on May 16, 2024

Getting Started with NVIDIA's AI Workbench

NVIDIA AI Workbench (NVWB) is a software toolkit designed to help AI engineers and data scientists build in GPU-enabled environments.

Using NVWB, we can set up a local AI project with a prebuilt template with a few clicks. Then, after building out our project locally, we can quickly deploy it to a more powerful remote GPU instance, switch to a different remote, or go back to local.

By abstracting away many repetitive and tedious boilerplate actions, NVWB aims to help AI engineers focus on the core of AI development. It helps us reduce time spent on managing our dev environment, deployments, or maintaining remote compute instances.

In this tutorial, we'll learn about NVWB's features, where to use it, and how to use it.

Getting Started

Installation is straightforward, but it does require a few things beyond AI Workbench itself. Several of these are handled during the AI Workbench install. They are:

  • NVIDIA AI Workbench (of course)
  • WSL2 (Windows only), handled by NVWB
  • Docker (on Ubuntu) or Docker Desktop (Windows and macOS), which we install ourselves
  • NVIDIA GPU drivers, which we install ourselves on Windows (on Ubuntu, NVWB handles this)

So, to get started, we'll head to NVIDIA's AI Workbench page and click Download. During installation, we follow the steps displayed to us and install any extra required packages (like Docker Desktop).

Once that is complete, all that is left is to install the NVIDIA GPU drivers.

NVIDIA GPU Drivers

When running AI and ML processes on a GPU, our code and the GPU cannot interact directly; they communicate through several layers. Our code calls CUDA, which calls our GPU drivers, which in turn control the GPU.

CUDA is the software layer between the heavy-duty computations in our ML / AI code and an NVIDIA GPU. CUDA will massively speed up ML and AI processes, including generating text with LLMs, generating images with Diffusion models, processing data with cuDF, and much more.
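Before relying on GPU acceleration, it is worth confirming that the driver stack is visible to our code. Below is a minimal sketch (not part of the Workbench setup itself) that shells out to nvidia-smi from Python; it assumes the drivers are installed and nvidia-smi is on the PATH.

python
import shutil
import subprocess

# Check whether the NVIDIA driver's CLI is available before running GPU code.
if shutil.which("nvidia-smi") is None:
    print("nvidia-smi not found - GPU drivers are likely not installed")
else:
    # Ask the driver for the GPU name and driver version.
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip() or result.stderr.strip())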

NVWB handles the CUDA installation, and on Ubuntu it also handles the NVIDIA GPU driver install. On Windows, however, we must install the drivers ourselves. There are two recommended approaches, depending on your GPU: if you are using an RTX GPU (such as the NVIDIA RTX 5000 Ada GPU used in Dell's Precision AI-ready workstations), you can install RTX Experience; for GeForce, you use GeForce Experience.

The Workbench Location

Our local machine acts as our local workbench location. Ideally, we want to run on a machine with a CUDA-enabled GPU — but thanks to NVWB's easy integration with remote instances, this isn't required.

Using remote instances, we could run AI Workbench on a Mac (which doesn't support CUDA) and add a remote connection to a GPU instance if/when we need it.

Most AI tasks require a ton of compute, far more than most of us have locally. These scenarios are a primary use case for NVWB, which lets us switch back and forth between a local dev instance and a more powerful remote instance whenever a task demands it.

For example, we may use a MacBook Pro M2 as our local workbench. Then, we head to AWS, spin up an EC2 instance with an A100 GPU, and connect via workbench. Now, we have our local MacBook workbench and our remote A100 workbench.

With this setup, we might write most of our code on our local workbench, build our data processing pipeline, run small tests, and try our pipeline with a small AI model. Then, once we're ready to try it out on some more capable hardware, we activate our A100 EC2 instance, and within a few clicks in NVWB, we can run the same process on our remote instance.

If you'd like more info on setting this up, we have a complete walkthrough here. However, for the remainder of this article, we will use the local option for NVWB.

Starting a Project with NVWB

When starting AI Workbench, we should see our initial locations, which at this point include only our local instance.

NVIDIA AI Workbench Locations

We click through to open My Projects. As we do not have any existing projects, we will be given two options: Start a New Project or Clone a Project. For now, let's clone a project.

NVIDIA AI Workbench Projects

There are already many example projects that we can find in NVIDIA's GitHub organization. We navigate to the organization search bar and type "workbench" to filter for projects.

We will be using the NVIDIA/workbench-examples-rapids-cudf project. Copy and paste the URL into the Clone a Project modal. NVWB will generate a save path for us; you can change this if needed. After we click Clone, the project will be added to our NVWB projects.

Project Page

The project page is where engineers spend most of their time outside of writing code. It allows them to view and manage their project files, environment, version control, deployments, and more.

NVIDIA AI Workbench Project Page

Most of these components are self-explanatory, but it is worth diving into how the project environment is set up and managed via the Environment tab (1) and build status (2).

NVWB Container

Projects in NVWB run inside a Docker or Podman container. Our container has a base image (like a template or setup instruction file) that includes everything we need to run the project. If we scroll down and click the Build button, we can find the image name within the first few lines of the build output.

NVIDIA AI Workbench Container

Docker build details showing the image URL docker.io/rapidsai/notebooks.

If we navigate to our Docker Desktop app, we can find this exact image by searching for rapidsai/notebooks.

Image in Docker Desktop

The README provides information about this image, which we can also find on Docker Hub. The base version of this image is rapidsai/base and contains everything we need to run RAPIDS. However, we're using rapidsai/notebooks, which takes the base image and adds a JupyterLab server.

From our Environment tab, we can see the details of this container, which includes some very high-level information about what we just saw in Docker Hub.

NVIDIA AI Workbench Environments tab with Docker container information

Next, we have Packages. These are the packages installed during the container build. NVWB installs packages with conda and apt via the Docker image, and with pip via the local requirements.txt file.
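If we want to confirm from inside the container which versions of these packages were actually installed, a quick sketch like the one below works in any recent Python; the package names are only examples, not a required part of the project.

python
from importlib.metadata import PackageNotFoundError, version

# Print the installed version of a few packages we care about.
for pkg in ("cudf", "pandas", "numpy"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")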

Following this, we have Variables and Secrets. Variables are non-sensitive environment variables stored in the variables.env file and synced to your project repo. Storing variables here makes collaboration easier as they will be shared with whoever can access the project repo.
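For example, a line like MODEL_NAME=rapids-demo in variables.env (a hypothetical variable, not one defined in this project) is exposed inside the container as an ordinary environment variable, so our code can read it with os.environ:

python
import os

# Variables from variables.env are available as environment variables inside
# the project container. MODEL_NAME is a hypothetical example.
model_name = os.environ.get("MODEL_NAME", "default-model")
print(f"Using model: {model_name}")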

Secrets are sensitive (of course), so NVWB stores them within your local instance of NVWB and nowhere else—that means when cloning the project elsewhere, you must re-enter any secrets.

Other noteworthy items here are:

  • Mounts allow us to share data between our local machine and the isolated project container. We use this to avoid losing data when stopping a container.
  • Applications allow us to define integrations with different apps. The JupyterLab integration lives here by default, but we can add other apps like VS Code.
  • Hardware allows us to define hardware access for our container; you may want to adjust the Shared Memory value here depending on your project. If we're processing a lot of data held in RAM, we need to ensure we've allocated enough memory.

Building with NVWB

We should now have a reasonable understanding of how NVWB is structuring and running our project. Let's now take a look at how we actually use it during development.

Depending on the project, many of us will likely use NVWB's JupyterLab server or VS Code plugin. The RAPIDS cuDF project is focused on data exploration — an ideal use case for JupyterLab. We click Open JupyterLab to get started.

Projects Structure

By default, NVWB structures projects as follows:

text
/code # where we put code, tracked by git
/data # data storage, tracked by git-lfs
/models # model storage, tracked with git-lfs
apt.txt # packages to be installed with apt
base-command.sh # -
LICENSE.txt # license doc
postBuild.sh # shell script run after docker build
preBuild.sh # shell script run before docker build
README.md # readme doc
requirements.txt # python packages to be installed with pip
variables.env # where env vars are stored

The structure of these projects allows us to easily switch or create new projects and know what needs to go where. At the same time, this isn't a strict structure, so we can modify things either inside or outside of NVWB if/when needed.

JupyterLab Server

Within JupyterLab, we will find the cuDF example project. First, let's confirm we have an NVIDIA GPU connected by opening a terminal window and entering nvidia-smi. We should see something like:

Running nvidia-smi

We can confirm that CUDA v12.4 is installed, and the machine is running an NVIDIA RTX 5000 GPU. Naturally, your output will likely be different.
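We can also run a quick sanity check from a notebook cell to confirm that RAPIDS can see the GPU. This is a minimal sketch assuming the rapidsai/notebooks image, which ships with cuDF preinstalled:

python
import cudf

print(cudf.__version__)

# Build a tiny DataFrame on the GPU and run a simple reduction.
gdf = cudf.DataFrame({"x": [1, 2, 3]})
print(gdf["x"].sum())  # should print 6 if the GPU is working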

Let's head over to /code and open one of the example notebooks for cuDF. We will run through the cudf-pandas-demo.ipynb notebook.

The notebook runs us through cuDF, a GPU-enabled data processing library from NVIDIA. It allows us to use our GPU to accelerate the standard Pandas library. We'll see how soon.

First, we can download a dataset and load it from a file using standard Pandas. We will time the load time using the %%time magic command:

text
!wget https://data.rapids.ai/datasets/nyc_parking/nyc_parking_violations_2022.parquet -O /tmp/nyc_parking_violations_2022.parquet
python
import pandas as pd
python
%%time

# read 5 columns data:
df = pd.read_parquet(
"/tmp/nyc_parking_violations_2022.parquet",
columns=["Registration State", "Violation Description", "Vehicle Body Type", "Issue Date", "Summons Number"]
)

(df[["Registration State", "Violation Description"]] # get only these two columns
.value_counts() # get the count of offences per state and per type of offence
.groupby("Registration State") # group by state
.head(1) # get the first row in each group (the type of offence with the largest count)
.sort_index() # sort by state name
.reset_index()
)

Let's run a few more operations using pure Pandas without GPU acceleration. We'll time each and compare these values to cuDF-accelerated Pandas.

python
%%time

(df
.groupby(["Vehicle Body Type"])
.agg({"Summons Number": "count"})
.rename(columns={"Summons Number": "Count"})
.sort_values(["Count"], ascending=False)
)
python
%%time

weekday_names = {
0: "Monday",
1: "Tuesday",
2: "Wednesday",
3: "Thursday",
4: "Friday",
5: "Saturday",
6: "Sunday",
}

df["Issue Date"] = df["Issue Date"].astype("datetime64[ms]")
df["issue_weekday"] = df["Issue Date"].dt.weekday.map(weekday_names)

df.groupby(["issue_weekday"])["Summons Number"].count().sort_values()

To switch across to cudf.pandas, we first need to shut down the kernel and then load the cudf.pandas extension. We can do this via code commands:

python
get_ipython().kernel.do_shutdown(restart=True)
text
%load_ext cudf.pandas
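As a quick sanity check (not part of the example notebook), we can confirm the extension is active: once cudf.pandas is loaded, importing pandas gives us a proxy module that dispatches supported operations to cuDF on the GPU and falls back to CPU pandas otherwise, and printing the module should make that visible.

python
import pandas as pd

# With cudf.pandas loaded, `pd` is a proxy module rather than plain pandas.
print(pd)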

Now we run Pandas as usual; in the backend, it will use cuDF acceleration. Let's try:

python
import pandas as pd
python
%%time

# read 5 columns data:
df = pd.read_parquet(
"/tmp/nyc_parking_violations_2022.parquet",
columns=["Registration State", "Violation Description", "Vehicle Body Type", "Issue Date", "Summons Number"]
)
python
(df[["Registration State", "Violation Description"]] # get only these two columns
.value_counts() # get the count of offences per state and per type of offence
.groupby("Registration State") # group by state
.head(1) # get the first row in each group (the type of offence with the largest count)
.sort_index() # sort by state name
.reset_index()
)

We're already seeing faster results. Let's repeat the other operations that we performed earlier with standard Pandas.

python
%%time

(df
.groupby(["Vehicle Body Type"])
.agg({"Summons Number": "count"})
.rename(columns={"Summons Number": "Count"})
.sort_values(["Count"], ascending=False)
)
python
%%time

weekday_names = {
0: "Monday",
1: "Tuesday",
2: "Wednesday",
3: "Thursday",
4: "Friday",
5: "Saturday",
6: "Sunday",
}

df["Issue Date"] = df["Issue Date"].astype("datetime64[ms]")
df["issue_weekday"] = df["Issue Date"].dt.weekday.map(weekday_names)

df.groupby(["issue_weekday"])["Summons Number"].count().sort_values()

We can see cuDF doing its magic:

| Operation | Pandas time | cudf.pandas time | Speedup |
| --- | --- | --- | --- |
| Read and aggregation | 7.31s | 488ms | 15x |
| Aggregation and rename | 1.13s | 28.4ms | 39.8x |
| Datetime, map, aggregation | 2.77s | 581ms | 4.8x |

The dataset we're using here is relatively small, but as soon as we start working with big datasets, these (already impressive) multipliers become increasingly valuable.

Closing our Project

Once we're done with our task, we can exit the JupyterLab server and close it down by navigating back to NVWB, clicking the apps running status at the bottom-right of the window, and toggling JupyterLab to off.

In a typical project of our own, we would have git commit permissions. In that scenario, we would commit our changes by simply clicking Commit near the top of the screen. This pushes all of our changes to GitHub or GitLab (whichever you have set up), and we're done!