Running a PyTorch®-based benchmark on an NVIDIA GH200 instance#
This tutorial describes how to run an NGC-based benchmark on an On-Demand Cloud (ODC) instance backed by the NVIDIA GH200 Grace Hopper Superchip. The tutorial also outlines how to run the benchmark on other ODC instance types to compare performance. The benchmark uses a variety of PyTorch® examples from NVIDIA's Deep Learning Examples repository.
Prerequisites#
To run this tutorial successfully, you'll need the following:
- A GitHub account and some familiarity with a Git-based workflow.
- The following tools and libraries installed on the machine or instance you plan to benchmark. These tools and libraries are installed by default on your ODC instances:
- NVIDIA driver
- Docker
- Git
- nvidia-container-toolkit
- Python
Setting up your environment#
Launch your GH200 instance#
Begin by launching a GH200 instance:
- In the Lambda Cloud console, navigate to the SSH keys page, click Add SSH Key, and then add or generate an SSH key.
- Navigate to the Instances page and click Launch Instance.
- Follow the steps in the instance launch wizard:
- Instance type: Select 1x GH200 (96 GB).
- Region: Select an available region.
- Filesystem: Don't attach a filesystem.
- SSH key: Use the key you created in step 1.
- Click Launch instance.
- Review the EULAs. If you agree to them, click I agree to the above to start launching your new instance. Instances can take up to five minutes to fully launch.
Set the required environment variables#
Next, set the environment variables you need to run the benchmark:
- In the Lambda Cloud console, navigate to the Instances page, find the row for your instance, and then click Launch in the Cloud IDE column. JupyterHub opens in a new window.
- In JupyterHub's Launcher tab, under Other, click Terminal to open a new terminal.
- Open your `.bashrc` file for editing:
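    For example (assuming `nano` as your editor):

    ```bash
    nano ~/.bashrc
    ```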
- At the bottom of the file, set the following environment variables. Replace `<GIT-USERNAME>` with your GitHub username and `<GIT-EMAIL>` with the email address associated with your GitHub account:

    Note: If desired, you can update the value of `NAME_NGC` below to reflect the latest version of PyTorch®. This tutorial isn't pinned to a specific version.
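    A sketch of what these exports might look like. The variable names are the ones used later in this tutorial; the specific values shown (NGC image tag, GPU name, task list, and so on) are illustrative assumptions, so adjust them for your setup:

    ```bash
    # GitHub identity used later when committing and pushing results
    export GIT_USERNAME="<GIT-USERNAME>"
    export GIT_EMAIL="<GIT-EMAIL>"

    # NGC PyTorch image tag (example only; substitute the latest release if you like)
    export NAME_NGC=pytorch:24.10-py3

    # Instance and GPU description used to label the benchmark results (illustrative values)
    export NAME_TYPE=ODC
    export NAME_GPU=GH200_96GB
    export NUM_GPU=1

    # Dataset, task list, and results directory consumed by the benchmark scripts
    # (example values; check the deeplearning-benchmark repo for the options it supports)
    export NAME_DATASET=all
    export NAME_TASKS=all
    export NAME_RESULTS=gh200_benchmark_results
    ```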
- Save and exit.
- Update your environment with your new environment variables:
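    For example:

    ```bash
    source ~/.bashrc
    ```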
Running the GH200 benchmark#
Run the benchmark#
Now that you've set up your environment, you can run the benchmark on your GH200 instance:
- In your web browser, navigate to the lambdal/deeplearning-benchmark repository on GitHub and then fork the repository. By using your own fork instead of the original repository, you'll be able to push your benchmark results to a single location.
- In your ODC instance's JupyterHub terminal, pull the NGC PyTorch® Docker image:
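    A sketch, assuming `NAME_NGC` is set as described above:

    ```bash
    sudo docker pull nvcr.io/nvidia/${NAME_NGC}
    ```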
- Clone the LambdaLabsML/DeepLearningExamples repository, check out its lambda/benchmark branch, and then clone your forked repository:
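    A sketch of these steps. The repository URLs are inferred from the repository names above, and `<GIT-USERNAME>` is your GitHub username:

    ```bash
    # Clone Lambda's fork of NVIDIA's Deep Learning Examples and switch to the benchmark branch
    git clone https://github.com/LambdaLabsML/DeepLearningExamples.git ~/DeepLearningExamples
    cd ~/DeepLearningExamples && git checkout lambda/benchmark && cd ~

    # Clone your fork of the benchmark harness
    git clone https://github.com/<GIT-USERNAME>/deeplearning-benchmark.git ~/deeplearning-benchmark
    ```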
- Navigate to the `pytorch` directory, and then download and prepare the dataset that the benchmark will use. This step might take up to 20 minutes to complete:

    ```bash
    cd deeplearning-benchmark/pytorch && mkdir ~/data && sudo docker run --gpus all --rm --shm-size=256g \
        -v ~/DeepLearningExamples/PyTorch:/workspace/benchmark \
        -v ~/data:/data \
        -v $(pwd)"/scripts":/scripts \
        nvcr.io/nvidia/${NAME_NGC} \
        /bin/bash -c "cp -r /scripts/* /workspace; ./run_prepare.sh ${NAME_DATASET}"
    ```
- Create a PyTorch® configuration file for the benchmark:
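    The exact file name and layout come from the deeplearning-benchmark repo, so treat the following as a hypothetical sketch: it assumes per-GPU configs live under `scripts/config_v2/` and are keyed by `NAME_GPU`, and that you can copy an existing config as a starting point and adjust its batch sizes for the GH200's 96 GB of memory.

    ```bash
    # Hypothetical layout: copy an existing config and name the copy after your GPU
    cp scripts/config_v2/config_pytorch_<EXISTING-GPU>_v2.sh \
       scripts/config_v2/config_pytorch_${NAME_GPU}_v2.sh
    ```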
- Create a new directory named `gh200_benchmark_results` to store the benchmark results in:
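    For example, from the `pytorch` directory:

    ```bash
    # If you set NAME_RESULTS to this directory name in your .bashrc, you can use ${NAME_RESULTS} here instead
    mkdir -p gh200_benchmark_results
    ```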
- Run the benchmark:

    ```bash
    sudo docker run --rm --shm-size=1024g \
        --gpus all \
        -v ~/DeepLearningExamples/PyTorch:/workspace/benchmark \
        -v ~/data:/data \
        -v $(pwd)"/scripts":/scripts \
        -v $(pwd)/${NAME_RESULTS}:/results \
        nvcr.io/nvidia/${NAME_NGC} \
        /bin/bash -c "cp -r /scripts/* /workspace; ./run_benchmark.sh ${NAME_TYPE}_${NUM_GPU}x${NAME_GPU}_$(hostname)_v2 ${NAME_TASKS} 3000"
    ```
Compile the results to CSV#
When the benchmark completes, it publishes the results to a subdirectory of your `results_v2` directory. You can compile a summary of these results to CSV by running the following commands from the `pytorch` folder:

```bash
python scripts/compile_results_pytorch_v2.py --path ${NAME_RESULTS} --precision fp32 &&
python scripts/compile_results_pytorch_v2.py --path ${NAME_RESULTS} --precision fp16
```
The resulting CSV files appear in the `pytorch` directory.
Push the results to GitHub#
Finally, push the results to your GitHub repository:
- In your web browser, log into GitHub and create a new fine-grained personal access token with the following configuration. Make sure to copy the token and paste it somewhere safe for future use:
- Token name: GH200 benchmarking
- Repository access: Select Only select repositories, and then select your deeplearning-benchmark fork from the dropdown.
- Permissions: Under Repository permissions, set Contents to Read and write.
- In your terminal in JupyterHub, set an environment variable for your GitHub token. Replace `<GIT-TOKEN>` with the personal access token you created in step 1:
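    For example (the variable name `GITHUB_TOKEN` is illustrative; any name works as long as you use it consistently in the later steps):

    ```bash
    export GITHUB_TOKEN="<GIT-TOKEN>"
    ```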
- Configure your Git credentials:
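    A minimal sketch, using the same username and email placeholders you set earlier:

    ```bash
    git config --global user.name "<GIT-USERNAME>"
    git config --global user.email "<GIT-EMAIL>"
    ```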
- Navigate to the `pytorch` directory, and then fetch the latest changes from the repo's `master` branch and merge them into your current branch:
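    A sketch, assuming your fork was cloned to `~/deeplearning-benchmark` and its remote is named `origin`:

    ```bash
    cd ~/deeplearning-benchmark/pytorch
    git pull origin master   # fetch origin's master branch and merge it into the current branch
    ```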
- Navigate to the `pytorch` directory and then commit your results:
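    For example (the commit message is illustrative):

    ```bash
    git add ${NAME_RESULTS} *.csv
    git commit -m "Add GH200 benchmark results"
    ```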
- Set `origin` to your forked repository:
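    A sketch that embeds your token in the remote URL so pushes over HTTPS can authenticate. It assumes the `GITHUB_TOKEN` variable from the earlier step:

    ```bash
    git remote set-url origin \
        https://<GIT-USERNAME>:${GITHUB_TOKEN}@github.com/<GIT-USERNAME>/deeplearning-benchmark.git
    ```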
- Push the results to your forked repository:
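    For example:

    ```bash
    git push origin HEAD   # push the current branch to your fork
    ```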
That's it! To see your benchmarks, navigate to the `pytorch` folder in your forked repo on GitHub, and then click on one of the two CSV files. GitHub renders your CSV files in an easy-to-scan tabular format by default.
Running the benchmark on other instance types#
Without other data to compare it to, your GH200 benchmark data is of limited use. You can run the benchmark on other ODC instance types by modifying the instructions in the Setting up your environment section above. Make the following changes:
- When launching your instance, select the instance type you want to benchmark.
- When setting your environment variables, set `NAME_GPU` to the appropriate string for your GPU and `NUM_GPU` to the number of GPUs attached to the instance. The naming pattern for `NAME_GPU`, and an example of how you'd set both variables for an 8x H100 80 GB SXM instance, are sketched after the note below.
Important: If the instance type doesn't explicitly state a GPU connection type, omit `_<GPU-CONNECTION-TYPE>` from the naming pattern.
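A hedged sketch of the naming pattern and the 8x H100 80 GB SXM example. The exact strings are assumptions extrapolated from the GH200 value used earlier in this tutorial (`GH200_96GB`), so double-check them against the config names in your deeplearning-benchmark fork:

```bash
# Assumed pattern: <GPU-NAME>_<GPU-MEMORY>[_<GPU-CONNECTION-TYPE>]
export NAME_GPU=H100_80GB_SXM5   # illustrative string for an H100 80 GB SXM GPU
export NUM_GPU=8
```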
After you've launched and set up your instance, you can run the instructions in the Running the GH200 benchmark section as normal. As before, you can compare results by clicking on each of the CSV files you generated.
Cleaning up#
When you're done with your instances, terminate them to avoid incurring unnecessary costs:
- In the Lambda Cloud console, navigate to the Instances page.
- Select the checkboxes of the instances you want to delete.
- Click Terminate. A dialog appears.
- Follow the instructions and then click Terminate instances to terminate your instances.
Next steps#
- To learn how to use vLLM to serve models from a GH200 instance, see Serving Llama 3.1 8B using vLLM on an NVIDIA GH200 instance.
- To learn how to use Hugging Face's Diffusers and Transformers libraries on a GH200 instance, see Running Hugging Face Transformers and Diffusers on an NVIDIA GH200 instance.
- For more tips and tutorials, see our Education section.