SkyPilot makes it easy to deploy a Kubernetes cluster using Lambda Public Cloud on-demand instances. The NVIDIA GPU Operator is preinstalled so you can immediately use your instances' GPUs.
At the top of the file, for SKY_K3S_TOKEN, replace mytoken with a strong passphrase.
It's important that you use a strong passphrase. Otherwise, the Kubernetes cluster can be compromised, especially if your firewall rules allow incoming traffic from all sources.
You can generate a strong passphrase by running:
openssl rand -base64 16
This command will generate a random string of characters such as zPUlZGe4HRcy+Om04RvGmQ==.
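The two steps above (generate a passphrase, then put it in place of mytoken) can also be scripted. The following is a minimal sketch, not part of the tutorial's own scripts. It works on a throwaway copy in a temporary directory so that it's safe to run anywhere; the file name cloud_k8s.yaml is taken from the script output shown later, and the two-line sample file is an assumption standing in for the real one.

```shell
# Hypothetical stand-in for the tutorial's YAML file, so this sketch runs anywhere.
workdir="$(mktemp -d)"
yaml_file="$workdir/cloud_k8s.yaml"
printf 'envs:\n  SKY_K3S_TOKEN: mytoken\n' > "$yaml_file"

# Generate 16 random bytes, base64-encoded (24 characters).
token="$(openssl rand -base64 16)"

# Replace the placeholder. '|' is used as the sed delimiter because
# base64 output can contain '/' but never '|', '&', or '\'.
sed "s|SKY_K3S_TOKEN: mytoken|SKY_K3S_TOKEN: ${token}|" "$yaml_file" > "$yaml_file.new"
mv "$yaml_file.new" "$yaml_file"

# Show the resulting line.
grep "SKY_K3S_TOKEN" "$yaml_file"
```

To apply the same substitution to your real file, you would point yaml_file at it instead of creating the sample.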
The top of the cloud_k8s.yaml file should look similar to:
resources:
  cloud: lambda
  accelerators: A10:1
  # Uncomment the following line to expose ports on a different cloud
  # ports: 6443

num_nodes: 2

envs:
  SKY_K3S_TOKEN: zPUlZGe4HRcy+Om04RvGmQ== # Can be any string, used to join worker nodes to the cluster
You can set accelerators to a different GPU type and count, for example, A100:8 for an 8x A100 instance or H100:8 for an 8x H100 instance.
Create a directory in your home directory named .lambda_cloud and change into that directory by running:
mkdir -m 700 ~/.lambda_cloud && cd ~/.lambda_cloud
Create a file named lambda_keys that contains:
api_key = API-KEY
You can do this by running:
echo "api_key = API-KEY" > lambda_keys
Replace API-KEY with your actual Cloud API key.
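Before you launch anything, you might want to confirm that the credentials file exists, contains an api_key line in the format shown above, and no longer holds the API-KEY placeholder. The function below is a minimal sketch of such a check. The name check_lambda_keys is hypothetical and isn't part of SkyPilot or of any Lambda tooling.

```shell
# Hypothetical helper: sanity-check a lambda_keys file.
# Returns 0 if the file looks usable, 1 otherwise.
check_lambda_keys() {
  keys_file="$1"
  if [ ! -f "$keys_file" ]; then
    echo "missing: $keys_file"
    return 1
  fi
  # The file is expected to contain a line of the form: api_key = <value>
  if ! grep -Eq '^api_key = .+$' "$keys_file"; then
    echo "no 'api_key = ...' line in $keys_file"
    return 1
  fi
  # Catch the unedited placeholder from the tutorial.
  if grep -q '^api_key = API-KEY$' "$keys_file"; then
    echo "placeholder API-KEY has not been replaced in $keys_file"
    return 1
  fi
  echo "ok: $keys_file"
}

if check_lambda_keys "$HOME/.lambda_cloud/lambda_keys"; then
  echo "Credentials file looks usable."
else
  echo "Fix the credentials file before launching."
fi
```

This only checks the file's shape. It doesn't verify that the key is valid for your account.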
Use SkyPilot to launch instances and deploy Kubernetes
Change into the directory you created for this tutorial by running:
cd ~/skypilot-tutorial
Then, launch two 1x A10 instances and deploy a 2-node Kubernetes cluster on them by running:
bash launch_k8s.sh
You'll begin to see output similar to:
===== SkyPilot Kubernetes cluster deployment script =====
This script will deploy a Kubernetes cluster on the cloud and GPUs specified in cloud_k8s.yaml.
+ CLUSTER_NAME=k8s
+ sky launch -y -c k8s cloud_k8s.yaml
SkyPilot collects usage data to improve its services. `setup` and `run` commands are not collected to ensure privacy.
Usage logging can be disabled by setting the environment variable SKYPILOT_DISABLE_USAGE_COLLECTION=1.
Task from YAML spec: cloud_k8s.yaml
I 09-11 16:10:04 optimizer.py:719] == Optimizer ==
I 09-11 16:10:04 optimizer.py:730] Target: minimizing cost
I 09-11 16:10:04 optimizer.py:742] Estimated cost: $1.5 / hour
I 09-11 16:10:04 optimizer.py:742]
I 09-11 16:10:04 optimizer.py:867] Considered resources (2 nodes):
I 09-11 16:10:04 optimizer.py:937] ------------------------------------------------------------------------------------------
I 09-11 16:10:04 optimizer.py:937]  CLOUD    INSTANCE     vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN
I 09-11 16:10:04 optimizer.py:937] ------------------------------------------------------------------------------------------
I 09-11 16:10:04 optimizer.py:937]  Lambda   gpu_1x_a10   30      200       A10:1          us-east-1     1.50       ✔
I 09-11 16:10:04 optimizer.py:937] ------------------------------------------------------------------------------------------
I 09-11 16:10:04 optimizer.py:937]
Running task on cluster k8s...
I 09-11 16:10:04 cloud_vm_ray_backend.py:4397] Creating a new cluster: 'k8s' [2x Lambda(gpu_1x_a10, {'A10': 1})].
I 09-11 16:10:04 cloud_vm_ray_backend.py:4397] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 09-11 16:10:05 cloud_vm_ray_backend.py:1314] To view detailed progress: tail -n100 -f /home/lambda/sky_logs/sky-2024-09-11-16-10-03-504822/provision.log
I 09-11 16:10:06 cloud_vm_ray_backend.py:1721] Launching on Lambda us-east-1
I 09-11 16:13:24 log_utils.py:45] Head node is up.
I 09-11 16:14:10 cloud_vm_ray_backend.py:1826] Successfully provisioned or found existing head instance. Waiting for workers.
I 09-11 16:18:13 cloud_vm_ray_backend.py:1569] Successfully provisioned or found existing VMs.
I 09-11 16:18:17 cloud_vm_ray_backend.py:3319] Job submitted with Job ID: 1
It usually takes about 15 minutes for the Kubernetes cluster to be deployed.
The Kubernetes cluster is successfully deployed once you see:
Checking credentials to enable clouds for SkyPilot.
Kubernetes: enabled
Hint: Could not detect GPU labels in Kubernetes cluster. If this cluster has GPUs, please ensure GPU nodes have node labels of either of these formats: skypilot.co/accelerator, cloud.google.com/gke-accelerator, karpenter.k8s.aws/instance-gpu-name, nvidia.com/gpu.product, gpu.nvidia.com/class. Please refer to the documentation on how to set up node labels.
To enable a cloud, follow the hints above and rerun: sky check
If any problems remain, refer to detailed docs at: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html
🎉 Enabled clouds 🎉
✔ Kubernetes
✔ Lambda
+ set +x
===== Kubernetes cluster deployment complete =====
You can now access your k8s cluster with kubectl and skypilot.
• View the list of available GPUs on Kubernetes: sky show-gpus --cloud kubernetes
• To launch a SkyPilot job running nvidia-smi on this cluster: sky launch --cloud kubernetes --gpus <GPU> -- nvidia-smi
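Once the script reports that deployment is complete, you can confirm that both nodes joined the cluster. The following is a minimal sketch that counts nodes in the Ready state. It assumes kubectl is on your PATH and already points at the new cluster, as the script output above suggests. The function name count_ready_nodes is hypothetical, and expected_nodes mirrors num_nodes in the YAML file.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready"
# in `kubectl get nodes --no-headers` output read from stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

expected_nodes=2  # matches num_nodes in the YAML file

if command -v kubectl >/dev/null 2>&1; then
  # --request-timeout keeps the check from hanging if the cluster is unreachable.
  ready="$(kubectl get nodes --no-headers --request-timeout=10s 2>/dev/null | count_ready_nodes)"
  if [ "$ready" -eq "$expected_nodes" ]; then
    echo "All $expected_nodes nodes are Ready."
  else
    echo "Only $ready of $expected_nodes nodes are Ready."
  fi
else
  echo "kubectl not found on PATH; skipping node check."
fi
```

A node listed as NotReady isn't counted, so a result lower than 2 shortly after deployment might only mean that a worker is still joining.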
To test the Kubernetes cluster, launch a job by running: