Using the Lambda Inference API
The Lambda Inference API enables you to use large language models (LLMs) without the need to set up a server. The Lambda Inference API can be used as a drop-in replacement for applications currently using the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.
To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below, you can replace hermes3-405b with any of the available models. You can obtain a list of the available models using the /models endpoint. Replace <API-KEY> with your actual Cloud API key.
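Rather than pasting the key directly into your code, you might read it from an environment variable. A minimal sketch, assuming the key is stored in a variable named LAMBDA_API_KEY (an arbitrary name chosen for this example, not something the API requires):

```python
import os

def load_api_key(var_name="LAMBDA_API_KEY"):
    """Read the Cloud API key from the environment.

    The variable name is a convention for this example only; the
    Lambda Inference API does not require any particular name.
    """
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

You can then pass the returned key as the api_key argument wherever the examples below use "<API-KEY>".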
Creating chat completions
The /chat/completions endpoint takes a list of messages that make up a conversation, then outputs a response.
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:

pip install openai
Then run the following Python script, for example:
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "hermes3-405b"

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant named Hermes, made by Nous Research.",
        },
        {
            "role": "user",
            "content": "Who won the world series in 2020?",
        },
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {
            "role": "user",
            "content": "Where was it played?",
        },
    ],
    model=model,
)

print(chat_completion)
You should see output similar to:
ChatCompletion(id='chatcmpl-54ecd2c87a114a67a6928614088a7a92', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The 2020 World Series was played at Globe Life Field in Arlington, Texas, which is the home of the Texas Rangers. However, it was not the home field of the teams participating. This was due to the COVID-19 pandemic and the restrictions on travel and gatherings. The Los Angeles Dodgers played the Tampa Bay Rays for the championship.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False}, 'self_harm': {'filtered': False}, 'sexual': {'filtered': False}, 'violence': {'filtered': False}, 'jailbreak': {'filtered': False, 'detected': False}, 'profanity': {'filtered': False, 'detected': False}})], created=1733460270, model='llama3.1-8b-instruct', object='chat.completion', service_tier=None, system_fingerprint='', usage=CompletionUsage(completion_tokens=70, prompt_tokens=86, total_tokens=156, completion_tokens_details=None, prompt_tokens_details=None))
Alternatively, using curl, run:
curl -sS https://api.lambdalabs.com/v1/chat/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes3-405b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant named Hermes, made by Nous Research."
      },
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      },
      {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      {
        "role": "user",
        "content": "Where was it played?"
      }
    ]
  }' | jq .
You should see output similar to:
{
  "id": "chatcmpl-cbb10ffe2bf24c81a37d86204a3ec835",
  "object": "chat.completion",
  "created": 1733448149,
  "model": "hermes3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas, due to the COVID-19 pandemic restrictions. All games were played at this neutral site to minimize travel and potential exposure to the virus."
      },
      "finish_reason": "stop",
      "content_filter_results": {
        "hate": {
          "filtered": false
        },
        "self_harm": {
          "filtered": false
        },
        "sexual": {
          "filtered": false
        },
        "violence": {
          "filtered": false
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "profanity": {
          "filtered": false,
          "detected": false
        }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 65,
    "completion_tokens": 45,
    "total_tokens": 110,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": ""
}
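The /chat/completions endpoint is stateless: each request must include the full conversation so far, as the examples above do. To continue the conversation for another turn, append the assistant's reply and your next user message to the list before the next call. A minimal helper sketching this pattern (the function name extend_conversation is illustrative, not part of any API):

```python
def extend_conversation(messages, assistant_reply, user_followup):
    """Return a new message list extended with the previous assistant
    reply and the next user turn, ready to send in another
    /chat/completions request. The original list is left unchanged."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_followup},
    ]
```

With the Python client, the assistant's reply for the next turn is available as chat_completion.choices[0].message.content.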
Creating completions
The /completions endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the /chat/completions endpoint takes a list of messages as input.
To use the /completions endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:

pip install openai
Then run the following Python script, for example:
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "hermes3-405b"

response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    model=model,
)

print(response)
You should see output similar to:
Completion(id='chatcmpl-2b9da158a108459cb7e2e9ee61e72e49', choices=[CompletionChoice(finish_reason='stop', index=0, logprobs=Logprobs(text_offset=None, token_logprobs=None, tokens=None, top_logprobs=None), text='electronic devices that can be programmed to perform a variety of tasks, from simple calculations to complex operations. They can process and store vast amounts of data, communicate with other devices, and execute instructions at incredibly high speeds.')], created=1733460512, model='llama3.1-8b-instruct', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=45, prompt_tokens=38, total_tokens=83, completion_tokens_details=None, prompt_tokens_details=None))
Alternatively, using curl, run:
curl -sS https://api.lambdalabs.com/v1/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes3-405b",
    "prompt": "Computers are",
    "temperature": 0
  }' | jq .
You should see output similar to:
{
  "id": "chatcmpl-8e46443e199a446ea8a49ed124cad61b",
  "object": "text_completion",
  "created": 1733448483,
  "model": "hermes3-8b",
  "choices": [
    {
      "text": "1. Electronic devices that process data and perform a wide range of tasks\n2. Calculating machines used for complex mathematical operations\n3. Devices that can store and retrieve information\n4. Tools that enhance communication through email, instant messaging, and video conferencing\n5. Platforms for creating and sharing multimedia content, such as videos, photos, and music\n6. Essential tools for businesses and organizations in managing operations, financial transactions, and customer relations\n7. Systems used in scientific research and data analysis\n8. Devices that can be programmed to perform specific tasks and solve problems\n9. Networked tools that enable collaboration and resource sharing among users\n10. Powerful machines capable of performing complex computations, simulations, and artificial intelligence tasks.",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 149,
    "total_tokens": 172,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
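In the JSON response, the generated text is in choices[0].text (for /chat/completions it is nested under choices[0].message.content instead). If you are working with the parsed JSON rather than the Python client objects, a small helper (the function name completion_text is illustrative) can pull the text out:

```python
def completion_text(response_json):
    """Extract the generated text from a parsed /completions response."""
    return response_json["choices"][0]["text"]
```

With the Python client, the equivalent is response.choices[0].text.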
Listing models
The /models endpoint lists the models available for use through the Lambda Inference API.
To use the /models endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:

pip install openai
Then run the following Python script:
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

print(client.models.list())
You should see output similar to:
SyncPage[Model](data=[Model(id='hermes3-405b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-8b', created=1724347380, object='model', owned_by='lambda'), Model(id='lfm-40b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-405b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-8b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-nemotron-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-3b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='qwen25-coder-32b-instruct', created=1724347380, object='model', owned_by='lambda')], object='list')
Alternatively, using curl, run:

curl -sS https://api.lambdalabs.com/v1/models \
  -H "Authorization: Bearer <API-KEY>" | jq .
You should see output similar to:
{
  "object": "list",
  "data": [
    {
      "id": "hermes3-405b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-8b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "lfm-40b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-405b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    […]
    {
      "id": "qwen25-coder-32b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    }
  ]
}
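If you only need the model IDs (for example, to check that a model is available before sending a request), you can collect them from the data array of the parsed JSON response. A short sketch (the helper name model_ids is illustrative):

```python
def model_ids(models_json):
    """Return the list of model IDs from a parsed /models response."""
    return [entry["id"] for entry in models_json["data"]]
```

With the Python client, the equivalent is [m.id for m in client.models.list().data].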
Note
Currently, the following models are available:
hermes3-405b
hermes3-70b
hermes3-8b
lfm-40b
llama3.1-405b-instruct-fp8
llama3.1-70b-instruct-fp8
llama3.1-8b-instruct
llama3.1-nemotron-70b-instruct-fp8
llama3.2-11b-vision-instruct
llama3.2-3b-instruct
llama3.3-70b-instruct-fp8
qwen25-coder-32b-instruct