Support#
At Lambda, we recognize that exceptional support is critical to maximizing the value of your 1-Click Cluster (1CC) deployment. Our world-class support team is dedicated to ensuring your success at every stage, from deployment to daily operations.
Support team#
When you choose Lambda, you gain access to a dedicated support team of seasoned professionals with deep expertise in AI/ML infrastructure. This team includes:
- Customer Success Manager (CSM): Your main point of contact post-sales, responsible for ensuring the delivery of your solution and your overall satisfaction.
- Technical Account Manager (TAM): A specialist who understands the specifics of your deployment, provides ongoing technical guidance, and escalates complex issues.
- Machine Learning Expert (MLE): An AI/ML expert at Lambda who provides guidance on integrating and scaling AI workloads.
- Support Engineering: Lambda’s Support team, available 24/7 through our ticketing portal, is well-versed in 1CC and can respond to any technical support request, escalating issues and incidents early and often.
Support scope#
Lambda classifies incoming support tickets into three categories:
- In scope
- Best effort
- Out of scope
In scope#
- Hardware and Infrastructure: Full support for CPU/GPU VMs, physical hosts, and networking components.
- Software Environment: Assistance with Lambda Stack, OFED drivers, Jupyter Notebooks, and essential ML tools like NCCL and Open MPI.
- Networking and Storage: Management and troubleshooting for Ethernet and InfiniBand networks, as well as persistent and local storage.
- Slurm Installation: Guidance on Slurm setup and basic troubleshooting to streamline your job scheduling processes.
- Managed Kubernetes: Available only if purchased as an add-on, in which case our team of Kubernetes experts supports your managed Kubernetes environment.
Best effort#
Our Support team is dedicated to delivering world-class customer service and technical expertise. We empower our engineers to go the extra mile, even when it means stepping beyond the standard scope of our support and engineered products to provide innovative solutions.
In these exceptional cases, we ensure our customers understand that while we strive to help, there may be no guaranteed outcome. Any solutions we provide under these circumstances won't be fully supported in the future, and Lambda cannot assume responsibility for any potential impacts.
Out of scope#
Some requests fall outside what Lambda can support, such as:
- Troubleshooting customer code
- Third-party applications or software installed after cluster handoff
- Network/VPN connections to your cluster
SLA#
Our focus on prompt and reliable service is supported by the clearly established response times in our agreements:
Incident Level | Definition | Initial Response Time |
---|---|---|
Severity 1 | A critical Services problem in which the Services (i) are down, inoperable, inaccessible, or unavailable, (ii) otherwise materially cease operation, or (iii) perform or fail to perform so as to prevent useful work from being done. | 4 hours |
Severity 2 | A Services problem in which the Services (i) are severely limited or major functions are performing improperly, and the situation is significantly impacting certain portions of the Services users’ operations or productivity, or (ii) have been interrupted but recovered, and there is high risk of recurrence. | 8 hours |
Severity 3 | A minor or cosmetic Services problem that (i) is an irritant, affects non-essential functions, or has minimal business operations impact, (ii) is localized or has isolated impact, (iii) is an operational nuisance, (iv) results in documentation errors, or (v) is otherwise not Severity 1 or Severity 2, but represents a failure of the Services to conform to the specifications provided. | 10 hours |
Service Request | Requests for action or tasks that are not generated by an incident. | 24 hours |
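
For teams that track tickets in their own tooling, the response times in the table above can be captured as a small lookup. The following is a minimal sketch, not a Lambda API: the names `INITIAL_RESPONSE_HOURS` and `initial_response_deadline` are hypothetical, and it assumes the response window is counted in continuous clock hours from the moment the ticket is opened, which the table does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Initial response times copied from the SLA table, in hours.
# The dictionary name and keys are illustrative, not part of any Lambda API.
INITIAL_RESPONSE_HOURS = {
    "Severity 1": 4,
    "Severity 2": 8,
    "Severity 3": 10,
    "Service Request": 24,
}


def initial_response_deadline(level: str, opened_at: datetime) -> datetime:
    """Return the latest time an initial response is due for a ticket.

    Assumption: the window runs in continuous clock hours from `opened_at`.
    """
    try:
        hours = INITIAL_RESPONSE_HOURS[level]
    except KeyError:
        raise ValueError(f"unknown incident level: {level!r}") from None
    return opened_at + timedelta(hours=hours)


# Example: a Severity 2 ticket opened at 09:30 UTC.
opened = datetime(2025, 1, 6, 9, 30, tzinfo=timezone.utc)
print(initial_response_deadline("Severity 2", opened).isoformat())
# 2025-01-06T17:30:00+00:00
```

Using timezone-aware timestamps keeps the computed deadline unambiguous when the people filing tickets and the people tracking them are in different regions.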