Using the Lambda bug report to troubleshoot your system#

The Lambda bug report simplifies troubleshooting by collecting system information in one place. This article explains how to use the lambda-bug-report.log file to troubleshoot common issues.

Warning

The lambda-bug-report.sh script is intended for use on Vector, Scalar, Hyperplane, and On-Demand products only. Do not run this script on a cluster, because it installs packages that may cause unintended outcomes.

To generate a report, run:

wget -nv -O - https://raw.githubusercontent.com/lambdal-support/lambda-public-tools/main/lambda-bug-report.sh | bash -

This command creates a file named lambda-bug-report.tar.gz in your current directory that contains the Lambda bug report.
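To browse the individual files, you can unpack the archive first. The following is a minimal sketch rather than an official procedure: the helper name `unpack_report` and the default destination directory `lambda-bug-report-extracted` are assumptions made for illustration, and the top-level layout inside the archive may differ on your system.

```shell
# Hypothetical helper: unpack a bug report archive into its own directory
# and list the files it contains.
# Usage: unpack_report lambda-bug-report.tar.gz [destination]
unpack_report() {
    archive="$1"
    dest="${2:-lambda-bug-report-extracted}"   # assumed name; any directory works

    mkdir -p "$dest" || return 1
    # -x extract, -z decompress gzip, -f archive file, -C extract into $dest
    tar -xzf "$archive" -C "$dest" || return 1

    # Show every extracted file so you can see what the report collected
    find "$dest" -type f | sort
}
```

For example, `unpack_report lambda-bug-report.tar.gz` unpacks the report next to the archive and prints the path of each collected file.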

Understanding the bug report log file#

The log file is organized by system components, making it easier to locate relevant information. It includes outputs from various commands, scripts, and logs that you might otherwise have to collect manually.

While the log file is extensive, certain sections are more commonly referenced for troubleshooting. Outlined below are the key directories and files you should consider examining when troubleshooting.

Baseboard Management Controller (BMC) information#

The bmc-info folder collects sensor information and error history for systems with a BMC. This information is useful if you suspect hardware malfunctions or want to check the health status of your system components.

Note

Only Hyperplane, Scalar, Vector, and Vector Pro products have a BMC. The Vector One does not include a BMC, so these files, although present in a bug report generated on a Vector One, do not contain Intelligent Platform Management Interface (IPMI) data.

Files:

ipmi-elist.txt, ipmi-sdr.txt
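As a quick health check, you can filter the sensor list for readings whose status is not healthy. This sketch assumes that ipmi-sdr.txt holds pipe-separated rows in the style of `ipmitool sdr` output (name | reading | status), where `ok` means healthy and `ns` means no reading. The helper name `unhealthy_sensors` is hypothetical, and the actual file format on your system may differ.

```shell
# Hypothetical helper: print sensors whose status is neither "ok" nor "ns".
# Assumes rows shaped like:  CPU Temp | 45 degrees C | ok
# Usage: unhealthy_sensors bmc-info/ipmi-sdr.txt
unhealthy_sensors() {
    awk -F'|' 'NF >= 3 {
        status = $3
        gsub(/[ \t]/, "", status)              # strip padding around the status
        if (status != "ok" && status != "ns") {
            name = $1
            sub(/[ \t]+$/, "", name)           # trim trailing padding
            reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", reading)
            print name ": " reading " (" status ")"
        }
    }' "$1"
}
```

An empty result suggests that no sensor reported a warning or critical state when the report was generated.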

Drives and storage#

This section provides information about disk usage, mounted filesystems, RAID configurations, and disk I/O statistics. Look here when experiencing issues related to storage, such as disk failures, insufficient disk space, or mounting errors.

Files:

df.txt, fstab.txt, iostat.txt, lsblk.txt, mdadm-conf.txt, mdadm-scan.txt, mdstat.txt, mounts.txt
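To spot filesystems that are running out of space, you can scan df.txt for high usage percentages. This is a sketch under the assumption that df.txt contains standard `df` output with a percentage column and the mount point as the last field. The helper name `full_filesystems` and the default threshold of 90 are illustrative choices, not part of the bug report tooling.

```shell
# Hypothetical helper: list mount points at or above a usage threshold.
# Usage: full_filesystems drives-and-storage/df.txt [threshold_percent]
full_filesystems() {
    file="$1"
    threshold="${2:-90}"
    awk -v limit="$threshold" '{
        for (i = 1; i <= NF; i++) {
            # A usage field looks like "97%"; the header "Use%" does not match
            if ($i ~ /^[0-9]+%$/) {
                pct = $i
                sub(/%/, "", pct)
                # $NF is the mount point (assumes it contains no spaces)
                if (pct + 0 >= limit + 0) print $NF, $i
            }
        }
    }' "$file"
}
```

For example, `full_filesystems drives-and-storage/df.txt 80` prints each mount point that is at least 80% full, followed by its usage.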

GPU memory errors#

The following files are logs of GPU memory errors, including error-correcting code (ECC) errors and remapped memory regions. They are relevant if you’re encountering GPU-related issues like crashes during computation or suspect faulty GPU memory.

Files:

ecc-errors.txt, remapped-memory.txt, uncorrected-ecc_errors.txt

Grub#

The grub folder stores the GRUB bootloader configuration files and boot command-line parameters. These files can be helpful when troubleshooting boot issues or modifying boot parameters for kernel debugging.

Files:

grub.d/50-cloudimg-settings.cfg, grub.d/init-select.cfg, grub.txt, proc_cmdline.txt
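When checking boot parameters, it can help to view the kernel command line one parameter per line. The sketch below assumes that proc_cmdline.txt holds the space-separated contents of /proc/cmdline. The helper names `list_kernel_params` and `has_kernel_param` are hypothetical.

```shell
# Hypothetical helper: print each kernel parameter on its own line.
# Usage: list_kernel_params grub/proc_cmdline.txt
list_kernel_params() {
    # Turn runs of spaces into single newlines, then drop any empty lines
    tr -s ' ' '\n' < "$1" | sed '/^$/d'
}

# Hypothetical helper: succeed if an exact parameter is present.
# Usage: has_kernel_param grub/proc_cmdline.txt nomodeset
has_kernel_param() {
    # -x whole-line match, -F literal string, -q quiet
    list_kernel_params "$1" | grep -qxF -- "$2"
}
```

For example, `has_kernel_param grub/proc_cmdline.txt nomodeset && echo "nomodeset is set"` reports whether the system booted with that parameter.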

Hardware list#

This section lists all recognized hardware devices on the system. It’s useful to verify if all hardware components are detected by the system or to identify missing devices.

Files:

hw-list.txt

Networking#

You can find network configuration and status here, including IP addresses, firewall settings, and active network connections. This section is helpful when you are facing network connectivity issues or firewall problems, or when you need to check network configurations.

Files:

ip-addr.txt, iptables.txt, netplan.txt, resolvectl-status.txt, ss.txt, ufw-status.txt
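A quick first step for connectivity issues is to check which interfaces were up when the report was generated. This sketch assumes that ip-addr.txt contains `ip addr` output, where each interface starts on a line such as `2: eno1: <...> ... state UP ...`. The helper name `interface_states` is hypothetical.

```shell
# Hypothetical helper: print each interface name with its operational state.
# Usage: interface_states networking/ip-addr.txt
interface_states() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)                    # drop the trailing colon
        state = "?"                            # shown if no "state" keyword exists
        for (i = 3; i < NF; i++) {
            if ($i == "state") state = $(i + 1)
        }
        print name, state
    }' "$1"
}
```

An interface reported as `DOWN` that you expect to be connected is a good starting point for checking cabling or the configuration in netplan.txt.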

NVIDIA bug report and SMI#

The nvidia-bug-report.log and nvidia-smi.txt files contain detailed information about NVIDIA GPU drivers and hardware status. Use this information for diagnosing GPU-related issues, driver problems, or performance bottlenecks involving NVIDIA GPUs. For tips on how to best use the NVIDIA bug report file, see Using the NVIDIA bug report to troubleshoot your system.

Files:

nvidia-bug-report.log, nvidia-smi.txt

Repositories and packages#

This section lists installed packages, their sources, and repository configurations. Use these files to identify software conflicts, check package versions, or verify repository settings.

Files:

dpkg.txt, listd-repos.txt, pip-list.txt, sources-list.txt
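To check package versions, you can filter the package list by name. This sketch assumes that dpkg.txt contains `dpkg -l` output, in which installed packages start with the status `ii` followed by the package name and version. The helper name `installed_packages` and the default pattern `nvidia` are illustrative assumptions.

```shell
# Hypothetical helper: print name and version of installed packages
# whose name matches a pattern.
# Usage: installed_packages repos-and-packages/dpkg.txt [pattern]
installed_packages() {
    file="$1"
    pattern="${2:-nvidia}"
    # "ii" marks a package that is installed and configured
    awk -v pat="$pattern" '$1 == "ii" && $2 ~ pat { print $2, $3 }' "$file"
}
```

For example, `installed_packages repos-and-packages/dpkg.txt cuda` lists the installed packages with `cuda` in their name, which can help reveal mismatched versions.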

Sensors#

The sensors.txt file contains internal thermal sensor readings. This information is helpful when investigating overheating issues or thermal throttling.

Files:

sensors.txt

System logs#

This section aggregates various system logs that record events and errors from different system components. It provides a broad overview of system events, kernel messages, package installation history, and error tracking.

Files:

apt-history.log, dmesg, dmesg-errors.txt, dpkg.log, journalctl.txt, kern.log, syslog
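Because these logs can be long, counting a few well-known error strings is a reasonable way to decide where to look first. The patterns below (NVIDIA Xid messages, out-of-memory kills, disk I/O errors, and segfaults) are a starting selection chosen for illustration, not an exhaustive or official list, and the helper name `scan_log` is hypothetical.

```shell
# Hypothetical helper: count occurrences of common error strings in a log.
# Usage: scan_log system-logs/journalctl.txt
scan_log() {
    log="$1"
    for pattern in 'NVRM: Xid' 'Out of memory' 'I/O error' 'segfault'; do
        # -c counts matching lines, -F treats the pattern as a literal string
        count=$(grep -c -F -- "$pattern" "$log")
        printf '%s\t%s\n' "$count" "$pattern"
    done
}
```

Each output line shows a count followed by the pattern. A nonzero count tells you which string to search for in the log to read the full messages and their timestamps.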

Systemctl services#

The systemctl-services.txt file contains the status of systemd services. Check this file to see whether essential services are running or have failed.

Files:

systemctl-services.txt
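To list only the services that failed, you can filter the file by state. This sketch assumes that systemctl-services.txt follows the column layout of `systemctl list-units --type=service` (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION), optionally with a marker character in front of failed units. The actual format of the file may differ, and the helper name `failed_services` is hypothetical.

```shell
# Hypothetical helper: print units whose ACTIVE or SUB column is "failed".
# Usage: failed_services systemctl-services.txt
failed_services() {
    awk '{
        # The unit name is the first or second field (a marker may precede it)
        for (i = 1; i <= 2 && i <= NF; i++) {
            if ($i ~ /\.service$/) {
                # ACTIVE is two fields after the unit name, SUB is three
                if ($(i + 2) == "failed" || $(i + 3) == "failed") print $i
                break
            }
        }
    }' "$1"
}
```

If this prints a service you rely on, searching for its name in journalctl.txt is a natural next step.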

Top#

The top.txt file captures a snapshot of system processes and resource usage at the time the bug report was generated. Use it to identify processes that consume excessive CPU or memory and may be degrading performance.

Files:

top.txt
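To pick out the heaviest processes from the snapshot, you can filter by the CPU column. This sketch assumes that top.txt contains batch-mode `top` output with a header row that starts with `PID` and includes `%CPU` and `COMMAND` columns. The helper name `busy_processes` and the default threshold of 50 are illustrative assumptions.

```shell
# Hypothetical helper: print PID, %CPU, and command for busy processes.
# Usage: busy_processes top.txt [min_cpu_percent]
busy_processes() {
    file="$1"
    threshold="${2:-50}"
    awk -v limit="$threshold" '
        # Remember the position of every column named in the header row
        $1 == "PID" { for (i = 1; i <= NF; i++) col[$i] = i; seen = 1; next }
        # After the header, keep rows whose %CPU meets the threshold
        seen && NF >= col["COMMAND"] && ($(col["%CPU"]) + 0 >= limit + 0) {
            print $1, $(col["%CPU"]), $(col["COMMAND"])
        }' "$file"
}
```

Note that `top` reports CPU usage per core, so a multithreaded process can legitimately exceed 100.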

Bug report folder hierarchy#

The files in the Lambda bug report are organized into the following folders:

├── bmc-info
│   ├── ipmi-elist.txt
│   └── ipmi-sdr.txt
├── drives-and-storage
│   ├── df.txt
│   ├── fstab.txt
│   ├── iostat.txt
│   ├── lsblk.txt
│   ├── mdadm-conf.txt
│   ├── mdadm-scan.txt
│   ├── mdstat.txt
│   └── mounts.txt
├── gpu-memory-errors
│   ├── ecc-errors.txt
│   ├── remapped-memory.txt
│   └── uncorrected-ecc_errors.txt
├── grub
│   ├── grub.d
│   │   ├── 50-cloudimg-settings.cfg
│   │   └── init-select.cfg
│   ├── grub.txt
│   └── proc_cmdline.txt
├── hibernation-settings.txt
├── hw-list.txt
├── ibstat.txt
├── lsmod.txt
├── networking
│   ├── ip-addr.txt
│   ├── iptables.txt
│   ├── netplan.txt
│   ├── resolvectl-status.txt
│   ├── ss.txt
│   └── ufw-status.txt
├── nvidia-bug-report.log
├── nvidia-smi.txt
├── repos-and-packages
│   ├── dpkg.txt
│   ├── listd-repos.txt
│   ├── pip-list.txt
│   └── sources-list.txt
├── sensors.txt
├── sysctl-all.txt
├── system-logs
│   ├── apt-history.log
│   ├── dmesg
│   ├── dmesg-errors.txt
│   ├── dpkg.log
│   ├── journalctl.txt
│   ├── kern.log
│   └── syslog
├── systemctl-services.txt
└── top.txt

Contact Lambda#

If you can’t determine the cause of the issue you are experiencing, contact Lambda Support and include your Lambda bug report.

Other resources#