Debugging kernel crashes using kdump

Introduction

kdump is a service that creates crash dumps when the kernel crashes. It uses kexec(8) to boot into a secondary kernel (known as a capture kernel), then exports the contents of the crashed kernel’s memory (known as a crash dump or vmcore) to the filesystem. The vmcore can then be analyzed to determine the root cause of the crash.

Configuring kdump requires setting the crashkernel kernel argument and enabling the kdump systemd service. Memory for the crash kernel must be reserved while the first kernel boots. crashkernel=auto generally doesn’t reserve enough memory on Fedora CoreOS, so specifying crashkernel=300M is recommended.
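
To confirm what a running system actually reserved, a minimal sketch along the following lines may help. The function names crashkernel_arg and crash_reserved_mib are hypothetical helpers introduced here for illustration; they only read /proc/cmdline and /sys/kernel/kexec_crash_size, and accept an alternative path as an argument so they can be tried against a copy.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kdump) for inspecting the reservation.

# Print the value of the crashkernel= argument from a kernel command line file.
# Defaults to /proc/cmdline; prints nothing if the argument is absent.
crashkernel_arg() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each argument on its own line, then print only the crashkernel value.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^crashkernel=//p'
}

# Print the memory reserved for the crash kernel in MiB.
# /sys/kernel/kexec_crash_size reports the reservation in bytes.
crash_reserved_mib() {
    size_file="${1:-/sys/kernel/kexec_crash_size}"
    echo $(( $(cat "$size_file") / 1024 / 1024 ))
}
```

On a system booted with crashkernel=300M, calling crashkernel_arg with no arguments prints 300M. crash_reserved_mib reports what the kernel actually set aside, and a value of 0 indicates that no memory was reserved.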

By default, the vmcore will be saved in /var/crash. It is also possible to write the dump to some other location on the local system or to send it over the network by editing /etc/kdump.conf. For additional information, see kdump.conf(5) and the comments in /etc/kdump.conf and /etc/sysconfig/kdump.
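
Because /etc/kdump.conf consists mostly of comments, it can be useful to view only the directives that are in effect before and after editing it. The following is a small sketch, not part of kdump itself; kdump_active_directives is a hypothetical helper name, and it assumes that comment lines in the file begin with #.

```shell
#!/bin/sh
# Hypothetical helper: print the active (non-comment, non-blank) lines
# of a kdump.conf-style file. Defaults to /etc/kdump.conf.
kdump_active_directives() {
    conf="${1:-/etc/kdump.conf}"
    # Drop lines that are empty or whose first non-space character is '#'.
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
}
```

Running kdump_active_directives with no arguments lists the settings currently in use, and passing a path inspects a working copy of the file instead.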

Configuring kdump via Ignition

Example kdump configuration
variant: fcos
version: 1.5.0
kernel_arguments:
  should_exist:
  - 'crashkernel=300M'
systemd:
  units:
  - name: kdump.service
    enabled: true

Configuring kdump after initial provision

  1. Set the crashkernel kernel argument.

    sudo rpm-ostree kargs --append='crashkernel=300M'

    For more information, see the documentation on modifying kernel arguments via rpm-ostree.

  2. Enable the kdump systemd service.

    sudo systemctl enable kdump.service

  3. Reboot your system.

    sudo systemctl reboot

It is highly recommended to test the configuration after setting up the kdump service, paying particular attention to the amount of memory reserved for the crash kernel. For information on how to test that kdump is properly armed and how to analyze the dump, refer to the kdump documentation for Fedora and the Linux kernel documentation on kdump.
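
As a quick first check before consulting those documents, the sketch below tests whether a capture kernel is currently loaded. It assumes the standard sysfs file /sys/kernel/kexec_crash_loaded, which contains 1 when a crash kernel is loaded; kdump_armed is a hypothetical helper name introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a capture kernel is loaded.
# Defaults to /sys/kernel/kexec_crash_loaded; a missing file counts as not armed.
kdump_armed() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$loaded_file" 2>/dev/null)" = "1" ]
}

# Example use:
#   kdump_armed && echo "capture kernel loaded" || echo "kdump is not armed"
#
# To exercise the whole path on a disposable test machine, a crash can be
# forced through the SysRq trigger. This panics the kernel immediately and
# unsaved data is lost, so it is left commented out:
#   echo c | sudo tee /proc/sysrq-trigger
```

If the forced crash works as intended, the system boots the capture kernel, writes the vmcore to the configured location (/var/crash by default), and then reboots.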