[Linux Kernel] How to Detect User Space Crashes in the Kernel Log

Environment:

  • Kernel Version: 4.14 (Raspbian Kernel)
  • Architecture: Arm32


Background

During system bring-up, unexpected issues can arise, including crashes in user space. By default these crashes leave no trace in the kernel log, which makes them easy to miss. The Linux kernel can, however, be configured to print a crash signature (faulting address, page table entries, and registers) to the kernel log whenever a user space process faults.

This feature is enabled with the ARM-specific CONFIG_DEBUG_USER option ("Verbose user fault messages" in the kernel configuration menu).

How to Enable CONFIG_DEBUG_USER

Step 1: Turn On CONFIG_DEBUG_USER

To enable this feature, set CONFIG_DEBUG_USER=y in the kernel configuration. Below is an example patch against the Raspbian defconfig:

diff --git a/arch/arm/configs/bcm2711_defconfig b/arch/arm/configs/bcm2711_defconfig
index be7a837cc2d6..dc7052f49285 100644
--- a/arch/arm/configs/bcm2711_defconfig
+++ b/arch/arm/configs/bcm2711_defconfig
@@ -6,6 +6,7 @@ CONFIG_GENERIC_IRQ_DEBUGFS=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_BPF_SYSCALL=y
+CONFIG_DEBUG_USER=y
 CONFIG_PREEMPT_VOLUNTARY=y
 CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_BSD_PROCESS_ACCT_V3=y        

With this change, the kernel compiles the code blocks guarded by CONFIG_DEBUG_USER, such as this excerpt from __do_user_fault() in arch/arm/mm/fault.c:

static void
__do_user_fault(unsigned long addr, unsigned int fsr, unsigned int sig,
                int code, struct pt_regs *regs)
{
        struct task_struct *tsk = current;

        if (addr > TASK_SIZE)
                harden_branch_predictor();

#ifdef CONFIG_DEBUG_USER
        if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
            ((user_debug & UDBG_BUS)  && (sig == SIGBUS))) {
                pr_err("8<--- cut here ---\n");
                pr_err("%s: unhandled page fault (%d) at 0x%08lx, code 0x%03x\n",
                       tsk->comm, sig, addr, fsr);
                show_pte(KERN_ERR, tsk->mm, addr);
                show_regs(regs);
        }
#endif
        

Step 2: Update Kernel Command Line

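Add user_debug=31 to the kernel command line. The value is a bitmask: 31 (0x1f) sets all five UDBG_* flags, including the UDBG_SEGV and UDBG_BUS bits tested in the code above, so every category of user space fault is logged. Below is an example patch for the device tree source file: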

diff --git a/arch/arm/boot/dts/bcm2711-rpi-4-b.dts b/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
index 8c0ab39beea1..33d36b4f89fa 100644
--- a/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
+++ b/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
@@ -282,7 +282,7 @@

 / {
        chosen {
-               bootargs = "coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1";
+               bootargs = "coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 user_debug=31";
        };

        aliases {
        

After applying these patches, rebuild the kernel and flash the new image onto your device, such as a Raspberry Pi.
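
After booting the new image, it can be worth confirming that both settings took effect before waiting for a crash. The sketch below is one possible way to do that, not part of the original procedure. The helper names (has_debug_user, cmdline_user_debug, decode_user_debug) are hypothetical; the flag names and bit positions are the UDBG_* definitions from arch/arm/include/asm/system_misc.h. Reading /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case on your image.

```shell
# Succeeds if CONFIG_DEBUG_USER=y appears in a plain-text kernel config.
# $1: path to the config file (e.g. .config or an unpacked /proc/config.gz)
has_debug_user() {
    grep -q '^CONFIG_DEBUG_USER=y$' "$1"
}

# Prints the value of user_debug= from a kernel command line, if present.
# $1: the command line text (e.g. the contents of /proc/cmdline)
cmdline_user_debug() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^user_debug=//p'
}

# Prints the names of the UDBG_* flags that are set in a user_debug mask.
# $1: the numeric mask (e.g. 31)
decode_user_debug() {
    mask=$1
    out=""
    bit=1
    for name in UDBG_UNDEFINED UDBG_SYSCALL UDBG_BADABORT UDBG_SEGV UDBG_BUS; do
        if [ $((mask & bit)) -ne 0 ]; then
            out="$out $name"
        fi
        bit=$((bit * 2))
    done
    printf '%s\n' "${out# }"
}

# Possible usage on the target (paths are assumptions about your system):
#   zcat /proc/config.gz > /tmp/running.config
#   has_debug_user /tmp/running.config && echo "CONFIG_DEBUG_USER is on"
#   decode_user_debug "$(cmdline_user_debug "$(cat /proc/cmdline)")"
```

If only SIGSEGV and SIGBUS reports are wanted, the same decoding shows that user_debug=24 (UDBG_SEGV plus UDBG_BUS) would be enough for the code path quoted earlier.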

How to View User Space Crash Logs

Once the patched kernel is running, one runtime setting remains. After the target device boots, enable print-fatal-signals, a sysctl that makes the kernel log fatal signals delivered to user processes, by running the following command:

echo 1 > /proc/sys/kernel/print-fatal-signals        
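
Because the setting is lost on reboot, it can help to wrap the write in a small helper that also reads the value back. This is a minimal sketch under that assumption; the function name enable_fatal_signal_logging is hypothetical, and the optional argument exists only so the helper can be pointed at a different file when trying it out.

```shell
# Writes 1 to the print-fatal-signals knob and verifies that it stuck.
# $1 (optional): path to the knob; defaults to the real procfs entry.
enable_fatal_signal_logging() {
    knob=${1:-/proc/sys/kernel/print-fatal-signals}
    # The write needs root privileges on a real system.
    printf '1\n' > "$knob" || return 1
    # Read the value back so a silently ignored write is noticed.
    [ "$(cat "$knob")" = "1" ]
}

# Possible usage on the target, as root:
#   enable_fatal_signal_logging && echo "print-fatal-signals enabled"
```

The same knob is also reachable as the kernel.print-fatal-signals sysctl, or as print-fatal-signals=1 on the kernel command line if it should be active from boot.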

After running this command, the setup is complete. When a user space process crashes, for example with a segmentation fault, the kernel log shows its crash signature. Below is an example of what you might see in the kernel log:

<7>[   82.248793 / 05-21 22:27:31.925][2] rpi.ui: unhandled page fault (11) at 0x00000000, code 0x005
<1>[   82.248810 / 05-21 22:27:31.925][2] pgd = da0fc000
<1>[   82.252580 / 05-21 22:27:31.925][2] [00000000] *pgd=00000000
<6>[   82.258764 / 05-21 22:27:31.935][2] CPU: 2 PID: 999 Comm: rpi.ui Tainted: G        W      4.14.66 
<6>[   82.258781 / 05-21 22:27:31.935][2] task: d7e0d840 ti: d7f02000 task.ti: d7f02000
<6>[   82.258793 / 05-21 22:27:31.935][2] PC is at 0x73a7c9c0
<6>[   82.258803 / 05-21 22:27:31.935][2] LR is at 0x0
<6>[   82.258814 / 05-21 22:27:31.935][2] pc : [<73a7c9c0>]    lr : [<00000000>]    psr: 00070030
<6>[   82.258814 / 05-21 22:27:31.935][2] sp : 885bc480  ip : 00000000  fp : 131e3218
<6>[   82.258830 / 05-21 22:27:31.935][2] r10: 71488688  r9 : 92778200  r8 : 00000000
<6>[   82.258840 / 05-21 22:27:31.935][2] r7 : 131e2bf0  r6 : 00000000  r5 : 00000001  r4 : 00263c64
<6>[   82.258851 / 05-21 22:27:31.935][2] r3 : 131e3218  r2 : 71488688  r1 : 00000000  r0 : 71488688
<6>[   82.258863 / 05-21 22:27:31.935][2] Flags: nzcv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
<6>[   82.258873 / 05-21 22:27:31.935][2] Control: 10c0383d  Table: 9a0fc06a  DAC: 00000051
<6>[   82.258885 / 05-21 22:27:31.935][2] CPU: 2 PID: 999 Comm: rpi.ui Tainted: G        W      4.14.66-gbbd0bc4-dirty #3
<6>[   82.258907 / 05-21 22:27:31.935][2] [<c010e400>] (unwind_backtrace) from [<c010b4a8>] (show_stack+0x10/0x14)
<6>[   82.258924 / 05-21 22:27:31.935][2] [<c010b4a8>] (show_stack) from [<c0fd0740>] (dump_stack+0x78/0x98)
<6>[   82.258940 / 05-21 22:27:31.935][2] [<c0fd0740>] (dump_stack) from [<c0115f8c>] (__do_user_fault+0x108/0x19c)
<6>[   82.258957 / 05-21 22:27:31.935][2] [<c0115f8c>] (__do_user_fault) from [<c0fe05b8>] (do_page_fault+0x33c/0x3e8)
<6>[   82.258972 / 05-21 22:27:31.935][2] [<c0fe05b8>] (do_page_fault) from [<c0100404>] (do_DataAbort+0x34/0x184)
<6>[   82.258986 / 05-21 22:27:31.935][2] [<c0100404>] (do_DataAbort) from [<c0fdec3c>] (__dabt_usr+0x3c/0x40)
<6>[   82.258997 / 05-21 22:27:31.935][2] Exception stack(0xd7f03fb0 to 0xd7f03ff8)
<6>[   82.259009 / 05-21 22:27:31.935][2] 3fa0:                                     71488688 00000000 71488688 131e3218
<6>[   82.259023 / 05-21 22:27:31.935][2] 3fc0: 00263c64 00000001 00000000 131e2bf0 00000000 92778200 71488688 131e3218
<6>[   82.259036 / 05-21 22:27:31.935][2] 3fe0: 00000000 885bc480 00000000 73a7c9c0 00070030 ffffffff        
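
On a busy system the dump above is easy to lose among other messages. The sketch below shows one way to pull out just the headline of each report. It relies on the "unhandled page fault" format string from the __do_user_fault() excerpt, and assumes the process name contains no spaces and is preceded by a timestamp prefix, as in the log above. The function name summarize_user_faults is hypothetical.

```shell
# Reads kernel log lines on stdin and prints one summary line per
# user space page fault report: process name, signal number,
# faulting address, and fault status register (FSR) value.
summarize_user_faults() {
    sed -n 's/.* \([^ ]*\): unhandled page fault (\([0-9]*\)) at \(0x[0-9a-f]*\), code \(0x[0-9a-f]*\).*/comm=\1 sig=\2 addr=\3 fsr=\4/p'
}

# Possible usage on the target:
#   dmesg | summarize_user_faults
```

For the example log, the first line would be reduced to the process name rpi.ui, signal 11 (SIGSEGV), and fault address 0x00000000, which points to a NULL pointer dereference in that process.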

Conclusion

Enabling CONFIG_DEBUG_USER and adding user_debug=31 to the kernel command line are simple but powerful steps for tracing user space crashes. Personally, I frequently use this feature during system bring-up to identify and resolve user space crashes efficiently. It’s a valuable tool for debugging and improving system reliability.

