Hi NVIDIA team,

Device information: Orin Devkit, JetPack 5.1 SDK (Jetson Linux 35.2.1)

[   16.736563] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[   16.738464] systemd[1]: Started Journal Service.
[   16.751024] nvmap_heap_init: nvmap_heap_init: created heap block cache
[   16.757874] tegra-carveouts tegra-carveouts: fsi :dma coherent mem declare 0x0000000833000000,16777216
[   16.767470] tegra-carveouts tegra-carveouts: assigned reserved memory node fsi-carveout
[   16.767478] tegra-carveouts tegra-carveouts: vpr :dma coherent mem declare 0x0000000849800000,914358272
[   16.785355] tegra-carveouts tegra-carveouts: assigned reserved memory node vpr-carveout
[   16.785361] nvmap_page_pool_init: Total RAM pages: 7830708
[   16.799227] nvmap_page_pool_init: nvmap page pool size: 978838 pages (3823 MB)
[   16.799290] nvmap_background_zero_thread: PP zeroing thread starting.
[   16.799393] misc nvmap: created heap vpr base 0x0000000849800000 size (892928KiB)
[   16.802629] misc nvmap: created heap fsi base 0x0000000833000000 size (16384KiB)
[   16.845336] systemd-journald[302]: Received client request to flush runtime journal.
[   16.865441] nvgpu: 17000000.ga10b          nvgpu_nvhost_syncpt_init:135  [INFO]  syncpt_unit_base 60000000 syncpt_unit_size 4000000 size 10000
[   16.865441] 
[   17.573822] BUG: Bad page map in process systemd-udevd  pte:cd923ad55e4b3689 pmd:109d62003
[   17.577782] Unable to handle kernel paging request at virtual address 0052c76c20ac1ef2
[   17.582397] addr:0000ffff8cb9a000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff788105b13978 index:157
[   17.582413] file:libcrypto.so.1.1 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[   17.590656] Mem abort info:
[   17.613474] Disabling lock debugging due to kernel taint
[   17.616762]   ESR = 0x96000004
[   17.621456] BUG: Bad page map in process systemd-udevd  pte:f4e3eabf37d65cab pmd:109d62003
[   17.622838]   EC = 0x25: DABT (current EL), IL = 32 bits
[   17.630620] addr:0000ffff8cb9b000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff788105b13978 index:158
[   17.630641] file:libcrypto.so.1.1 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[   17.638636]   SET = 0, FnV = 0
[   17.659726] Unable to handle kernel paging request at virtual address 00751d56fd13cc48
[   17.667917] Mem abort info:
[   17.668146]   EA = 0, S1PTW = 0
[   17.670791]   ESR = 0x96000004
[   17.670796]   EC = 0x25: DABT (current EL), IL = 32 bits
[   17.674444] Data abort info:
[   17.677188]   SET = 0, FnV = 0
[   17.677189]   EA = 0, S1PTW = 0
[   17.677190] Data abort info:
[   17.677190]   ISV = 0, ISS = 0x00000004
[   17.677191]   CM = 0, WnR = 0
[   17.677192] [00751d56fd13cc48] address between user and kernel address ranges
[   17.677197] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   17.682767]   ISV = 0, ISS = 0x00000004
[   17.685630] Modules linked in: ina3221 pwm_fan
[   17.688770]   CM = 0, WnR = 0
[   17.688772] [0052c76c20ac1ef2] address between user and kernel address ranges
[   17.692005]  nvgpu nvmap ip_tables x_tables
[   17.738248] CPU: 11 PID: 365 Comm: systemd-udevd Tainted: G    B             5.10.104-tegra #1
[   17.747125] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS 2.1-32413640 01/24/2023
[   17.756156] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[   17.762327] pc : unmap_page_range+0x584/0x8e0
[   17.766807] lr : unmap_page_range+0x35c/0x8e0
[   17.771285] sp : ffff800011ee3a30
[   17.774699] x29: ffff800011ee3a30 x28: e790fc63d921008e 
[   17.780165] x27: ffff800011ee3c38 x26: ffff7881081b3480 
[   17.785634] x25: 0000ffff8cb9e000 x24: ffff788109d5f228 
[   17.791100] x23: 01d475d3f4cf3178 x22: ffff800011ee3b18 
[   17.796562] x21: 0000ffff8cb9d000 x20: ffffffe2040758a8 
[   17.802020] x19: ffff788109d62ce8 x18: 0000000000000000 
[   17.807485] x17: 0000000000000000 x16: 0000000000000000 
[   17.812952] x15: 0000000000000000 x14: 0000000000000000 
[   17.818410] x13: 0000000000000000 x12: 0000000000000000 
[   17.823874] x11: 0000000000000000 x10: 0000000000000a80 
[   17.829331] x9 : ffff800011ee3930 x8 : ffff788109e73660 
[   17.834793] x7 : 000000000000000b x6 : 0000000000000011 
[   17.840250] x5 : 00000000410fd420 x4 : 0000000000f0000f 
[   17.845713] x3 : 0000000000000000 x2 : fffffdffffe00000 
[   17.851182] x1 : 00751d56fd13cc40 x0 : 7801d475d3f4cf31 
[   17.856646] Call trace:
[   17.859157]  unmap_page_range+0x584/0x8e0
[   17.863285]  unmap_single_vma+0x78/0xd0
[   17.867227]  unmap_vmas+0x78/0xf0
[   17.870643]  exit_mmap+0xd0/0x190
[   17.874058]  mmput+0x80/0x150
[   17.877114]  do_exit+0x2fc/0xab0
[   17.880435]  do_group_exit+0x4c/0xb0
[   17.884111]  get_signal+0x104/0x830
[   17.887702]  do_notify_resume+0x248/0xa00
[   17.891827]  work_pending+0xc/0x384
[   17.895416] Code: cb813041 b26babe2 f2dfbfe2 8b011841 (f9400422) 
[   17.901701] ---[ end trace 313bf8aeb4b0f8ba ]---
[   17.906450] Kernel panic - not syncing: Oops: Fatal exception
[   17.912350] SMP: stopping secondary CPUs
[   18.943259] SMP: failed to stop secondary CPUs 4,11
[   18.948276] Kernel Offset: 0x515170910000 from 0xffff800010000000
[   18.954550] PHYS_OFFSET: 0xffff878000000000
[   18.958843] CPU features: 0x0040006,4a80aa38
[   18.963230] Memory Limit: none
[   18.966361] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
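
For anyone reading the "Mem abort info" blocks in these logs, here is a minimal sketch of how the ESR values break down. It assumes the standard Arm ESR_EL1 layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). The function name `decode_esr` and the small `EC_NAMES` table are made up for illustration, and the table only covers the exception classes that show up in this thread.

```python
# Decode the ESR value printed in an arm64 "Mem abort info" block.
# Assumed field layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].

# Only the exception classes seen in this thread; not a complete table.
EC_NAMES = {
    0x21: "IABT (current EL)",
    0x22: "PC alignment fault",
    0x25: "DABT (current EL)",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR value into the fields the kernel prints."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il_bits": 32 if il else 16,
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,  # write-not-read, only meaningful for data aborts
        "fsc": iss & 0x3F,        # fault status code, only meaningful for aborts
    }


if __name__ == "__main__":
    for value in (0x96000004, 0x86000004, 0x8A000000):
        print(hex(value), decode_esr(value))
```

For `0x96000004` this gives EC = 0x25 and ISS = 0x4, matching the `DABT (current EL)` and `ISS = 0x00000004` lines in the log. A fault status code of 0x04 is a level 0 translation fault, which fits the "address between user and kernel address ranges" message: the kernel dereferenced a pointer that is not a valid address at all.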

Full logs:
log0.txt (61.9 KB)
log1.txt (77.4 KB)
log2.txt (76.6 KB)

How is this issue reproduced?
Is any application or test running?

No. It happens just by powering on the device.

Do you mean the device crashes on the first boot after being reflashed with the 35.2.1 software?

yes

It seems something was not flashed correctly. Please use SDK Manager to reflash it.

It still crashes.
orin_side.log (189.5 KB)

[   14.703079] systemd[1]: Started Journal Service.                             
[   14.732211] systemd-journald[301]: Received client request to flush runtime .
[   16.548080] using random self ethernet address                               
[   16.552868] using random host ethernet address                               
[   16.578798] nvidia: loading out-of-tree module taints kernel.                
[   16.723858] Internal error: SP/PC alignment exception: 8a000000 [#1] PREEMPTP
[   16.731497] Modules linked in: nvidia(O) snd_hda_core snd_soc_simple_card_uts
[   16.743776] Unable to handle kernel paging request at virtual address 46e85e4
[   16.752889] CPU: 6 PID: 44 Comm: ksoftirqd/6 Tainted: G           O      5.11
[   16.752893] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS 2.13
[   16.761045] Mem abort info:                                                  
[   16.769541] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)             
[   16.769545] pc : 0x24884691424cfb76                                          
[   16.769554] lr : rcu_core+0x274/0x980                                        
[   16.769559] sp : ffff8000104d3c90                                            
[   16.778609]   ESR = 0x86000004                                               
[   16.781492] x29: ffff8000104d3c90 x28: ffffccccfda47000                      
[   16.781495] x27: ffff3643003a8e80 x26: ffff3643003a8e80                      
[   16.787683]   EC = 0x21: IABT (current EL), IL = 32 bits                     
[   16.791271]                                                                  
[   16.791271] x25: ffff8000104d3d20 x24: ffff364a2ec61af0                      
[   16.791274] x23: ffffccccfddcae40 x22: ffff3643003a8e80                      
[   16.795041]   SET = 0, FnV = 0 

Please try JetPack 5.1.1. Also, every line of your log got truncated. Resize your console before copying the log.
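
A quick way to check a capture for this kind of truncation before posting it is sketched below. This is only a heuristic, and it assumes that a log clipped or padded at the console width will have most of its lines at exactly the same length. The function name `looks_truncated` and its thresholds are made up for illustration.

```python
from collections import Counter


def looks_truncated(lines, min_share=0.5, min_width=60):
    """Return (suspicious, width).

    Heuristic: if at least `min_share` of the non-empty lines have exactly
    the same length, and that length is at least `min_width`, the capture
    was probably clipped (or padded) at the console width.
    """
    # Keep trailing spaces: padded console captures rely on them for width.
    widths = [len(line.rstrip("\n")) for line in lines if line.strip()]
    if not widths:
        return False, 0
    width, count = Counter(widths).most_common(1)[0]
    suspicious = width >= min_width and count / len(widths) >= min_share
    return suspicious, width


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py captured.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(looks_truncated(fh.readlines()))
```

If it reports a suspicious width, widen the terminal (or log the serial console straight to a file) and capture again.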

Do you have another Jetson Orin module or carrier board on your side to cross-check?

It crashed again.
device1.log (126.8 KB)

[   13.721495] nvmap_page_pool_init: Total RAM pages: 7830392
[   13.721496] nvmap_page_pool_init: nvmap page pool size: 978799 pages (3823 MB)
[   13.721546] nvmap_background_zero_thread: PP zeroing thread starting.
[   13.721638] misc nvmap: created heap vpr base 0x0000000849800000 size (892928KiB)
[   13.724645] misc nvmap: created heap fsi base 0x0000000833000000 size (16384KiB)
[   13.791136] nvgpu: 17000000.ga10b          nvgpu_nvhost_syncpt_init:135  [INFO]  syncpt_unit_base 60000000 syncpt_unit_size 4000000 size 10000
[   13.791136] 
[   13.825626] systemd[1]: Started Journal Service.
[   13.854194] systemd-journald[307]: Received client request to flush runtime journal.
[   16.529135] using random self ethernet address
[   16.533789] using random host ethernet address
[   16.561761] nvidia: loading out-of-tree module taints kernel.
[   16.931027] OF: graph: no port node found in /i2c@c240000/ucsi_ccg@8/connector@0
[   17.025561] OF: graph: no port node found in /i2c@c240000/ucsi_ccg@8/connector@0
[   17.033413] OF: graph: no port node found in /i2c@c240000/ucsi_ccg@8/connector@0
[   17.869262] using random self ethernet address
[   17.873903] using random host ethernet address
[   18.168613] CPU8: shutdown
[   18.277921] Unable to handle kernel paging request at virtual address b0c8e9cbdb234096
[   18.278528] Unable to handle kernel paging request at virtual address 002febb43c8940a9
[   18.286099] Mem abort info:
[   18.294286] Mem abort info:
[   18.294288]   ESR = 0x96000004
[   18.294292]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.294296]   SET = 0, FnV = 0
[   18.297179]   ESR = 0x96000004
[   18.297181]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.297182]   SET = 0, FnV = 0
[   18.297183]   EA = 0, S1PTW = 0
[   18.297184] Data abort info:
[   18.297185]   ISV = 0, ISS = 0x00000004
[   18.297185]   CM = 0, WnR = 0

Do you have another Jetson Orin module or carrier board on your side to cross-check?

Yes, I have another devkit.

devkit A = module A + carrier board A
devkit B = module B + carrier board B

Do you mean connecting module A to carrier board B,
and module B to carrier board A, as a cross-check?

Yes, check whether only one module has this issue.

devkit A = module A + carrier board A (the original device with crash issue)
devkit B = module B + carrier board B (another device which is ok)

Module A with carrier board B: it still crashes.
Module B with carrier board A: it is OK.

So only one module (module A) has the crash issue.
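
The reasoning behind the swap test can be sketched as follows: a part is suspect only if every combination that includes it crashed. The function name `isolate_fault` and the data layout are made up for illustration; the results are the four combinations reported in this thread.

```python
def isolate_fault(results):
    """results maps (module, carrier) -> True if that combination crashed.

    A part is flagged only when every tested combination containing it crashed.
    """
    modules = sorted({m for m, _ in results})
    carriers = sorted({c for _, c in results})
    bad_modules = [
        m for m in modules
        if all(crashed for (mod, _), crashed in results.items() if mod == m)
    ]
    bad_carriers = [
        c for c in carriers
        if all(crashed for (_, car), crashed in results.items() if car == c)
    ]
    return {"modules": bad_modules, "carriers": bad_carriers}


# The four combinations reported above (True = crashed).
swap_results = {
    ("A", "A"): True,   # original devkit A
    ("B", "B"): False,  # devkit B
    ("A", "B"): True,   # module A on carrier board B
    ("B", "A"): False,  # module B on carrier board A
}

if __name__ == "__main__":
    print(isolate_fault(swap_results))
```

With these results, only module A is flagged, and neither carrier board is.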

Please RMA module A.

ok. Thanks.
