1. 现象

02-10 01:10:24  [   81.726789] Kernel bug detected[#1]:
02-10 01:10:24  [   81.743065] CPU: 1 PID: 699 Comm: watchdog_monito Tainted: G           O 3.10.20-rt14-Cavium-Octeon #6
02-10 01:10:24  [   81.765061] task: 800000008c476af0 ti: 8000000088b28000 task.ti: 8000000088b28000
02-10 01:10:24  [   81.785229] $ 0   : 0000000000000000 ffffffff8054af84 0000000000000004 0000000000000008
02-10 01:10:24  [   81.869389] $ 4   : 0000000000000000 800000008c01ce88 00000000ffffff01 00000000ffffff02
02-10 01:10:24  [   81.953547] $ 8   : 800000008c01cea8 ffffffff80710000 0000000000000001 0000000000000000
02-10 01:10:24  [   82.037705] $12   : 8000000088b2bc98 800000008c211f18 0000000000000000 0000000000000000
02-10 01:10:24  [   82.121863] $16   : 800000008c20b340 800000008c07f000 0000000000000009 800000008c01dd80
02-10 01:10:24  [   82.206021] $20   : 00000000000080d0 0000000000000000 0000000000000009 ffffffff812c0000
02-10 01:10:24  [   82.290179] $24   : 00000000100c60c8 00000000774e4060                                 
02-10 01:10:25  [   82.374337] $28   : 8000000088b28000 8000000088b2bcb0 800000008c01ce80 ffffffff802698ac
02-10 01:10:25  [   82.458496] Hi    : 0000000000000001
02-10 01:10:25  [   82.474754] Lo    : 0000000000000000
02-10 01:10:25  [   82.491022] epc   : ffffffff8026997c cache_alloc_refill+0x174/0x7e0
02-10 01:10:25  [   82.509975]     Tainted: G           O
02-10 01:10:25  [   82.526410] ra    : ffffffff802698ac cache_alloc_refill+0xa4/0x7e0
02-10 01:10:25  [   82.545276] Status: 14009ce2 KX SX UX KERNEL EXL
02-10 01:10:25  [   82.638828] Cause : 00800034
02-10 01:10:25  [   82.654385] PrId  : 000d9602 (Cavium Octeon III)
02-10 01:10:25  [   82.671685]  Modules linked in: hxdrv(O) uio_generic_driver(O) led_cpld(O)  reborn_macfilter(O) logbuffer(O) fglt_b_reboot_helper(O) ramoops  reed_solomon fglt_b_cpld(O) reborn_class(O) generic_access(O)  spi_oak_island(O)
02-10 01:10:25  [   82.868798]  Process watchdog_monito (pid: 699, threadinfo=8000000088b28000,  task=800000008c476af0, tls=00000000776404a0)
02-10 01:10:25  [   82.892353] Stack : 0000000000000001 ffffffff8054abfc 800000008c073100 0000000000000000
02-10 01:10:25            800000008c211c90 ffffffff801a4e1c 800000008c01dd80 00000000000080d0
02-10 01:10:25            00000000000080d0 0000000000000001 ffffffff8016d260 00000000000080d0
02-10 01:10:25            0000000000000000 800000008c2874a0 0000000000000000 ffffffff8026974c
02-10 01:10:25            800000008c3fba00 0000000001200012 0000000000000000 800000008c3fba00
02-10 01:10:25            0000000000000000 ffffffff81230000 0000000077639068 ffffffff8016d260
02-10 01:10:25            fffffff48c073100 ffffffff80295cd8 0000000000000000 0000000000000000
02-10 01:10:26            800000008c287660 8000000088b18000 00000000000002bb 000000007f6b24e0
02-10 01:10:26            0000000001200012 0000000000000002 0000000000000000 0000000000000000
02-10 01:10:26            00000000100cb3e8 00000000100c0000 000000007f6b2500 ffffffff8016dae0
02-10 01:10:26            ...
02-10 01:10:26  [   83.630192] Call Trace:
02-10 01:10:26  [   83.645325] [<ffffffff8026997c>] cache_alloc_refill+0x174/0x7e0
02-10 01:10:26  [   83.663932] [<ffffffff8026974c>] kmem_cache_alloc+0x154/0x210
02-10 01:10:26  [   83.682371] [<ffffffff8016d260>] copy_process.part.53+0x760/0xea0
02-10 01:10:26  [   83.701153] [<ffffffff8016dae0>] do_fork+0xa8/0x2c8
02-10 01:10:26  [   83.718719] [<ffffffff80159a94>] handle_sys+0x134/0x160
02-10 01:10:26  [   83.736630]
02-10 01:10:26  [   83.750804]
02-10 01:10:26  Code: 8e640018  0044202b  2c8a0001 <000a0336> 10800023  26c4ffff  12c0006c  0080902d  0809a668
02-10 01:10:26  [   83.900378] [sched_delayed] sched: RT throttling activated
02-10 01:10:26  [   83.918593] ---[ end trace a91730027ab35f25 ]---
02-10 01:10:28  [   83.937180]  Fatal exception: panic in 5 seconds[   85.896096] EDAC MC0: 1 CE DIMM 1  rank 0 bank 4 row 63553 col 1436 on any memory ( page:0x0 offset:0x0  grain:0 syndrome:0x0)
02-10 01:10:28  [   85.920145]  EDAC MC0: 1 UE DIMM 1 rank 0 bank 4 row 63553 col 1436 on any memory (  page:0x0 offset:0x0 grain:0)
02-10 01:10:31 
02-10 01:10:31  [   88.965613] Kernel panic - not syncing: Fatal exception
02-10 01:10:31  [   88.997694] reboot_helper: stored panic_counter = 1
02-10 01:10:31  [   89.005599] [sched_delayed] process 1366 (TICK) no longer affine to cpu0
02-10 01:10:31  [   89.015598] [sched_delayed] process 1520 (shl1) no longer affine to cpu0
02-10 01:10:31  [   89.054033] Extra info: CPLD: 10=05 12=3e 92=00 e0=00
02-10 01:10:31  [   89.065598] [sched_delayed] process 1521 (shl1) no longer affine to cpu0
02-10 01:10:31  [   89.091240] reboot_helper: isam_reboot_type='warm'
02-10 01:10:31  [   89.108715] reboot-helper: Enabling preserved ram
02-10 01:10:31  [   89.126103] flush l2 cache.
02-10 01:10:31  [   89.141690] reboot_helper: continuing standard linux reboot
02-10 01:10:37  [   89.159957] Rebooting in 5 seconds..

2. 内核相关代码

首先要参考笔记 octeon中断
根据现场Cause : 00800034, 结合前面的笔记
我们知道出现了13号trap异常, 触发中断向量handle_tr, 进而调用do_tr()

do_tr()出自下面
arch/mips/kernel/traps.c
异常打印出自

do_tr(struct pt_regs *regs)
    do_trap_or_bp(struct pt_regs *regs, unsigned int code, const char *str)
        switch (code)
        case BRK_BUG:
            //如果是kernel模式下, die;
            die_if_kernel("Kernel bug detected", regs);
                if (unlikely(!user_mode(regs)))
                    die(str, regs);
                        oops_enter();
                        printk("%s[#%d]:\n", str, ++die_counter);
                        show_registers(regs);
                            __show_regs(regs);
                            print_modules();
                            show_stacktrace(current, regs);
                            show_code((unsigned int __user *) regs->cp0_epc);
                        add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
                        oops_exit();
                        if (in_interrupt())
                            panic("Fatal exception in interrupt");
                        if (panic_on_oops)
                            printk(KERN_EMERG "Fatal exception: panic in 5 seconds");
                            ssleep(5);
                            panic("Fatal exception");
            //能到这里说明是user代码出了问题, 发SIGTRAP信号
            force_sig(SIGTRAP, current);
                force_sig_info(sig, SEND_SIG_PRIV, p);

results matching ""

    No results matching ""