Tuesday, December 13, 2011

Linux Kernel oops


Source: Linux Kernel oops

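Every time the kernel crashes, it prints a pile of hexadecimal values,
but I had never looked at them carefully,
because... I really couldn't make sense of them.
After last week's class, though,
I finally know what these numbers mean.
For example, main.c contains a faulty_write function that deliberately writes 0 to address 0: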
ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
loff_t *pos)
{
/* make a simple fault by dereferencing a NULL pointer */
*(int *)0 = 0;
return 0;
}
When I triggered it, the kernel oopsed:
$ echo "1" > /dev/aaa
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3a70000
[00000000] *pgd=33a7e031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#3]
Modules linked in: faulty
CPU: 0 Tainted: G D (2.6.29.2 #2)
PC is at faulty_write+0x10/0x18 [faulty]
LR is at vfs_write+0xc4/0x148
pc : [<bf00009c>] lr : [<c00a5f6c>] psr: a0000013
sp : c3a69f44 ip : c3a69f54 fp : c3a69f50
r10: 400d0948 r9 : c3a68000 r8 : 00000000
r7 : 00000002 r6 : c3a69f78 r5 : 400d3704 r4 : c382cea0
r3 : c3a69f78 r2 : 00000002 r1 : 400d3704 r0 : 00000000
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0000717f Table: 33a70000 DAC: 00000015
Process echo (pid: 882, stack limit = 0xc3a68260)
Stack: (0xc3a69f44 to 0xc3a6a000)
9f40: c3a69f74 c3a69f54 c00a5f6c bf00009c 00000000 c382cec0 c382cea0
9f60: c3a69f78 00000000 c3a69fa4 c3a69f78 c00a60b0 c00a5eb8 00000000 00000000
9f80: 00000000 400d1188 00000002 400d3704 00000004 c0035fa4 00000000 c3a69fa8
9fa0: c0035e00 c00a6074 400d1188 00000002 00000001 400d3704 00000002 4009f9b4
9fc0: 400d1188 00000002 400d3704 00000002 000a8374 00000002 400d0948 00000000
9fe0: 400a491c bec33d74 4009f008 4008accc 20000010 00000001 fffff7f7 fff5ffdf
Backtrace:
[<bf00008c>] (faulty_write+0x0/0x18 [faulty]) from [<c00a5f6c>] (vfs_write+0xc4/0x148)
[<c00a5ea8>] (vfs_write+0x0/0x148) from [<c00a60b0>] (sys_write+0x4c/0x74)
r7:00000000 r6:c3a69f78 r5:c382cea0 r4:c382cec0
[<c00a6064>] (sys_write+0x0/0x74) from [<c0035e00>] (ret_fast_syscall+0x0/0x2c)
r8:c0035fa4 r7:00000004 r6:400d3704 r5:00000002 r4:400d1188
Code: e1a0c00d e92dd800 e24cb004 e3a00000 (e5800000) 
---[ end trace 0f24d785ce6e74e6 ]---
Segmentation fault
/ $

Look closely at these two lines:
PC is at faulty_write+0x10/0x18 [faulty]
Code: e1a0c00d e92dd800 e24cb004 e3a00000 (e5800000)

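ARM instructions are 4 bytes long, so
faulty_write+0x10/0x18 => 0x10 / 4 = 4 and 0x18 / 4 = 6: the function is 6 instructions long and the PC is 4 instructions past its start, i.e. at the fifth instruction.
That is the word e5800000, which the Code: line marks with parentheses.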
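
The arithmetic above can be sketched in C. This is only an illustration: parse_pc_location and insn_index are hypothetical helper names, not kernel APIs, and the sketch assumes fixed 4-byte ARM instructions (no Thumb code).

```c
#include <stdio.h>

/* ARM (non-Thumb) instructions are a fixed 4 bytes wide. */
#define ARM_INSN_BYTES 4u

/*
 * Parse the "symbol+0xOFF/0xSIZE" part of an oops "PC is at" line.
 * Returns 0 on success, -1 if the text does not have that shape.
 * Hypothetical helper for illustration only.
 */
int parse_pc_location(const char *text, unsigned int *offset, unsigned int *size)
{
    char symbol[64];

    /* %x accepts an optional 0x prefix, so "0x10" and "0x18" parse directly. */
    if (sscanf(text, "%63[^+]+%x/%x", symbol, offset, size) != 3)
        return -1;
    return 0;
}

/* Zero-based index of the instruction at the given byte offset. */
unsigned int insn_index(unsigned int offset)
{
    return offset / ARM_INSN_BYTES;
}
```

Feeding it "faulty_write+0x10/0x18" yields offset 0x10 and size 0x18, and insn_index(0x10) is 4, which is the fifth instruction when counting from one.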

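Running the ARM cross toolchain's objdump on the object file (for example arm-linux-objdump -S main.o) shows how the assembly maps to the source code: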
ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
loff_t *pos)
{
8c:     e1a0c00d mov ip, sp
90:     e92dd800 stmdb sp!, {fp, ip, lr, pc}
94:     e24cb004 sub fp, ip, #4 ; 0x4
/* make a simple fault by dereferencing a NULL pointer */
*(int *)0 = 0;
98:     e3a00000 mov r0, #0 ; 0x0
9c:     e5800000 str r0, [r0]
return 0;
}
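
To tie the oops back to this listing, subtract the function's start address (bf00008c in the backtrace) from the pc (bf00009c) and add the result to the function's address in the objdump output (8c). A minimal sketch, with hypothetical helper names and assuming the listing is for the same build of the module that crashed:

```c
#include <stdint.h>

/* Byte offset of the faulting instruction from the start of its function. */
uint32_t offset_in_function(uint32_t pc, uint32_t func_start)
{
    return pc - func_start;
}

/*
 * Address to look for in the objdump listing: the function's address in
 * the object file plus the offset reported by the oops.
 */
uint32_t objdump_address(uint32_t func_addr_in_object, uint32_t offset)
{
    return func_addr_in_object + offset;
}
```

With the values from this oops, offset_in_function(0xbf00009c, 0xbf00008c) is 0x10 and objdump_address(0x8c, 0x10) is 0x9c, the line holding str r0, [r0].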

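Offset 0x10 from the function start at 8c is 9c, the str r0, [r0] that implements *(int *)0 = 0. So this is how you can find which line caused the fault!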
