Saturday, June 27, 2015

How to dump the call stack at each Android level (user space and kernel space)

Source: How to dump the call stack at each Android level (user space and kernel space)

 

dump call stack


[Key points]

Learn how to print the call stack at each Android level (UI, framework, and HAL) and in the kernel, to make tracing code and debugging easier.

[Contents]
  1. kernel call stack
  2. Android Java layer
  3. Android framework (written in C++)
  4. Android HAL (written in C)
  5. Call stack shows no function names

kernel call stack

Suppose we want the call stack, that is, we want to know who called func_foo(). In the kernel we can use dump_stack(): place it wherever you want the backtrace dumped.
 
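#include <linux/kernel.h>   /* declares dump_stack() */

void func_foo(void)
{
    int a = 3;
    ...

    dump_stack();   /* prints the call chain that led here to the kernel log */
    ...
}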

Java layer call stack
In a Java file, the call stack can be dumped as follows:


public void foo(boolean state, int flags) {
 ...
 Log.d(TAG,"xxxx", new Throwable());
 ...
}


C++ layer call stack

For C/C++ files, Android already provides frameworks/native/libs/utils/CallStack.cpp for us to use:


#include <utils/CallStack.h>
...
void foo(void) {
...
   android::CallStack stack;
   stack.update();
   stack.dump("XXX");

...
}


If you are using Android 4.4 or later, use log() instead of dump():


#include <utils/CallStack.h>
...
void foo(void) {
...
   android::CallStack stack;
   stack.update( );
   stack.log("XXX");

...
}

Remember to add the following to Android.mk:


LOCAL_SHARED_LIBRARIES += libutils


C layer call stack

Because calling C++ from C requires some extra declarations, the wrapper is split out into its own files (dump_stack.cpp and dump_stack.h) for convenience.


dump_stack.h

#ifdef __cplusplus
extern "C" {
#endif

 void dump_stack_android(void);
 
#ifdef __cplusplus
}
#endif


dump_stack.cpp


#include "dump_stack.h"
#include <utils/CallStack.h>

using namespace android;
extern "C" {
void dump_stack_android(void)
{
    CallStack stack;
    stack.update();
    stack.dump("XXX");
}
}


If you are using Android 4.4 or later, use log() instead of dump():


#include "dump_stack.h"
#include <utils/CallStack.h>

using namespace android;
extern "C" {
void dump_stack_android(void)
{
    CallStack stack;
    stack.update();
    stack.log("XXX");
}
}


Likewise, Android.mk needs to be modified:


LOCAL_SRC_FILES := \
        …... \
        dump_stack.cpp

LOCAL_SHARED_LIBRARIES += libutils


After that, using it from a C file only takes:


extern void dump_stack_android(void);   /* or: #include "dump_stack.h" */


void function_a()
{
 …
 dump_stack_android();
 …
}


[Call stack shows no function names]
Sometimes, when CallStack is used from C++ or C, some frames in the dump appear without a function name:


D/XXX (  147): #00  pc 00001b90  /system/lib/hw/audio.primary.mrvl.so (dump_stack_android+19)
D/XXX (  147): #01  pc 00004b56  /system/lib/hw/audio.primary.mrvl.so
D/XXX (  147): #02  pc 0001f828  /system/lib/libaudioflinger.so
D/XXX (  147): #03  pc 00019138  /system/lib/libaudioflinger.so
D/XXX (  147): #04  pc 00023bb6  /system/lib/libaudioflinger.so
D/XXX (  147): #05  pc 0000e9fe  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+213)
D/XXX (  147): #06  pc 0000e530  /system/lib/libutils.so
D/XXX (  147): #07  pc 0000d208  /system/lib/libc.so (__thread_entry+72)
D/XXX (  147): #08  pc 0000d3a4  /system/lib/libc.so (pthread_create+240)

Let's trace how CallStack is implemented.
First, recall how CallStack is used (taking Android 4.4 as an example):
 CallStack stack;
 stack.update();
 stack.log("XXX");

First, look at the declaration of update() (it is in system/core/include/utils/CallStack.h):
   // Immediately collect the stack traces for the specified thread.  
   void update(int32_t ignoreDepth=1, int32_t maxDepth=MAX_DEPTH, pid_t tid=CURRENT_THREAD);  
So through update() we can choose which thread to inspect and how many levels of call stack to capture; with no arguments, it captures the call stack of the current thread. update() collects as many frames as it actually can: the number of frames is stored in mCount, and the information for each frame is stored in mStack[]. log() then takes the memory address recorded in each frame's program counter and resolves it to the corresponding function name.

 log( )  
 |--> print( )  
    |--> get_backtrace_symbols( )  

Here is what get_backtrace_symbols() does:
void get_backtrace_symbols(const backtrace_frame_t* backtrace, size_t frames,
        backtrace_symbol_t* backtrace_symbols) {
    ...
    for (size_t i = 0; i < frames; i++) {
        ...
        Dl_info info;
        if (dladdr((const void*)frame->absolute_pc, &info) && info.dli_sname) {
            symbol->relative_symbol_addr = (uintptr_t)info.dli_saddr
                    - (uintptr_t)info.dli_fbase;
            symbol->symbol_name = strdup(info.dli_sname);
            symbol->demangled_name =
                    demangle_symbol_name(symbol->symbol_name);
        }
        ...
    }
    release_my_map_info_list(milist);
}


The names are missing because it uses dladdr(), which reads the shared library's dynamic symbols, to obtain the function name.

If the function is declared static, however, its name does not appear in the dynamic symbols. You can check with arm-linux-androideabi-nm -D xxxx.so | grep the_function_name; if nothing is printed, the function name is not in the dynamic symbols. In that case the only option is to use the addr2line command to read the symbols under the out folder. See my other post, http://janbarry0914.blogspot.tw/2011/07/android-crash-tombstone.html . Thanks.

 

Tuesday, June 23, 2015

How to make effective use of a tombstone when Android crashes

Source: How to make effective use of a tombstone when Android crashes

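When Android crashes, it writes a file similar to a core dump to /data/tombstones/tombstone_XX, where XX is a number that increases by one with each crash. How do we use this file?
The contents of a tombstone_XX file look like the following. We can use addr2line to find the function name for the location each pc points to.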
pid: 153, tid: 161  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 43bf3000
 r0 003dc4a0  r1 43bf3000  r2 00000400  r3 00000000
 r4 002cbc98  r5 002cbc98  r6 00000000  r7 00000000
 r8 00000001  r9 00000140  10 00000019  fp 00301e04
 ip 8090a1e0  sp 43f1eb68  lr 80c92cbb  pc afd0cdfc  cpsr 20000010
 d0  003b3a700033c9f8  d1  00301280002eb23b
 d2  000000000000003b  d3  0000000000000000
 d4  43bbc00043bb8000  d5  43d4800043bb8000
 d6  3f80000000000000  d7  000001e03f000000
 d8  0000000000000000  d9  0000000000000000
 d10 0000000000000000  d11 0000000000000000
 d12 0000000000000000  d13 0000000000000000
 d14 0000000000000000  d15 0000000000000000
 d16 3effff003effff00  d17 3f80000041808889
 d18 3f80000041c4cccd  d19 0701070100700798
 d20 0000000000000c27  d21 0000043f00890000
 d22 0000000000000000  d23 0000000000000008
 d24 3fc74721cad6b0ed  d25 3fc39a09d078c69f
 d26 0000000000000000  d27 0000000000000000
 d28 0000000000000000  d29 0000000000000000
 d30 0000000000000000  d31 0000000000000000
 scr 20000010

         #00  pc 0000cdfc  /system/lib/libc.so
         #01  pc 00092cb8  /system/lib/egl/libGLESv2_adreno200.so
         #02  pc 00093814  /system/lib/egl/libGLESv2_adreno200.so
         #03  pc 00093890  /system/lib/egl/libGLESv2_adreno200.so
         #04  pc 000938bc  /system/lib/egl/libGLESv2_adreno200.so
         #05  pc 00095cce  /system/lib/egl/libGLESv2_adreno200.so
         #06  pc 0006526a  /system/lib/egl/libGLESv2_adreno200.so
         #07  pc 000655cc  /system/lib/egl/libGLESv2_adreno200.so
         #08  pc 000188c4  /system/lib/egl/libGLESv1_CM_adreno200.so
         #09  pc 000265ca  /system/lib/libsurfaceflinger.so
         #10  pc 0001b04c  /system/lib/libsurfaceflinger.so
         #11  pc 0001bda8  /system/lib/libsurfaceflinger.so
         #12  pc 0001bf52  /system/lib/libsurfaceflinger.so
         #13  pc 00020f3c  /system/lib/libsurfaceflinger.so
         #14  pc 00023df6  /system/lib/libsurfaceflinger.so
         #15  pc 000259de  /system/lib/libsurfaceflinger.so
         #16  pc 0001c52c  /system/lib/libutils.so
         #17  pc 0001ca8a  /system/lib/libutils.so
         #18  pc 00011bc4  /system/lib/libc.so
         #19  pc 00011790  /system/lib/libc.so
1. The executables and shared libraries that the Android build produces before stripping are placed under $android_root/out/target/product/YOUR_PRODUCT_NAME/symbols/system/bin and $android_root/out/target/product/YOUR_PRODUCT_NAME/symbols/system/lib.
2. The toolchain used by Android is under $android_root/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/.
3. Suppose we want to see which function #09  pc 000265ca  /system/lib/libsurfaceflinger.so is in
 (assuming tombstone_XX is in the $android_root directory):
    $ ./prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/arm-eabi-addr2line -f -e ./out/target/product/YOUR_PRODUCT_NAME/symbols/system/lib/libsurfaceflinger.so  0x000265ca  
  It prints output like this (adding -C to the command demangles the function name):
  _ZN7android14TextureManager12initEglImageEPNS_5ImageEPvRKNS_2spINS_13GraphicBufferEEE
  $android_root/frameworks/base/services/surfaceflinger/TextureManager.cpp:164
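
When there are many frames to look up, it can help to pull the pc and library path out of each backtrace line first. The following is only a minimal sketch in C: parse_backtrace_line and struct bt_frame are hypothetical names, not part of Android, and the line layout is assumed to match the tombstone shown above.

```c
#include <assert.h>
#include <stdio.h>

/* One frame of a tombstone backtrace (hypothetical helper type). */
struct bt_frame {
    int index;            /* frame number, e.g. 9 for "#09" */
    unsigned long pc;     /* offset inside the library */
    char path[256];       /* on-device path of the library */
};

/* Parses a line such as
 *   "         #09  pc 000265ca  /system/lib/libsurfaceflinger.so"
 * Returns 1 on success and 0 if the line is not a backtrace frame. */
int parse_backtrace_line(const char *line, struct bt_frame *out)
{
    assert(line != NULL && out != NULL);
    /* %d (not %i) so that "09" is read as decimal 9; pc is hexadecimal. */
    return sscanf(line, " #%d pc %lx %255s",
                  &out->index, &out->pc, out->path) == 3;
}
```

The parsed path can then be appended to the symbols directory from step 1, and the pc passed to addr2line as in step 3.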

Monday, June 01, 2015

How do you read a segfault kernel log message

Reference: How do you read a segfault kernel log message

This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:

kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]

Here are my questions:

Is there any documentation on what the different error numbers in a segfault message mean? In this instance it is error 6, but I've seen error 4 and 5.

What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?

So far I was able to compile with symbols, and when I do x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions thus far are the following:

sp = stack pointer?
ip = instruction pointer
at = ????
myapp[8048000+24000] = address of symbol?




Based on my limited knowledge, your assumptions are correct.

sp = stack pointer
ip = instruction pointer
myapp[8048000+24000] = address

If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.

The error code is just the architectural error code for page faults and seems to be architecture specific. They are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:

bit 0 == 0 means no page found, 1 means protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode

My copy of Linux/arch/x86_64/mm/fault.c adds the following:

bit 3 == 1 means fault was an instruction fetch


Answer:

When the report points to a program, not a shared library

Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.
If it's a shared library

In the libfoo.so[NNNNNN+YYYY] part, NNNNNN is the address where the library was loaded and YYYY is the size of the mapping. Subtract NNNNNN from the instruction pointer (ip) to get the offset of the offending instruction within the .so. Then run objdump -DCgl libfoo.so and search for the instruction at that offset; the asm labels should make it easy to tell which function it is. If the .so was built without optimizations, you can also try addr2line -e libfoo.so <offset>.
What the error means

Here's the breakdown of the fields:

address - the location in memory the code is trying to access (a small value such as 10 or 11 is likely an offset from a pointer that we expect to hold a valid value but which is instead 0)
ip - instruction pointer, i.e. where the code which is trying to do this lives
sp - stack pointer
error - Architecture-specific flags; see arch/*/mm/fault.c for your platform.
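
To make the field breakdown concrete, here is a rough sketch in C that splits one such log line into its parts. parse_segfault_line and struct segv_report are hypothetical names; the sketch assumes the "kernel: " prefix has already been stripped, that the line follows exactly the layout shown in the question, and that all numbers except the pid are hexadecimal.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "segfault at ..." line (hypothetical helper type). */
struct segv_report {
    char comm[64];          /* process name before "[pid]" */
    int pid;
    unsigned long address;  /* faulting address ("at") */
    unsigned long ip;       /* instruction pointer */
    unsigned long sp;       /* stack pointer */
    unsigned int error;     /* page-fault error code */
    char object[64];        /* program or library containing ip */
    unsigned long base;     /* load address of that object */
    unsigned long size;     /* size of the mapping */
};

/* Returns 1 if all nine fields were parsed, 0 otherwise. */
int parse_segfault_line(const char *line, struct segv_report *r)
{
    assert(line != NULL && r != NULL);
    int n = sscanf(line,
        "%63[^[][%d]: segfault at %lx ip %lx sp %lx error %x in %63[^[][%lx+%lx]",
        r->comm, &r->pid, &r->address, &r->ip, &r->sp, &r->error,
        r->object, &r->base, &r->size);
    return n == 9;
}
```

Once parsed, ip minus base gives the offset to hand to addr2line or to look up in the objdump output.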