http://blog.httrack.com/blog/2013/08/23/catching-posix-signals-on-android/

To Report Or Not To Report

You have a nice application available on the Google Android Store and, as a developer, you have access to nice features giving you basic statistics (the number of downloads, Android version breakdown, etc.), reviews, and a Crashes & ANRs section that lets you audit user crash reports (and application hangs: that’s the ANR thing).

This is a rather basic feature (you do not have any details on the user’s phone, Android version, etc.; only a Java stack trace extract), but at least it allows you to quickly spot mistakes (such as a NullPointerException or, in this case, a NumberFormatException), and reporting is pretty straightforward for users (they only have to click on the “Report” button in case of a crash).

But what if your application is using native code (through JNI)? In that case, the application will just crash silently, giving the user no opportunity to report the bug to the upstream developer (you). That is not cool: the bug will remain unspotted unless users are nice enough to email you and have the know-how to provide useful technical details, such as the address of the crash, which is kind of rare.

Catching the Problem

A first step is obviously to be able to detect common native crashes (SIGSEGV, SIGBUS, etc.) using signal handlers. On POSIX systems, this can be achieved by using sigaction():

struct sigaction sa;
struct sigaction sa_old;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = my_handler;
sa.sa_flags = SA_SIGINFO;
if (sigaction(sig, &sa, &sa_old) == 0) {
  ...
}
  • First problem: we are typically running on small systems (Android …), and one source of error is the stack overflow. When the stack is full (too much recursion, too large objects on stack), you hit the last guard page, and the system raises SIGSEGV, whose handler runs by default on the… same full stack, raising the signal one more time. Fortunately, you may register in any thread an alternate stack through the use of sigaltstack(), which basically reserves some space in case of emergency (i.e. the system will switch the stack pointer to this one in case of trouble, letting your handler run on a “fresh” stack). Note that the alternate stack is only used by handlers installed with the SA_ONSTACK flag set in sa_flags.
stack_t stack;
memset(&stack, 0, sizeof(stack));
/* Reserve the system default stack size. We don't need that much by the way. */
stack.ss_size = SIGSTKSZ;
stack.ss_sp = malloc(stack.ss_size);
stack.ss_flags = 0;
/* Install the alternate stack. Be sure the memory region is valid until you revert it. */
if (stack.ss_sp != NULL && sigaltstack(&stack, NULL) == 0) {
  ...
}
  • Second problem: we’re hosted on a Java Virtual Machine, and some of these signals might already be caught. Typically, SIGSEGV might be regularly raised to address NullPointerException or as normal JIT processing (i.e. executable pages might be flagged with a “no access” protection, and filled by the JIT compiler through a signal handler), so you have to make sure the original signal handler is called first, before messing with it. If the signal was not processed, the original signal handler will generally return, or will call abort() (which is nice, because we have a last chance to catch it through a SIGABRT handler).
static void my_handler(const int code, siginfo_t *const si, void *const sc) {
  /* Call previous handler (the struct sigaction saved when installing ours). */
  old_handler.sa_sigaction(code, si, sc);
  ...
}
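The snippet above assumes the previous handler was registered with SA_SIGINFO. A slightly more defensive variant might inspect the saved flags first and skip SIG_DFL and SIG_IGN, which are not callable. The sketch below illustrates this with SIGUSR1 and a stand-in for the virtual machine's handler; every name in it is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static struct sigaction old_action; /* disposition replaced by ours */
static volatile sig_atomic_t old_handler_calls = 0;
static volatile sig_atomic_t new_handler_calls = 0;

/* Forward the signal to the previously installed handler, if there is one. */
static void call_previous_handler(int code, siginfo_t *si, void *sc) {
    if (old_action.sa_flags & SA_SIGINFO) {
        old_action.sa_sigaction(code, si, sc);
    } else if (old_action.sa_handler != SIG_DFL
               && old_action.sa_handler != SIG_IGN) {
        old_action.sa_handler(code);
    }
}

/* Stand-in for a handler installed earlier by the virtual machine. */
static void stand_in_vm_handler(int code) {
    (void) code;
    old_handler_calls++;
}

static void my_handler(int code, siginfo_t *si, void *sc) {
    call_previous_handler(code, si, sc); /* original handler goes first */
    new_handler_calls++;
}

/* Installs both handlers on SIGUSR1 and raises it once.
 * Returns 0 if each handler ran exactly once, -1 otherwise. */
int chain_demo(void) {
    struct sigaction vm, mine, saved;

    memset(&vm, 0, sizeof(vm));
    sigemptyset(&vm.sa_mask);
    vm.sa_handler = stand_in_vm_handler;
    if (sigaction(SIGUSR1, &vm, &saved) != 0) {
        return -1;
    }

    memset(&mine, 0, sizeof(mine));
    sigemptyset(&mine.sa_mask);
    mine.sa_sigaction = my_handler;
    mine.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &mine, &old_action) != 0) {
        sigaction(SIGUSR1, &saved, NULL);
        return -1;
    }

    old_handler_calls = 0;
    new_handler_calls = 0;
    raise(SIGUSR1);
    sigaction(SIGUSR1, &saved, NULL);
    return (old_handler_calls == 1 && new_handler_calls == 1) ? 0 : -1;
}
```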
  • Third problem: we’re running in a multi-threaded process, and ideally we do not want to catch crashes from threads we do not own. We can address this issue by using pthread_getspecific() to have a thread-specific context. Well, this is actually a dirty solution: pthread_getspecific() is not an async-signal-safe function, which means that if you are using it in a signal handler, you may have to prepare for unforeseen consequences. (I fail to see what could go wrong with this specific function, however: this is just a peek in a thread-specific address array. But yes, yes, we’re playing with fire, don’t kick me!)
static void my_handler(const int code, siginfo_t *const si, void *const sc) {
  /* Call previous handler. */
  old_handler.sa_sigaction(code, si, sc);

  /* Get thread-specific context. */
  my_struct *s = (my_struct*) pthread_getspecific(my_thread_var);
  if (s != NULL) {
    ...
  }
  ...
}
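For completeness, here is one way the thread-specific variable could be created and used to tell "our" threads from the others. It is a sketch under assumptions: the my_struct layout, the key setup through pthread_once(), and the use of SIGUSR1 are all invented for the example, and the same async-signal-safety caveat applies to pthread_getspecific().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread context; a real one would hold the crash details. */
typedef struct {
    volatile sig_atomic_t last_signal;
} my_struct;

static pthread_key_t my_thread_var;
static pthread_once_t my_thread_var_once = PTHREAD_ONCE_INIT;
static int key_error = 0;
static volatile sig_atomic_t foreign_signals = 0;

static void create_key(void) {
    key_error = pthread_key_create(&my_thread_var, NULL);
}

static void my_handler(int code, siginfo_t *si, void *sc) {
    my_struct *s = (my_struct *) pthread_getspecific(my_thread_var);
    (void) si;
    (void) sc;
    if (s != NULL) {
        s->last_signal = code; /* this thread opted in */
    } else {
        foreign_signals++;     /* not one of ours: leave it alone */
    }
}

/* Raises SIGUSR1 once without and once with a registered context.
 * Returns 0 if the handler told the two situations apart, -1 otherwise. */
int thread_context_demo(void) {
    struct sigaction sa, old_sa;
    my_struct context;
    int ok;

    context.last_signal = 0;
    if (pthread_once(&my_thread_var_once, create_key) != 0 || key_error != 0) {
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = my_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0) {
        return -1;
    }

    foreign_signals = 0;
    raise(SIGUSR1); /* no context registered yet */
    ok = (foreign_signals == 1 && context.last_signal == 0);

    if (pthread_setspecific(my_thread_var, &context) != 0) {
        ok = 0;
    }
    raise(SIGUSR1); /* context registered for this thread */
    ok = ok && context.last_signal == SIGUSR1 && foreign_signals == 1;

    pthread_setspecific(my_thread_var, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return ok ? 0 : -1;
}
```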
  • Fourth problem: we have to collect some basic information on the crash, especially the faulting address. Fortunately, the third argument of the sigaction callback is a pointer to a ucontext_t context collecting register values (and various other processor-specific details). On x86-64 architectures, the program counter will typically be saved in uc_mcontext.gregs[REG_RIP]; on ARM, in uc_mcontext.arm_pc. Unfortunately, on Android, the ucontext_t structure is not defined in any system header, and you’ll have to import one yourself (I shamelessly copied the one from Richard Quirk). You also have to find out which binary the program counter was running in, and the base address of that binary in memory, because a randomized absolute address is not very useful for audit and debugging. Fortunately, the dladdr() function (an extension available on Linux and Android) gives you this information, along with other useful details (namely the nearest symbol matching the address, and the module base address, which lets you compute a relative offset). (Note: you can also get this information on Linux by snooping in /proc/self/maps and checking the address ranges; it will at least provide you with the base address.)
Dl_info info;
if (dladdr(addr, &info) != 0 && info.dli_fname != NULL) {
  void * const nearest = info.dli_saddr;
  const uintptr_t addr_relative =
    ((uintptr_t) addr - (uintptr_t) info.dli_fbase);
  ...
}
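The /proc/self/maps alternative mentioned in the note can be sketched as follows. The function name and the rule used for the base address (the start of the first consecutive mapping that shares the same path) are assumptions of this example. Since it relies on stdio, it is not async-signal-safe and is better called after leaving the signal handler.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Finds the mapping containing addr in /proc/self/maps.
 * On success returns 0, stores the start of the first consecutive mapping with
 * the same path in *base, and copies that path (possibly empty) into path.
 * Returns -1 if the address is not mapped or the file cannot be read. */
int find_module_base(uintptr_t addr, uintptr_t *base, char *path, size_t path_size) {
    FILE *maps;
    char line[1024];
    char current[1024] = "";
    uintptr_t current_base = 0;
    int found = -1;

    if (base == NULL || path == NULL || path_size == 0) {
        return -1;
    }
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return -1;
    }
    while (found != 0 && fgets(line, sizeof(line), maps) != NULL) {
        uintptr_t lo = 0, hi = 0;
        int name_pos = 0;
        const char *name;

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %*s %*s %*s %*s %n",
                   &lo, &hi, &name_pos) < 2 || name_pos <= 0) {
            continue;
        }
        line[strcspn(line, "\n")] = '\0';
        name = line + name_pos;

        /* A new path starts a new module: remember where it begins. */
        if (strcmp(name, current) != 0) {
            snprintf(current, sizeof(current), "%s", name);
            current_base = lo;
        }
        if (addr >= lo && addr < hi) {
            *base = current_base;
            snprintf(path, path_size, "%s", name);
            found = 0;
        }
    }
    fclose(maps);
    return found;
}
```

Subtracting the returned base from the program counter gives an offset comparable to the one obtained from dli_fbase.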

You can also capture a backtrace, with the same information, as long as you have a recent (i.e. 4.1.1 or higher) Android version, using the libcorkscrew library. This library is not available on older Android releases, and besides, we do not want a backtrace of the current stack, but a backtrace of the stack provided in the crash context. Fortunately, we can load libcorkscrew.so dynamically to solve the first issue (using dlopen() and dlsym()) and, for the second issue, manually import a nice function called unwind_backtrace_signal_arch, which does exactly what we want:

/* Opaque list of memory mappings; only pointers to it are used here. */
typedef struct map_info_t map_info_t;

/*
 * Describes a single frame of a backtrace.
 */
typedef struct {
  uintptr_t absolute_pc; /* absolute PC offset */
  uintptr_t stack_top;   /* top of stack for this frame */
  size_t stack_size;     /* size of this stack frame */
} backtrace_frame_t;

ssize_t unwind_backtrace_signal_arch(siginfo_t* siginfo, void* sigcontext,
    const map_info_t* map_info_list,
    backtrace_frame_t* backtrace, size_t ignore_depth, size_t max_depth);

(We also need to import acquire_my_map_info_list())
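The dynamic loading step could look roughly like the sketch below, which resolves the functions through dlopen() and dlsym() and reports failure when the library is absent (as it will be on a desktop system or an older Android release). The corkscrew_api structure and corkscrew_load() are hypothetical names; release_my_map_info_list is, as far as I recall, the companion of acquire_my_map_info_list in libcorkscrew, so treat its signature as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

typedef struct map_info_t map_info_t; /* opaque to us */

typedef struct {
    uintptr_t absolute_pc;
    uintptr_t stack_top;
    size_t stack_size;
} backtrace_frame_t;

typedef ssize_t (*unwind_fn)(siginfo_t *, void *, const map_info_t *,
                             backtrace_frame_t *, size_t, size_t);
typedef map_info_t *(*acquire_fn)(void);
typedef void (*release_fn)(map_info_t *);

/* Function pointers resolved at run time (hypothetical container). */
typedef struct {
    void *handle;
    unwind_fn unwind_backtrace_signal_arch;
    acquire_fn acquire_my_map_info_list;
    release_fn release_my_map_info_list;
} corkscrew_api;

/* Tries to load libcorkscrew.so. Returns 0 when every symbol was found,
 * -1 otherwise (api->handle is then NULL). */
int corkscrew_load(corkscrew_api *api) {
    memset(api, 0, sizeof(*api));
    api->handle = dlopen("libcorkscrew.so", RTLD_LAZY | RTLD_LOCAL);
    if (api->handle == NULL) {
        return -1; /* library not present on this system */
    }
    api->unwind_backtrace_signal_arch =
        (unwind_fn) dlsym(api->handle, "unwind_backtrace_signal_arch");
    api->acquire_my_map_info_list =
        (acquire_fn) dlsym(api->handle, "acquire_my_map_info_list");
    api->release_my_map_info_list =
        (release_fn) dlsym(api->handle, "release_my_map_info_list");
    if (api->unwind_backtrace_signal_arch == NULL
        || api->acquire_my_map_info_list == NULL
        || api->release_my_map_info_list == NULL) {
        dlclose(api->handle);
        memset(api, 0, sizeof(*api));
        return -1;
    }
    return 0;
}
```

Loading the library once, outside of any signal handler, keeps dlopen() and dlsym() away from the crash path.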

We can also, when corkscrew is there, use the advanced get_backtrace_symbols function to resolve symbols and demangle them:

/*
 * Describes the symbols associated with a backtrace frame.
 */
typedef struct {
  uintptr_t relative_pc;          /* relative frame PC offset from the start of the library,
                                     or the absolute PC if the library is unknown */
  uintptr_t relative_symbol_addr; /* relative offset of the symbol from the start of the
                                     library or 0 if the library is unknown */
  char* map_name;                 /* executable or library name, or NULL if unknown */
  char* symbol_name;              /* symbol name, or NULL if unknown */
  char* demangled_name;           /* demangled symbol name, or NULL if unknown */
} backtrace_symbol_t;

/*
 * Gets the symbols for each frame of a backtrace.
 * The symbols array must be big enough to hold one symbol record per frame.
 * The symbols must later be freed using free_backtrace_symbols.
 */
void get_backtrace_symbols(const backtrace_frame_t* backtrace, size_t frames,
    backtrace_symbol_t* backtrace_symbols);
  • Fifth problem: you need to pass all this useful information back to the Java Virtual Machine, and not only by directly calling some kind of callback, because you need to propagate a clean RuntimeException and unwind all Java frames to have your final exception reported through the Android framework. The only way to achieve that is to store the exit point in your own code, using setjmp (actually sigsetjmp, because we’ll need to restore some masked signals), and to use it in your signal handler to jump directly to the correct location. The sigsetjmp/siglongjmp functions are obviously not async-signal-safe (see the remark in the Application Usage section of sigaction()), so this is a highly risky bet. Typically, if the crash happened in the middle of a malloc() call (because, say, the linked list of free blocks has been corrupted), you may find yourself triggering another SIGSEGV (which is the lesser of the evils) or, worse, deadlocked, which is rather embarrassing because the user will have to find a way to kill the application themselves. For this reason, an alarm() call will be the first operation executed in case of emergency (and yes, alarm() is async-signal-safe: we can safely kill ourselves).
static void my_handler(const int code, siginfo_t *const si, void *const sc) {
  /* Call previous handler. */
  old_handler.sa_sigaction(code, si, sc);

  /* Trigger a time bomb. */
  (void) alarm(30);

  /* Get thread-specific context. */
  my_struct *s = (my_struct*) pthread_getspecific(my_thread_var);
  if (s != NULL) {
    /* Store crash context for later. */
    s->code = code;
    s->si = *si;
    s->uc = *(ucontext_t*) sc;

    /* Jump back to initial location. */
    siglongjmp(s->ctx, -1);
  }
  ...
}

Is It Working?

Yes, and it’s much nicer to have SIGSEGV advertised through a clean stack:

FATAL EXCEPTION: AsyncTask #5
java.lang.RuntimeException: An error occured while executing doInBackground()
at android.os.AsyncTask$3.done(AsyncTask.java:299)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:352)
at java.util.concurrent.FutureTask.setException(FutureTask.java:219)
at java.util.concurrent.FutureTask.run(FutureTask.java:239)
at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:230)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1080)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573)
at java.lang.Thread.run(Thread.java:841)
Caused by: java.lang.Error: signal 11 (Address not mapped to object) at address 0x42 [at libhttrack.so:0xa024]
at com.httrack.android.jni.HTTrackLib.main(Native Method)
at com.httrack.android.HTTrackActivity$Runner.runInternal(HTTrackActivity.java:998)
at com.httrack.android.HTTrackActivity$Runner.doInBackground(HTTrackActivity.java:919)
at com.httrack.android.HTTrackActivity$Runner.doInBackground(HTTrackActivity.java:1)
at android.os.AsyncTask$2.call(AsyncTask.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:234)
... 4 more
Caused by: java.lang.Error: signal 11 (Address not mapped to object) at address 0x42 [at libhttrack.so:0xa024]
at data.app_lib.com_httrack_android_2.libhttrack_so.0xa024(Native Method)
at data.app_lib.com_httrack_android_2.libhttrack_so.0x705fc(hts_main2:0x8f74:0)
at data.app_lib.com_httrack_android_2.libhtslibjni_so.0x4cc8(HTTrackLib_main:0xf8:0)
at data.app_lib.com_httrack_android_2.libhtslibjni_so.0x52d8(Java_com_httrack_android_jni_HTTrackLib_main:0x64:0)
at system.lib.libdvm_so.0x1dc4c(dvmPlatformInvoke:0x70:0)
at system.lib.libdvm_so.0x4dcab(dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*):0x18a:0)
at system.lib.libdvm_so.0x385e1(dvmCheckCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*):0x8:0)
at system.lib.libdvm_so.0x4f699(dvmResolveNativeMethod(unsigned int const*, JValue*, Method const*, Thread*):0xb8:0)
at system.lib.libdvm_so.0x27060(Native Method)
at system.lib.libdvm_so.0x2b580(dvmInterpret(Thread*, Method const*, JValue*):0xb8:0)
at system.lib.libdvm_so.0x5fcbd(dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list):0x124:0)
at system.lib.libdvm_so.0x5fce7(dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...):0x14:0)
at system.lib.libdvm_so.0x54a6f(Native Method)
at system.lib.libc_so.0xca58(__thread_entry:0x48:0)
at system.lib.libc_so.0xcbd4(pthread_create:0xd0:0)
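The "Caused by" message in this trace can be rebuilt from the saved siginfo_t plus the module name and relative address. The helper below is a guess at how such a string might be assembled; the function names and the short table of descriptions are assumptions, and only the output format is taken from the trace.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>

/* Human-readable description for a few common cases (not exhaustive). */
static const char *describe_signal(const siginfo_t *si) {
    switch (si->si_signo) {
    case SIGSEGV:
        switch (si->si_code) {
        case SEGV_MAPERR:
            return "Address not mapped to object";
        case SEGV_ACCERR:
            return "Invalid permissions for mapped object";
        default:
            return "Segmentation violation";
        }
    case SIGBUS:
        return "Bus error";
    case SIGABRT:
        return "Abort";
    case SIGILL:
        return "Illegal instruction";
    case SIGFPE:
        return "Floating-point exception";
    default:
        return "Unknown signal";
    }
}

/* Formats a message in the style of the trace above.
 * Returns the value of snprintf(), i.e. the untruncated length. */
int format_crash_message(char *buffer, size_t size, const siginfo_t *si,
                         const char *module, uintptr_t relative_pc) {
    return snprintf(buffer, size,
                    "signal %d (%s) at address 0x%" PRIxPTR " [at %s:0x%" PRIxPTR "]",
                    si->si_signo, describe_signal(si),
                    (uintptr_t) si->si_addr, module, relative_pc);
}
```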

Bonus: produce a relative address, and get the filename and line number for FREE!

In the above stack, we know that the crash occurred somewhere inside the hts_main2 function, which is not very precise. We actually have another very useful piece of information: the crash was spotted inside libhttrack.so, at the relative address 0xa024. This is a relative address, computed earlier with dladdr(), which means that you can find the exact source location if you kept some debugging information. Most people do not want to ship it, because it increases the binary size by an unreasonable factor (especially when running on small embedded devices with 3G+ connectivity priced above gold ingot levels), and thus either strip it silently or keep another “debug” build.

You have an extremely simple alternative: build your libraries once with all debugging symbols, including line numbers and macro information (-g3), and instead of stripping them, split the debugging sections into a separate file (say, a .dbg file). To let tools such as gdb or addr2line behave gently, you can “tell” them that the .so has a related .dbg debug file through the .gnu_debuglink ELF section.

Here’s what you typically need to do to split your library into a stripped version plus a debug symbol file:

# copy all debugging sections to dbg file
objcopy --only-keep-debug mylib.so mylib.dbg
# strip debug sections
objcopy --strip-debug mylib.so
# wipe any existing ELF .gnu_debuglink section if any
objcopy --remove-section .gnu_debuglink mylib.so
# set the .gnu_debuglink to the dbg file
objcopy --add-gnu-debuglink=mylib.dbg mylib.so

The nice .dbg file can then be kept for debugging purposes:

cd /build-archives/httrack/armv7/3.47.99.35
./toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -C -f -e libhttrack.so 0xa024
fourty_two
src/htscoremain.c:111

Okay, but Is It Really Safe?

A typical crash may have a great variety of causes (Captain Obvious to the rescue!). My own experience, though, shows that NULL pointer dereferencing, dangling pointers, and other isolated crash spots due to bad code logic are a very common cause of crashes. Yes, you will still have trouble when dealing with corrupted allocators, or when breaking in the middle of a call that is not async-signal-safe (leaving mutexes locked, and more generally a dirty state that will hit you back later), but the alternative is to die immediately (the default behavior), so trying to take gentle emergency steps cannot do any harm.

I Want To Test It!

You can check out the “CoffeeCatch” library code on GitHub: you can either merge the .c file into your project(s) or build it as a standalone library; this tiny library has no exotic external dependencies. Make sure all your libraries are built with the -funwind-tables compiler flag to produce frame unwind information for all functions (if necessary, add LOCAL_CFLAGS := -funwind-tables to your Android.mk file). Note that -funwind-tables does not produce significant data size overhead in normal situations, so rest well my friend.

The use is pretty straightforward – for JNI code, a simple macro COFFEE_TRY_JNI, taking the JNIEnv environment pointer as a first argument, and a code block as a second argument, will do the trick. For code outside JNI, you have to enclose protected code with COFFEE_TRY()/COFFEE_CATCH()/COFFEE_END() as in the example below.

In all cases, make sure the block is enclosed in a dedicated function without any local variables lying around (because the saved context does not include registers, AFAIK, especially volatile registers, which might be wiped before calling sigsetjmp, and whose saved values are in an unknown location in the stack).
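The underlying rule comes from the C standard: a non-volatile local variable that is modified between setjmp() and longjmp() has an indeterminate value after the jump. The toy function below (a hypothetical example, using plain setjmp/longjmp rather than signals) shows the safe pattern for the rare case where a local must survive the jump, which is to declare it volatile.

```c
#include <setjmp.h>

/* Simulates a step that "fails" `failures` times through longjmp() and
 * returns how many attempts were made in total. */
int count_attempts(int failures) {
    jmp_buf env;
    volatile int attempts = 0; /* volatile: modified between setjmp and longjmp */

    (void) setjmp(env);        /* execution resumes here after each longjmp */
    attempts++;
    if (attempts <= failures) {
        longjmp(env, 1);       /* pretend something went wrong and retry */
    }
    return attempts;
}
```

Here count_attempts(2) makes three attempts. Without the volatile qualifier, the compiler would be free to keep attempts in a register whose content is not guaranteed after the jump.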

The JNI flavor:

/** The potentially dangerous function. **/
jint call_dangerous_function(JNIEnv* env, jobject object) {
  // ... do dangerous things!
  return 42;
}

/** Protected function stub. **/
void foo_protected(JNIEnv* env, jobject object, jint *retcode) {
  /* Try to call 'call_dangerous_function', and raise proper Java Error upon
   * fatal error (SEGV, etc.). **/
  COFFEE_TRY_JNI(env, *retcode = call_dangerous_function(env, object));
}

/** Regular JNI entry point. **/
jint Java_com_example_android_MyNative_foo(JNIEnv* env, jobject object) {
  jint retcode = 0;
  foo_protected(env, object, &retcode);
  return retcode;
}

The standard flavor:

static __attribute__ ((noinline)) void demo(int *fault) {
  COFFEE_TRY() {
    recurse_madness(42);
    *fault = 0;
  } COFFEE_CATCH() {
    const char*const message = coffeecatch_get_message();
    snprintf(string_buffer, sizeof(string_buffer), "%s", message);
    *fault = 1;
  } COFFEE_END();
}

TL;DR: so far, no native crash has been reported on Android for HTTrack. But I’m ready to collect them in case of need :)
