Under the Hood: Dalvik patch for Facebook for Android

先来看一段中文内容

Hack Dalvik VM解决Android 2.3 DEX/LinearAllocHdr超限

当安卓工程庞大到一定程度(代码结构渣到一定程度)的时候,就会遇到诸如最大方法数超过限制导致无法安装,Crash等问题。Android 2.3 INSTALL_FAILED_DEXOPT 65535

问题的本质有两个

  • dx 打包时限制了单个dx文件的最大方法数为65535
  • Dalvik VM限制内存中加载的方法数(方法,类定义及构造函数)不能超过65535个

问题的重现很简单

  • 写一个类,把函数复制个6w份,一build,报错
  • apk安装到2.3系统,提示INSTALL_FAIL_DEXOPT
  • 动态加载两个DEX模块,每个函数3w份,一加载运行,程序Crash

网上一般推荐的解决方法

  • 删代码以及jar包,尤其是自动生成的get/set,没用的类,可以使用proguard自动优化掉无用代码
  • 由于高于Gingerbread的版本将LinearAllocHdr分配空间从5M提高到8M,放弃2.3的用户后可以有一定的缓冲时间
  • 使用dex动态加载的方式将程序内的模块插件化,这样会将问题1转化为问题2,如果程序加载项过大时还是会有崩溃现象出现
  • 将java层逻辑移到jni层实现
  • hacking dalvik vm

Facebook曾经遇到了这样的问题,有一个相关博文(Under the Hood: Dalvik patch for Facebook for Android),大概解决方法是发了一个lite版本去掉了一大票功能,以及写了一个小补丁hack掉Android Dalvik VM把它搞大了。。。

hacking dalvik vm的方法似乎是最干净利落的。可惜facebook语焉不详,参照博文中给出的信息,可以找到LinearAllocHdr*指针位于vm/Globals.h

使用jni写了个小程序做了以下几件事情实现了该hacking

  1. 通过jni方法取到*env
  2. 指针往回便利内存查找65535对应内存块
  3. 重新mmap8M内存,替换到len以及Hdr的当前位置到新map的位置

https://github.com/viilaismonster/LinearAllocFix

另外提供一个小工具可以用来反编译apk以统计构建出来的apk内*大约*有多少个方法

小工具里面会依据文件名缓存先前反编译的结果,可以用-diff参数将两个版本apk对比,查看具体到包的方法数变动

 
 
 
下面转自facebook网址:https://www.facebook.com/notes/facebook-engineering/under-the-hood-dalvik-patch-for-facebook-for-android/10151345597798920
David Reiss在2013年3月4日周一下午1:59发表的文章
相关内容链接:https://github.com/aosp-mirror/platform_dalvik/blob/android-2.3.7_r1/vm/Globals.h#L519
以及: https://github.com/aosp-mirror/platform_dalvik/blob/android-2.3.7_r1/vm/LinearAlloc.h#L33
解决方案:https://github.com/viilaismonster/LinearAllocFix
 

Facebook is one of the most feature-rich apps available for Android. With features like push notifications, news feed, and an embedded version of Facebook Messenger (a complete app in its own right) all working together in real-time, the complexity and volume of code creates technical challenges that few, if any, other Android developers face--especially on older versions of the platform. (Our latest apps support Android versions as old as Froyo--Android version 2.2--which is almost three years old.)

One of these challenges is related to the way Android's runtime engine, the Dalvik Virtual Machine, handles Java methods. Late last year we completed a  major rebuildof our Android app (https://www.facebook.com/notes/facebook-engineering/under-the-hood-rebuilding-facebook-for-android/10151189598933920), which involved moving a lot of our code from JavaScript to Java, as well as using newer abstractions that encouraged large numbers of small methods (generally considered a good programming practice). Unfortunately, this caused the number of Java methods in our app to drastically increase.

As we were testing, the problem first showed up as described in this bug (http://code.google.com/p/android/issues/detail?id=22586) , which caused our app installation to fail on older Android phones. During standard installation, a program called "dexopt" runs to prepare your app for the specific phone it's being installed on. Dexopt uses a fixed-size buffer (called the "LinearAlloc" buffer) to store information about all of the methods in your app. Recent versions of Android use an 8 or 16 MB buffer, but Froyo and Gingerbread (versions 2.2 and 2.3) only have 5 MB. Because older versions of Android have a relatively small buffer, our large number of methods was exceeding the buffer size and causing dexopt to crash.

After a bit of panic, we realized that we could work around this problem by breaking our app into multiple dex files, using the technique described here (http://android-developers.blogspot.com/2011/07/custom-class-loading-in-dalvik.html), which focuses on using secondary dex files for extension modules, not core parts of the app.

However, there was no way we could break our app up this way--too many of our classes are accessed directly by the Android framework. Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source—otherwise, this change wouldn’t have been possible.

But as we came closer to launching our redesigned app, we ran into another problem. The LinearAlloc buffer doesn't just exist in dexopt--it exists within every running Android program. While dexopt uses LinearAlloc to to store information about all of the methods in your dex file, the running app only needs it for methods in classes that you are actually using. Unfortunately, we were now using too many methods for Android versions up to Gingerbread, and our app was crashing shortly after startup.

There was no way to work around this with dex files since all of our classes were being loaded into one process, and we weren’t able to find any information about anyone who had faced this problem before (since it is only possible once you are already using multiple dex files, which is a difficult technique in itself).  We were on our own.

We tried various techniques to reclaim space, including aggressive use of ProGuard and source code transformations to reduce our method count. We even built a profiler for LinearAlloc usage to figure out what the biggest consumers were. Nothing we tried had a significant impact, and we still needed to write many more methods to support all of the rich content types in our new and improved news feed and timeline.

As it stood, the release of the much-anticipated Facebook for Android 2.0 was at risk. It seemed like we would have to choose between cutting significant features from the app or only shipping our new version to the newest Android phones (ICS and up). Neither seemed acceptable. We needed a better solution.

Once again, we looked to the Android source code. Looking at the definition of the LinearAlloc buffer (https://github.com/android/platform_dalvik/blob/android-2.3.7_r1/vm/LinearAlloc.h#L33), we realized that if we could only increase that buffer from 5 MB to 8 MB, we would be safe!

That's when we had the idea of using a JNI extension to replace the existing buffer with a larger one. At first, this idea seemed completely insane. Modifying the internals of the Java class loader is one thing, but modifying the internals of the Dalvik VM while it was running our code is incredibly dangerous. But as we pored over the code, analyzing all the uses of LinearAlloc, we began to realize that it should be safe as long as we did it at the start of our program. All we had to do was find the LinearAllocHdr object, lock it, and replace the buffer.

Finding it turned out to be the hard part. Here’s where it’s stored(https://github.com/android/platform_dalvik/blob/android-2.3.7_r1/vm/Globals.h#L519), buried within the DvmGlobals object, over 700 bytes from the start. Searching the entire object would be risky at best, but fortunately, we had an anchor point: the vmList object just a few bytes before. This contained a value that we could compare to the JavaVM pointer available through JNI.

The plan was finally coming together: find the proper value for vmList, scan the DvmGlobals object to find a match, jump a few more bytes to the LinearAlloc header, and replace the buffer. So we built the JNI extension, embedded it in our app, started it up, and...we saw the app running on a Gingerbread phone for the first time in weeks.The plan had worked.

But for some reason it failed on the Samsung Galaxy S II...

The most popular Gingerbread phone...

Of all time...

It seems that Samsung made a small change to Android that was confusing our code. Other manufacturers might have done the same, so we realized we needed to make our code more robust.

Manual inspection of the GSII revealed that the LinearAlloc buffer was only 4 bytes from where we expected it, so we adjusted our code to look a few bytes to each side if it failed to find the LinearAlloc buffer in the expected location. This required us to parse our process's memory map to ensure we didn't make any invalid memory references (which would crash the app immediately) and also build some strong heuristics to make sure we would recognize the LinearAlloc buffer when we found it. As a last resort, we found a (mostly) safe way to scan the entire process heap to search for the buffer.

Now we had a version of the code that worked on a few popular phones--but we needed more than just a few. So we bundled our code up into a test app that would run the same procedure we were using for the Facebook app, then just display a large green or red box, indicating success or failure.

We used manual testing, DeviceAnywhere, and a test lab that Google let us borrow to run our test app on 70 different phone models, and fortunately, it worked on every single one!

We released this code with Facebook for Android 2.0 in December. It's now running on hundreds of different phone models, and we have yet to find one where it doesn't work. The great speed improvements in that release would not have been possible without this crazy hack. And needless to say, without Android’s open platform, we wouldn’t have had the opportunity to ship our best version of the app. There’s a lot of opportunity for building on Android, and we’re excited to keep bringing the Facebook experience to more people and devices.

Android的方法数超过65535问题的更多相关文章

  1. Android工程方法数超过65535的解决办法

    Error:Execution failed for task ':ttt:transformClassesWithDexForDebug'.com.android.build.api.transfo ...

  2. APK方法数超过65535及MultiDex解决方案

    以下参考自官方文档配置方法数超过 64K 的应用 随着 Android 平台的持续成长,Android 应用的大小也在增加.当您的应用及其引用的库达到特定大小时,您会遇到构建错误,指明您的应用已达到 ...

  3. Android工程方法数超过64k,The number of method references in a .dex file cannot exceed 64K.

    最近将一个老的Eclipse项目转到Android Studio后,用gradle添加了几个依赖,项目可以make,但是一旦run就报错 Error:The number of method refe ...

  4. Android为什么方法数不能超过65535

    言归正传,来聊聊为什么方法数不能超过65535?搬上Dalvik工程师在SF上的回答,因为在Dalvik指令集里,调用方法的invoke-kind指令中,method reference index只 ...

  5. Android 方法数超过64k、编译OOM、编译过慢解决方案。

    目前将项目中的leancloud的即时通讯改为环信的即时通讯.当引入easeui的时候 出现方法数超过上限的问题. 搜索一下问题,解决方法很简单. 这里简单记录一下,顺序记录一下此解决方案导致的另一个 ...

  6. 彻底解决Android 应用方法数不能超过65K的问题

    作为一名Android开发者,相信你对Android方法数不能超过65K的限制应该有所耳闻,随着应用程序功能不断的丰富,总有一天你会遇到一个异常: Conversion to Dalvik forma ...

  7. (转载)Android 方法数超过64k、编译OOM、编译过慢解决方案。

    Android 方法数超过64k.编译OOM.编译过慢解决方案.   目前将项目中的leancloud的即时通讯改为环信的即时通讯.当引入easeui的时候 出现方法数超过上限的问题. 搜索一下问题, ...

  8. Android方法引用数超过65535优雅解决

    随着应用不断迭代更新,业务线的扩展,应用越来越大(比如:集成了各种第三方SDK或者公共开源的Library文件.jar文件)这样一来,项目耦合性就很高,重复作用的类就越来越多了,SO:问题就来了.相信 ...

  9. Android 65536方法数限制的思考

    前言 没想到,65536真的很小. 1 Unable to execute dex: method ID not in [0, 0xffff]: 65536 PS:本文只是纯探索一下这个65K的来源, ...

随机推荐

  1. git hg提交拉取

    工作总结web_acl 535 git clone “ssh://git@outergit.yonyou.com:49622/esn_web/web_acl.git" 600 git bra ...

  2. 新手C#SQLServer在程序里实现语句的学习2018.08.12

    从C#中连接到SQL Server数据库,再通过C#编程实现SQL数据库的增删改查. ado.net提供了丰富的数据库操作,这些操作可以分为三个步骤: 第一,使用SqlConnection对象连接数据 ...

  3. 安装RabbitMq-----windows

    在官网download我们所需要的版本,安装rabbitMq需要erlang支持 rabbitMq :http://www.rabbitmq.com/download.html erlang  :ht ...

  4. 双活部署前收集EMC存储设备信息

    以0.68服务器为例 1.拷贝emcgrab_Linux_v4.7.10.tar到linux服务器并将其解压到/tmp目录下 tar -xvf emcgrab_Linux_v4.7.10.tar -C ...

  5. 关于easyui-datagrid数据表格, 分页取出数据

    在制作数据表格的时候有一个这样的属性, pagination是否显示分页列表, 分页显示的时候需要分别从数据库中取数据, 每页显示几行, 即只从数据库取出几行数据来显示, 具体代码如下 1, 显示页面 ...

  6. ubuntu下 openvpn客户端的配置

    1.安装openvpn sudo apt-get install openvpn 2.配置openvpn 作为客户端,OpenVPN并没有特定的配置文件,而是由服务器提供方给出一个配置文件.对于认证, ...

  7. eclipse搭建struts2环境及所遇到的问题

    最近几天一直在搭建struts2框架,本身struts2框架的搭建是非常简单的,但不知道为什么最近就是总是报错,报了一大串的错 首先就是每次在类的根路径下创建struts.xml时,就报错,也不知道为 ...

  8. Time range (447392) for take 'Take 001' is larger than maximum allowed(100000).

    http://www.cnblogs.com/lopezycj/archive/2012/05/16/unity3d_tuchao.html https://forum.unity3d.com/thr ...

  9. 关于"undefined reference"错误

    这个错误换句话说: 链接的时候找不到实现的文件(谨记从这个入手!). 可能导致的原因有: 1. 没有链接库文件,包括静态库或动态库. 2. 链接文件的顺序问题,先后依赖问题,把被依赖的放后面. 3. ...

  10. PAT L2-010 排座位(floyd)

    布置宴席最微妙的事情,就是给前来参宴的各位宾客安排座位.无论如何,总不能把两个死对头排到同一张宴会桌旁!这个艰巨任务现在就交给你,对任何一对客人,请编写程序告诉主人他们是否能被安排同席. 输入格式: ...