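However, many engineers like to explain everything in technical terms and wrongly believe that technology can solve every problem. My examples above lean toward engineering, but even science and technology can only solve problems within particular domains, and science itself has a limited scope. For instance, can technology resolve friction between a wife and her mother-in-law? Can technology settle the Israeli-Palestinian conflict? Can technology get Google into China? Obviously not. Technology cannot even stop the neighbor's kid from pooping at your front door every day. This "when all you have is a hammer, everything looks like a nail" problem is not unique to engineers, but the better the engineer, the more likely they are to fall into this trap.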
eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring to change kernel source code or load kernel modules.
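Simply put, eBPF is a virtual machine component that runs inside the Linux kernel. It can safely and efficiently extend the kernel's functionality without changing kernel code or loading kernel modules.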
In Unix-like computer OSes (such as Linux), root is the conventional name of the user who has all rights or permissions (to all files and programs) in all modes (single- or multi-user).
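If you remember, there was a widely circulated gray-area keep-alive trick on Android: create two Services that each start a notification, and you end up with a foreground service that shows no notification, which raises the process's priority. The vulnerability introduced next is similar; in fact, CVE-2020-0313 is also related to foreground services. This part of the code is really poorly written and riddled with bugs. Back to the topic: let's first introduce foreground services: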
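At this point it is clear what happens: we pass a channel that does not exist, so the system's getNotificationChannel check fails and it immediately throws the exception "invalid channel for service notification". After catching that exception, the system calls ams.crashApplication. If we follow ams.crashApplication all the way down, we find that the call ends up here: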
```java
void scheduleCrash(String message) {
    // Checking killedbyAm should keep it from showing the crash dialog if the process
    // was already dead for a good / normal reason.
    if (!killedByAm) {
        if (thread != null) {
            if (pid == Process.myPid()) {
                Slog.w(TAG, "scheduleCrash: trying to crash system process!");
                return;
            }
            long ident = Binder.clearCallingIdentity();
            try {
                thread.scheduleCrash(message);
            } catch (RemoteException e) {
                // If it's already dead our work is done. If it's wedged just kill it.
                // We won't get the crash dialog or the error reporting.
                kill("scheduleCrash for '" + message + "' failed", true);
            } finally {
                Binder.restoreCallingIdentity(ident);
            }
        }
    }
}
```
```diff
 void scheduleAppCrashLocked(int uid, int initialPid, String packageName, int userId,
-        String message) {
+        String message, boolean force) {
     ProcessRecord proc = null;
 
     // Figure out which process to kill.  We don't trust that initialPid
@@ -374,6 +378,14 @@
     }
 
     proc.scheduleCrash(message);
+    if (force) {
+        // If the app is responsive, the scheduled crash will happen as expected
+        // and then the delayed summary kill will be a no-op.
+        final ProcessRecord p = proc;
+        mService.mHandler.postDelayed(
+                () -> killAppImmediateLocked(p, "forced", "killed for invalid state"),
+                5000L);
+    }
 }
```
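The fix follows an "ask first, then enforce" pattern: scheduleCrash only sends a crash request to the app process, so the patch adds an unconditional kill 5 seconds later, which is a no-op if the app already crashed. The C++ sketch below models that pattern only; FakeHandler, FakeProcess and scheduleAppCrash are hypothetical stand-ins rather than AOSP classes, and the handler runs on a virtual clock instead of real time.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical single-threaded stand-in for the system server's Handler,
// driven by a virtual clock so the sketch is deterministic.
class FakeHandler {
public:
    void postDelayed(std::function<void()> task, std::int64_t delayMs) {
        tasks_.emplace(now_ + delayMs, std::move(task));
    }

    // Advance the virtual clock and run every task that has become due, in order.
    void advance(std::int64_t ms) {
        now_ += ms;
        while (!tasks_.empty() && tasks_.begin()->first <= now_) {
            std::function<void()> task = std::move(tasks_.begin()->second);
            tasks_.erase(tasks_.begin());
            task();
        }
    }

private:
    std::int64_t now_ = 0;
    std::multimap<std::int64_t, std::function<void()>> tasks_;
};

// Hypothetical stand-in for ProcessRecord.
struct FakeProcess {
    bool alive = true;
    bool crashesWhenAsked = true;  // false models an app that survives scheduleCrash()
    std::string reason;

    // Models thread.scheduleCrash(): only a request sent to the app process.
    void scheduleCrash(const std::string& message) {
        if (crashesWhenAsked && alive) {
            alive = false;
            reason = "crashed: " + message;
        }
    }

    // Models killAppImmediateLocked(): does nothing if the process is already gone.
    void killImmediate(const std::string& why) {
        if (!alive) return;
        alive = false;
        reason = why;
    }
};

// Models the patched scheduleAppCrashLocked(..., boolean force).
// The caller must keep `proc` alive until the delayed task has run.
void scheduleAppCrash(FakeHandler& handler, FakeProcess& proc,
                      const std::string& message, bool force) {
    proc.scheduleCrash(message);
    if (force) {
        // Safety net from the patch: kill unconditionally after 5000 ms.
        handler.postDelayed([&proc] { proc.killImmediate("killed for invalid state"); }, 5000);
    }
}
```

An app that does not crash when asked is still killed once the virtual clock passes 5000 ms, while a cooperating app has already died by then and the delayed kill changes nothing.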
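At first I assumed my own program was at fault. The driver is written in pure C and uses epoll's ET (edge-triggered) mode, and this non-blocking programming model has many subtle corners where a small slip causes bugs. I investigated for a long time without finding the cause. More interestingly, although my program seemed unable to release resources, it worked fine under stress testing with no sign of a resource leak. Running out of options, I brought out the big gun: strace. What it showed was almost comical: my program's logic was correct, and it did call the relevant resource-release functions! Yet logcat had no corresponding log lines; after the client exited, the server's logs simply stopped. So the problem did not seem to be in my program at all. Was the system's logcat dropping logs?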
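The driver's code is not shown here, so the following is only a generic C++ sketch of the edge-triggered pattern the paragraph refers to, assuming Linux: the fd must be non-blocking, each notification must be drained until EAGAIN, and a read() returning 0 means the peer closed, which is where resources are released. The function names are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

struct DrainResult {
    std::size_t bytesRead = 0;
    bool peerClosed = false;  // read() returned 0: the writer side is closed
    bool error = false;       // a real error, not just "no more data for now"
};

// In edge-triggered mode epoll reports a readiness change once, so the handler
// must keep reading until EAGAIN; anything left unread (including the EOF)
// may never be reported again.
DrainResult drainFd(int fd) {
    DrainResult r;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            r.bytesRead += static_cast<std::size_t>(n);
        } else if (n == 0) {
            r.peerClosed = true;
            break;
        } else if (errno == EINTR) {
            continue;
        } else {
            r.error = (errno != EAGAIN && errno != EWOULDBLOCK);
            break;
        }
    }
    return r;
}

// ET mode needs non-blocking fds; otherwise the drain loop would block on the
// final read() instead of getting EAGAIN.
bool addEdgeTriggered(int epfd, int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return false;
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Handles one batch of events: drains every ready fd and releases it once the
// peer has gone away. Returns the number of bytes read in this batch.
std::size_t pollOnce(int epfd, int timeoutMs, bool& sawClose) {
    epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeoutMs);
    std::size_t total = 0;
    for (int i = 0; i < n; ++i) {
        int fd = events[i].data.fd;
        DrainResult r = drainFd(fd);
        total += r.bytesRead;
        if (r.peerClosed || r.error) {
            epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr);
            close(fd);  // release the resource once the client is gone
            sawClose = true;
        }
    }
    return total;
}
```

If drainFd stopped after the first read, the EOF that follows the data could go unnoticed and the fd would never be closed, which is the kind of subtle ET bug the author first suspected.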
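Android P, released last year, introduced restrictions on non-public APIs; for developers this is without doubt one of the most significant changes ever. The day before yesterday, Google released the Android Q beta, and more and more APIs have been added to the blacklist. Google also requires apps to target API level 28 in the second half of this year, which means the current dark greylist will take effect as well. Clearly, in the near future we will have to say goodbye to a large number of APIs.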
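Android P sorts non-SDK APIs into a whitelist, a light greylist, a dark greylist and a blacklist. As a simplified model (an assumption, not ART's GetActionFromAccessFlags), the sketch below shows why targeting API 28 matters: the dark greylist stays usable with a warning for apps targeting 27 or lower and is denied from 28 on.

```cpp
// Hypothetical, simplified model of the Android P hidden API lists.
enum class ApiList { kWhitelist, kLightGreylist, kDarkGreylist, kBlacklist };
enum class Action { kAllow, kAllowButWarn, kDeny };

// Assumption: dark-greylist members are allowed with a warning for
// targetSdk <= 27 and denied once the app targets API 28 (Android P).
Action actionFor(ApiList list, int targetSdk) {
    switch (list) {
        case ApiList::kWhitelist:
            return Action::kAllow;
        case ApiList::kLightGreylist:
            return Action::kAllowButWarn;
        case ApiList::kDarkGreylist:
            return targetSdk >= 28 ? Action::kDeny : Action::kAllowButWarn;
        case ApiList::kBlacklist:
            return Action::kDeny;
    }
    return Action::kDeny;  // not reached for valid enum values
}
```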
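Last time, when analyzing how the system enforces this restriction, we covered several approaches and finally presented a way to modify the runtime flag. There we noted that the system has an fn_caller_is_trusted condition: if the caller is a system class, the call is allowed. This is obvious: these private APIs exist for the system's own use, and if the system itself were denied, what would be the point?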
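At this point we can already use "meta-reflection" to obtain any hidden method or hidden field. But doing this for every hidden method we use is still a bit of a hassle. In the previous article, we later found that hidden API calls also have an "exemption" condition; the code is as follows: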
```cpp
if (shouldWarn || action == kDeny) {
  if (member_signature.IsExempted(runtime->GetHiddenApiExemptions())) {
    action = kAllow;
    // Avoid re-examining the exemption list next time.
    // Note this results in no warning for the member, which seems like what one would expect.
    // Exemptions effectively adds new members to the whitelist.
    MaybeWhitelistMember(runtime, member);
    return kAllow;
  }
  // ... (omitted)
}
```
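To see why the exemption list is so powerful, the sketch below models the check as a prefix match of the member signature against each exemption entry. That prefix matching is an assumption about MemberSignature::IsExempted, and startsWith/isExempted are hypothetical helpers, not ART functions.

```cpp
#include <string>
#include <vector>

// True if `s` begins with `prefix`.
bool startsWith(const std::string& s, const std::string& prefix) {
    return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

// Simplified model of the exemption check (assumption: prefix matching).
// `signature` is a member signature such as "Lpkg/Class;->name(args)ret".
bool isExempted(const std::string& signature,
                const std::vector<std::string>& exemptions) {
    for (const std::string& prefix : exemptions) {
        if (startsWith(signature, prefix)) {
            return true;
        }
    }
    return false;
}
```

Under this model, every class descriptor begins with "L", so a single "L" entry would exempt every member, provided the exemption list can be populated in the first place (in Android P, dalvik.system.VMRuntime has a hidden setHiddenApiExemptions method).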
```cpp
template<typename T>
inline Action GetMemberAction(T* member,
                              Thread* self,
                              std::function<bool(Thread*)> fn_caller_is_trusted,
                              AccessMethod access_method)
    REQUIRES_SHARED(Locks::mutator_lock_) {
  DCHECK(member != nullptr);

  // Decode hidden API access flags.
  // NB Multiple threads might try to access (and overwrite) these simultaneously,
  // causing a race. We only do that if access has not been denied, so the race
  // cannot change Java semantics. We should, however, decode the access flags
  // once and use it throughout this function, otherwise we may get inconsistent
  // results, e.g. print whitelist warnings (b/78327881).
  HiddenApiAccessFlags::ApiList api_list = member->GetHiddenApiAccessFlags();

  Action action = GetActionFromAccessFlags(member->GetHiddenApiAccessFlags());
  if (action == kAllow) {
    // Nothing to do.
    return action;
  }

  // Member is hidden. Invoke `fn_caller_in_platform` and find the origin of the access.
  // This can be *very* expensive. Save it for last.
  if (fn_caller_is_trusted(self)) {
    // Caller is trusted. Exit.
    return kAllow;
  }

  // Member is hidden and caller is not in the platform.
  return detail::GetMemberActionImpl(member, api_list, action, access_method);
}
```
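The order of the checks is what makes the trusted-caller condition matter: the cheap flag check runs first, the expensive caller check second, and only an untrusted caller reaches the detailed policy. The sketch models that ordering with plain callbacks; getMemberAction here is a hypothetical simplification, not ART's template.

```cpp
#include <functional>

enum class Action { kAllow, kAllowButWarn, kDeny };

// Hypothetical model of the order of checks in GetMemberAction:
// 1. the access flags alone allow the member -> allow (cheap);
// 2. the caller is trusted (e.g. a system class) -> allow (expensive, so done last);
// 3. otherwise defer to the detailed policy (exemptions, warnings, denial).
Action getMemberAction(Action fromFlags,
                       const std::function<bool()>& callerIsTrusted,
                       const std::function<Action(Action)>& detailedPolicy) {
    if (fromFlags == Action::kAllow) {
        return Action::kAllow;  // callerIsTrusted is never invoked
    }
    if (callerIsTrusted()) {
        return Action::kAllow;
    }
    return detailedPolicy(fromFlags);
}
```

In this model, a call that is considered to come from a trusted caller is allowed before any exemption or denial logic runs.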
```cpp
struct Runtime {
  // 64 bit so that we can share the same asm offsets for both 32 and 64 bits.
  uint64_t callee_save_methods_[kCalleeSaveSize];
  // Pre-allocated exceptions (see Runtime::Init).
  GcRoot<mirror::Throwable> pre_allocated_OutOfMemoryError_when_throwing_exception_;
  GcRoot<mirror::Throwable> pre_allocated_OutOfMemoryError_when_throwing_oome_;
  GcRoot<mirror::Throwable> pre_allocated_OutOfMemoryError_when_handling_stack_overflow_;
  GcRoot<mirror::Throwable> pre_allocated_NoClassDefFoundError_;

  // ... (many members omitted)

  std::unique_ptr<JavaVMExt> java_vm_;

  // ... (many members omitted)

  // Specifies target SDK version to allow workarounds for certain API levels.
  int32_t target_sdk_version_;

  // ... (many members omitted)

  bool is_low_memory_mode_;
  // Whether or not we use MADV_RANDOM on files that are thought to have random access patterns.
  // This is beneficial for low RAM devices since it reduces page cache thrashing.
  bool madvise_random_access_;
  // Whether the application should run in safe mode, that is, interpreter only.
  bool safe_mode_;
```
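Because the exact layout of Runtime varies between Android versions and vendor builds, hard-coding field offsets is fragile. One common workaround (an assumption here; the excerpt does not say how this layout is used) is to scan the object's memory for a value you already know, such as the JavaVM* passed to JNI_OnLoad or the app's targetSdkVersion, and derive offsets from where it is found. The sketch demonstrates that search on a hypothetical MockRuntime, not ART's real class; findFieldOffset is likewise hypothetical and requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical object whose layout we pretend not to know (NOT ART's Runtime).
struct MockRuntime {
    std::uint64_t callee_save_methods[4];
    void* java_vm;
    std::int32_t target_sdk_version;
    bool safe_mode;
};

// Scans `size` bytes from `base` in `step`-byte increments and returns the offset
// of the first slot whose bytes equal `needle`, or std::nullopt if none does.
template <typename T>
std::optional<std::size_t> findFieldOffset(const void* base, std::size_t size,
                                           const T& needle,
                                           std::size_t step = alignof(T)) {
    const auto* bytes = static_cast<const unsigned char*>(base);
    for (std::size_t off = 0; off + sizeof(T) <= size; off += step) {
        if (std::memcmp(bytes + off, &needle, sizeof(T)) == 0) {
            return off;
        }
    }
    return std::nullopt;
}
```

A real search should validate each candidate, for example by also checking a neighboring field, because a small value like 28 can appear elsewhere in memory by coincidence.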
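Two years ago Alibaba open-sourced the Dexposed project, which can intercept method calls at runtime on Dalvik without modifying the app. As its tagline "enable 'god' mode for single android application" says, it lets you control any Java method call within your own process without root, which opens up many possibilities: runtime AOP, online hotfixes, performance analysis tools (intercepting the creation and destruction of resources such as threads and IO), and so on. However, once ART replaced Dalvik as Android's runtime, all of this seemed to come to an abrupt halt.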
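Starting with Android N (7.0/7.1), Android uses hybrid compilation: AOT and JIT together with interpretation. This hybrid mode has a huge impact on hooking, so much so that Xposed only officially supported Android N this year. First, JIT means a method's entry point is not fixed: it can change while the app is running. Worse, there is OSR (on-stack replacement), so not only the entry point but even the machine code of a method that is currently executing can change. Second, JIT brings deeper runtime method inlining. All of this makes hooking at the VM level more complicated.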
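The analysis above shows that even if the ArtMethod is never looked up, the code its entrypoint points to must still be executed (obviously; otherwise what would the CPU run? Interpretation is set aside for now). Since replacing the entry point cannot hook every kind of method, what if, instead of replacing the entry, we directly modify the code it points to? (This approach has a fancy academic name: callee side dynamic rewriting.)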
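The difference between the two strategies can be modeled without any machine code. In the sketch below, a hypothetical MethodRecord stands in for ArtMethod and Code for the compiled code its entrypoint points to; a caller that already holds the code address bypasses an entry-pointer swap, but not a rewrite of the code itself. Real callee-side rewriting overwrites the first instructions of the target with a jump; swapping a std::function body here is only an analogy.

```cpp
#include <functional>
#include <string>

// Hypothetical model: a patchable piece of "compiled code".
struct Code {
    std::function<std::string()> body;
    std::string operator()() const { return body(); }
};

// Hypothetical stand-in for ArtMethod: only the entry point matters here.
struct MethodRecord {
    Code* entrypoint;
};

// A caller that dispatches through the method record on every call.
std::string callViaMethod(const MethodRecord& m) { return (*m.entrypoint)(); }

// A caller that already holds the code address (e.g. compiled code that jumps
// straight to the target) and never consults the method record again.
std::string callDirect(const Code& code) { return code(); }

// Strategy 1: replace the entry pointer stored in the method record.
void hookByEntryReplacement(MethodRecord& m, Code& hook) { m.entrypoint = &hook; }

// Strategy 2 (callee side dynamic rewriting): keep the entry pointer and
// rewrite the target code itself so that every path into it is diverted.
void hookByRewritingCallee(Code& target, const std::function<std::string()>& hook) {
    target.body = hook;
}
```

With entry replacement, only callers that go through the method record see the hook; with callee rewriting, both kinds of callers do.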
```cpp
  mirror::Object* receiver = nullptr;
  if (!m->IsStatic()) {
    // Check that the receiver is non-null and an instance of the field's declaring class.
    receiver = soa.Decode<mirror::Object*>(javaReceiver);
    if (!VerifyObjectIsClass(receiver, declaring_class)) {
      return NULL;
    }

    // Find the actual implementation of the virtual method.
    m = receiver->GetClass()->FindVirtualMethodForVirtualOrInterface(m);
  }

  // ... (omitted)
  InvokeWithArgArray(soa, m, &arg_array, &result, shorty);
  // ... (omitted)

  // Box if necessary and return.
  return soa.AddLocalReference<jobject>(BoxPrimitive(mh.GetReturnType()->GetPrimitiveType(), result));
}
```
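The key line is FindVirtualMethodForVirtualOrInterface: even when Method.invoke is called with a method obtained from a base class or interface, ART runs the receiver's actual override. The toy model below shows only that effect; ART resolves it through vtable and interface-table indices rather than by name, and ToyClass/findVirtualImpl are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical toy class: a superclass link plus the methods it declares or overrides.
struct ToyClass {
    const ToyClass* super;
    std::map<std::string, std::string> methods;  // name -> implementation id
};

// Start at the receiver's class and walk up the hierarchy until an
// implementation is found; returns nullptr if no class provides one.
const std::string* findVirtualImpl(const ToyClass* receiverClass,
                                   const std::string& name) {
    for (const ToyClass* c = receiverClass; c != nullptr; c = c->super) {
        auto it = c->methods.find(name);
        if (it != c->methods.end()) {
            return &it->second;
        }
    }
    return nullptr;
}
```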
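If you have built the Android source locally, you can skip this step. More often, though, we only want to debug a single module, and downloading that module's source is enough. Here I am demonstrating how to debug the ART runtime, so I only download the ART module's source. The Android source version I built is android-5.1.1_r9, so I need to check out that branch of the source; the address is here: ART-android-5.1.1_r9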
```
(lldb) add-dsym /Users/weishu/dev/github/Android-native-debug/app/symbols/libart.so
symbol file '/Users/weishu/dev/github/Android-native-debug/app/symbols/libart.so' \
    has been added to '/Users/weishu/.lldb/module_cache/remote-android/.cache/C51E51E5-0000-0000-0000-000000000000/libart.so'
```
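Note that the second directory will probably be different on your machine, so adjust it accordingly. Now let's see what has changed by looking at the breakpoint we set earlier: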
```
(lldb) br list 2
2: name = 'CollectGarbageInternal', locations = 1, resolved = 1, hit count = 0
  2.1: where = libart.so`art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool) at heap.cc:2124, address = 0xb4648c20, resolved, hit count = 0
```