How to Analyze Various ANRs, Part 2: A Detailed Guide from Google's Official Documentation

张开发
2026/4/14 21:50:50, 15-minute read

Contents
- Background
- Execute service timeout
  - Common causes
  - How to debug
- Content provider not responding
  - Common causes
  - How to debug
- Slow job response
- Mystery ANRs
  - Message queue idle or nativePollOnce
  - No stack frames
- Known issues

Background

In day-to-day development, ANRs of every type keep cropping up. I had long wondered whether Google's own developers had published documentation on resolving the various kinds of ANR, mainly to check whether my usual analysis routine matches theirs, and to see whether Google has any new techniques for analyzing ANRs. Today I happened to find Google's detailed guidance on each type of ANR. I originally planned to use the Chinese version, but the automatic translation is not very good, so I am posting the English directly: 马哥 is simply relaying Google's material.

This article excerpts the detailed analysis guidance and examples for the following ANR types:
1. Execute service timeout
2. Content provider not responding
3. Slow job response
4. Mystery ANRs

The Google documentation follows; a link to the original is at the bottom.

Execute service timeout

An execute service ANR happens when the app's main thread doesn't start a service in time. Specifically, a service doesn't finish executing onCreate() and onStartCommand() or onBind() within the timeout period.

Default timeout period: 20 seconds for foreground service; 200 seconds for background service. The ANR timeout period includes the app cold start, if necessary, and calls to onCreate(), onBind(), or onStartCommand().

To avoid execute service ANRs, follow these general best practices:
- Make sure that app startup is fast, since it's counted in the ANR timeout if the app is started to run the service component.
- Make sure that the service's onCreate(), onStartCommand(), and onBind() methods are fast.
- Avoid running any slow or blocking operations on the main thread from other components; these operations can prevent a service from starting quickly.

Common causes

The official documentation lists common causes of execute service ANRs and suggested fixes in a table (not reproduced here).

How to debug

From the cluster signature and ANR report in Google Play Console or Firebase Crashlytics, you can often determine the cause of the ANR based on what the main thread is doing.

Note: Ignore execute service ANR clusters that say "nativePollOnce" or "main thread idle." These usually correspond to ANRs where the stack dump is taken too late, and are generally not actionable.
The actual ANR issues are usually present in other clusters, so real issues aren't being hidden. See the "Message queue idle or nativePollOnce" section for more details.

The following flow chart describes how to debug an execute service ANR.

Figure 6. How to debug an execute service ANR.

If you've determined that the execute service ANR is actionable, follow these steps to help resolve the issue:

Find the service component class in the ANR signature. In Google Play Console, the service component class is shown in the ANR signature. In the following example ANR details, it's com.example.app/MyService.

    com.google.common.util.concurrent.Uninterruptibles.awaitUninterruptibly
    Executing service com.example.app/com.example.app.MyService

Determine whether the slow or blocking operation is part of app startup, the service component, or elsewhere by checking the main thread stack for key function calls such as the service's onCreate(), onStartCommand(), or onBind().

For example, if the onStartCommand() method in the MyService class is slow, the main thread stack will look like this:

    at com.example.app.MyService.onStartCommand(FooService.java:25)
    at android.app.ActivityThread.handleServiceArgs(ActivityThread.java:4820)
    at android.app.ActivityThread.-$$Nest$mhandleServiceArgs(unavailable:0)
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2289)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:205)
    at android.os.Looper.loop(Looper.java:294)
    at android.app.ActivityThread.main(ActivityThread.java:8176)
    at java.lang.reflect.Method.invoke(Native method:0)

If you can't see any of the important function calls, there are a couple of other possibilities:
- The service is running or shutting down, which means that the stacks are taken too late. In this case, you can ignore the ANR as a false positive.
- A different app component is running, such as a broadcast receiver.
  In this case the main thread is likely blocked in this component, preventing the service from starting.

If you do see a key function call and can determine where the ANR is happening generally, check the rest of the main thread stacks to find the slow operation and optimize it or move it off the critical path.

For more information about services, see the following pages:
- Services overview
- Foreground services
- Service

Content provider not responding

A content provider ANR happens when a remote content provider takes longer than the timeout period to respond to a query, and is killed.

Default timeout period: specified by content provider using ContentProviderClient.setDetectNotResponding. The ANR timeout period includes the total time for a remote content provider query to run, which includes cold-starting the remote app if it wasn't already running.

To avoid content provider ANRs, follow these best practices:
- Make sure that app startup is fast, since it's counted in the ANR timeout if the app is started to run the content provider.
- Make sure that the content provider queries are fast.
- Don't perform lots of concurrent blocking binder calls that can block all the app's binder threads.

Common causes

The official documentation lists common causes of content provider ANRs and suggested fixes in a table (not reproduced here).

How to debug

To debug a content provider ANR using the cluster signature and ANR report in Google Play Console or Firebase Crashlytics, look at what the main thread and binder thread(s) are doing.

The following flow chart describes how to debug a content provider ANR:

Figure 7. How to debug a content provider ANR.

The following code snippet shows what the binder thread looks like when it's blocked due to a slow content provider query.
In this case, the content provider query is waiting for a lock when opening a database.

    binder:11300_2 (tid=13) Blocked

    Waiting for osm (0x01ab5df9) held by
    at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers:182)
    at com.example.app.MyClass.blockingGetOpenDatabase(FooClass:171)
    [...]
    at com.example.app.MyContentProvider.query(MyContentProvider.java:915)
    at android.content.ContentProvider$Transport.query(ContentProvider.java:292)
    at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:107)
    at android.os.Binder.execTransactInternal(Binder.java:1339)
    at android.os.Binder.execTransact(Binder.java:1275)

The following code snippet shows what the main thread looks like when it's blocked due to slow app startup. In this case, the app startup is slow due to lock contention during Dagger initialization.

    main (tid=1) Blocked

    [...]
    at dagger.internal.DoubleCheck.get(DoubleCheck:51)
    - locked 0x0e33cd2c (a qsn)
    at dagger.internal.SetFactory.get(SetFactory:126)
    at com.myapp.Bar_Factory.get(Bar_Factory:38)
    [...]
    at com.example.app.MyApplication.onCreate(DocsApplication:203)
    at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316)
    at android.app.ActivityThread.handleBindApplication(ActivityThread.java:6991)
    at android.app.ActivityThread.-$$Nest$mhandleBindApplication(unavailable:0)
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2235)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:205)
    at android.os.Looper.loop(Looper.java:294)
    at android.app.ActivityThread.main(ActivityThread.java:8170)
    at java.lang.reflect.Method.invoke(Native method:0)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:971)

Slow job response (JobService-related ANRs)

A slow job response ANR happens when the app takes too long to respond to JobService.onStartJob() or JobService.onStopJob(), or takes too long to provide a
notification using JobService.setNotification(). This suggests that the app's main thread is blocked doing something else.

If it's an issue with JobService.onStartJob() or JobService.onStopJob(), check what's happening on the main thread. If it's an issue with JobService.setNotification(), make sure to call it as quickly as possible. Don't do a lot of work before providing the notification.

Mystery ANRs

Sometimes it's unclear why an ANR is occurring, or there is insufficient information to debug it in the cluster signature and ANR report. In these cases, there are still some steps you can take to determine whether the ANR is actionable.

Message queue idle or nativePollOnce

If you see the frame android.os.MessageQueue.nativePollOnce in the stacks, it often indicates that the suspected unresponsive thread was actually idle and waiting for looper messages. In Google Play Console, the ANR details look like this:

    Native method - android.os.MessageQueue.nativePollOnce
    Executing service com.example.app/com.example.app.MyService

For example, if the main thread is idle, the stacks look like this:

    "main" tid=1 Native
    Main thread Idle

    #00 pc 0x00000000000d8b38 /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+8)
    #01 pc 0x0000000000019d88 /system/lib64/libutils.so (android::Looper::pollInner(int)+184)
    #02 pc 0x0000000000019c68 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+112)
    #03 pc 0x000000000011409c /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
    at android.os.MessageQueue.nativePollOnce(Native method)
    at android.os.MessageQueue.next(MessageQueue.java:339)
    at android.os.Looper.loop(Looper.java:208)
    at android.app.ActivityThread.main(ActivityThread.java:8192)
    at java.lang.reflect.Method.invoke(Native method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:626)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1015)

There are several reasons why the
suspected unresponsive thread can be idle:
- Late stack dump. The thread recovered during the short period between the ANR triggering and the stacks being dumped. The latency in Pixels on Android 13 is around 100ms, but can exceed 1s. The latency in Pixels on Android 14 is usually under 10ms.
- Thread misattribution. The thread used to build the ANR signature was not the actual unresponsive thread that caused the ANR. In this case, try to determine if the ANR is one of the following types:
  - Broadcast receiver timeout
  - Content provider not responding
  - No focused window
- System-wide issue. The process wasn't scheduled due to heavy system load or an issue in the system server.

No stack frames

Some ANR reports don't include the stacks with the ANR, which means that the stack dumping failed when generating the ANR report. There are a couple of possible reasons for missing stack frames:
- Taking the stack takes too long and times out.
- The process died or was killed before the stacks were taken.

    [...]

    --- CriticalEventLog ---
    capacity: 20
    timestamp_ms: 1666030897753
    window_ms: 300000

    libdebuggerd_client: failed to read status response from tombstoned: timeout reached?

    ----- Waiting Channels: pid 7068 at 2022-10-18 02:21:37.US_SOCIAL_SECURITY_NUMBER+0800 -----

    [...]

ANRs without stack frames aren't actionable from the cluster signature or ANR report. To debug, look at other clusters for the app, since if an issue is large enough it'll usually have its own cluster where stack frames are present. Another option is to look at Perfetto traces.

Known issues

Keeping a timer in your app's process for the purposes of finishing broadcast handling before an ANR triggers might not work correctly because of the asynchronous way the system monitors ANRs.

Original article: https://mp.weixin.qq.com/s/Bl2gV2ERghm2JZFF5LRfIA
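Appendix: a triage sketch

The debugging guidance above (look for key function calls on the main thread, treat nativePollOnce as an idle thread, and treat missing stacks as not actionable) can be sketched as a small classifier over the text of a main-thread stack. This is only a minimal illustration and is not part of Google's documentation. The class AnrStackTriage and the Verdict names are hypothetical. The frames handleServiceArgs, handleBindApplication and nativePollOnce appear in the stacks quoted above; handleCreateService and handleBindService are assumed from AOSP's ActivityThread rather than taken from the quoted text.

```java
import java.util.List;

// Hypothetical helper: classifies the main-thread stack of an execute service ANR.
final class AnrStackTriage {

    enum Verdict {
        NO_STACK_FRAMES,               // stack dump failed; look at other clusters
        MAIN_THREAD_IDLE,              // nativePollOnce; usually a late stack dump
        SLOW_APP_STARTUP,              // Application startup is on the stack
        SLOW_SERVICE_ON_CREATE,
        SLOW_SERVICE_ON_BIND,
        SLOW_SERVICE_ON_START_COMMAND,
        OTHER_COMPONENT_OR_LATE_DUMP   // no key call found
    }

    // True if any frame calls the given fully qualified method.
    // Appending "(" avoids matching synthetic frames such as
    // ActivityThread.-$$Nest$mhandleServiceArgs.
    private static boolean hasFrame(List<String> frames, String qualifiedMethod) {
        String needle = qualifiedMethod + "(";
        for (String frame : frames) {
            if (frame.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    static Verdict classify(List<String> mainThreadFrames) {
        if (mainThreadFrames.isEmpty()) {
            return Verdict.NO_STACK_FRAMES;
        }
        if (hasFrame(mainThreadFrames, "android.os.MessageQueue.nativePollOnce")) {
            return Verdict.MAIN_THREAD_IDLE;
        }
        if (hasFrame(mainThreadFrames, "android.app.ActivityThread.handleBindApplication")) {
            return Verdict.SLOW_APP_STARTUP;
        }
        // Assumed AOSP frame names (not shown in the article's stacks).
        if (hasFrame(mainThreadFrames, "android.app.ActivityThread.handleCreateService")) {
            return Verdict.SLOW_SERVICE_ON_CREATE;
        }
        if (hasFrame(mainThreadFrames, "android.app.ActivityThread.handleBindService")) {
            return Verdict.SLOW_SERVICE_ON_BIND;
        }
        if (hasFrame(mainThreadFrames, "android.app.ActivityThread.handleServiceArgs")) {
            return Verdict.SLOW_SERVICE_ON_START_COMMAND;
        }
        return Verdict.OTHER_COMPONENT_OR_LATE_DUMP;
    }

    public static void main(String[] args) {
        // Frames copied from the onStartCommand() example above.
        List<String> slowStartCommand = List.of(
            "at com.example.app.MyService.onStartCommand(FooService.java:25)",
            "at android.app.ActivityThread.handleServiceArgs(ActivityThread.java:4820)",
            "at android.app.ActivityThread.-$$Nest$mhandleServiceArgs(unavailable:0)",
            "at android.os.Looper.loop(Looper.java:294)");
        // Frames copied from the idle main thread example above.
        List<String> idle = List.of(
            "at android.os.MessageQueue.nativePollOnce(Native method)",
            "at android.os.MessageQueue.next(MessageQueue.java:339)",
            "at android.os.Looper.loop(Looper.java:208)");

        System.out.println(classify(slowStartCommand)); // SLOW_SERVICE_ON_START_COMMAND
        System.out.println(classify(idle));             // MAIN_THREAD_IDLE
    }
}
```

The idle check runs first on purpose: a stack that is parked in nativePollOnce says nothing about which component was slow, so it should not be attributed to a service callback even if other heuristics were added later.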
