-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moving NotificationDismissBroadcastReceiver logic to NotificationsProcessingService #7779
Conversation
…icationsProcessingService
Thank you @mzorz for great analysation and summary of all the information we have. Couple of notes/questions. AFAIUnderstand current solution moves the logic from BroadcastReceiver (UI thread) to a service (working thread). If that's the case, my comment from the previous PR is still valid, isn't it?
It might fix the ANR, but I'm not sure that I see how it would help avoid the conditions on which the deadlock is likely to happen.
Is it really at the same time. Since both these methods are executed on one thread (UI), how can they be executed at the same time? I think they are both added into the queue of the UI Thread Handler class and they are executed sequentially. Which one goes first is not defined (which might also be the reason why
I believe it's not shown in the thread dump, because it's not on the stack yet. It's waiting in the queue of the UI Thread Handler, but it never gets it's chance, because the NotificationsDetailActivity is blocked on the lock and it results in ANR. After reading through all the information, I'm starting to think it's not actually a deadlock situation - at least not in the real sense. We have just one resource Can't the the real issue be hidden in modifying collection within a for/foreach loops. |
As discussed elsewhere, putting this one on hold in favor of testing #7783 first. Marked this one with |
With the This following ANR shows there's a network operation going on to retrieve the notification's icon, which takes just too long and the other waiting operation (triggered by
I think we should be right to try merging this PR (which moves handling of the Broadcast Receiver to the ProcessingService) - and try adding some end to end tracking that we can check later. If we find as many non-closing (only starting) track hits instead of closed loops, then it would make sense to investigate further. Added the tracking here c782ac6402 Ready for another round @malinajirka |
c782ac6
to
07a96ad
Compare
Thanks @mzorz !
I'm sorry to hear that, however it did worth a try;). Thanks!
Sounds like a good plan! 👍 LGTM |
comes from #7773 (comment)
Thank you @malinajirka for your comments there, that made me look closer! 🙇
I think you nailed it, good question @malinajirka! And I realize my description of the problem in that PR didn't do justice to the exact specific problem. TBH I had the same doubts as well. I think the way I approached it is thinking we may fix the ANR even though we may not fix the root cause of the problem, as this fix would help avoid the conditions on which the deadlock is likely to happen, which in the end is already a probably good enough improvement.
Let me explain a bit more how I think things go then - looking into it a bit closer, the dump shows this:
Note that
thread 23
does not appear in the dump (been going through a lot of other reports and none of them seem to have the dump for the blocking thread).Basically that broadcast receiver is set to be called as the
delete
intent for the notification, i.e. when the user either dismisses it (slide off the screen) or taps on it, the notification is dismissed and this intent gets triggered, and received by theNotificationDismissBroadcastReceiver
. What it does internally is it callsremoveNotification()
, which is locked on the class object itself (GCMMessageService.class) as you've seen.At the same time the Activity gets launched and the
onCreate()
method is executed. This method also calls asynchronized
method within GCMMessageService.Here comes the important part: both the Activity lifecycle and the BroadcastReceiver are executed on the main thread (see https://developer.android.com/reference/android/content/BroadcastReceiver#onreceive). I'm tempted to say that's where the deadlock happens, but I'm with you in that there's no 2 threads in this situation, it's the same thread (main thread) locking on the same lock object.
After looking into the reports a bit more, I found something interesting.
There are other similar instances happening, all seem to be due to the synchronized methods in GCMMessageService class
1 - this one happens in
WPMainActivity.onResume
as it callsGCMMessageService.removeAllNotifications
2 - WPMainActivity.onCreate calls
WPMainActivity.launchWithNoteId
which holds onGCMMessageService.getNotificationsCount()
In #1, the user taps on a notification and thus that's why WPMainActivity.onResume is executed (main pendingIntent for the notification is to wake it up and launch the NotificationsDetailActivity), but then, also the broadcast receiver should be triggered (and is not shown in the dump). Assuming only the second one to arrive gets dumped, thenit would make sense that this is happening sometimes for one (broadcastreceiver) and sometimes for other entry point in the app where the broadcastreceiver "locked" the object first.
There are many more, and I find a pattern that the blocking thread is not in the dump (the blocked thread only is), but when the blocked thread is
NotificationDismissBroadcastReceiver.onReceive
then the blocking must be elsewhere, and when the blocked thread is something else, theNotificationDismissBroadcastReceiver.onReceive
doesn't appear. This also seems to make sense according to the amount of reports, as we have approximately half of reports onNotificationDismissBroadcastReceiver.onReceive
being blocked, and the rest of the reports are other threads supposedly being blocked by the other threads. Given odds of execution should be even, I tend to think that this is, in fact, a deadlock problem.So, knowledge adds up and gives some meaning to all this information:
1 - tapping on any notification will produce both the actual intent (bringing the app to the foreground, then launching NotificationDetailActivity to show the Note detail), and trigger the
NotificationDismissBroadcastReceiver
receiver.2 - the amount of reports and variety tend to show it's always something happening between the
NotificationDismissBroadcastReceiver
and catching the other thread wanting to access aGCMMessageService
synchronized method as well, on any of various entry points which match the lifecycle of the notification-tapped intent.3 - given this, I think we might have a better chance solving the issue by encapsulating the BroadcastReceiver execution in a different thread, rather than doing it separately on each call from WPMainActivity or NotificationsDetailActivity (also this should be more future-proof).
Therefore, this PR moves the logic to the already existing
NotificationsProcessingService
, adding a specific action to handle notification dismissal processing, and gets rid of the broadcast receiver.I think it works fine, and hope will resolve the issue!