High CPU utilization while sending or receiving over slow network. (WFM in version 17? Bad again in TB24?) - may involve oscillating <progress> element in status bar
Categories
(MailNews Core :: Networking, defect)
Tracking
(Not tracked)
People
(Reporter: scovich, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
(Keywords: perf, Whiteboard: [needs profile][needs protocol log][workaround: comment 55])
Checking email over a slow/overloaded wireless connection can take 90 seconds or longer, with CPU usage hovering at 40-80%. This destroys my laptop's battery life, especially if a connection times out; sometimes I have to close TB to make it stop burning CPU. The same can happen if an email server is down or not responding properly.
Reporter
Comment 1•13 years ago
I just noticed: while submitting this bug over the same slow/overloaded wireless connection mentioned above, Firefox displayed the same behavior (40-80% cpu util while waiting for a response from bugzilla). Perhaps this bug is common to both?
Comment 2•13 years ago
(In reply to Ryan Johnson from comment #0)
> Checking email over a slow/overloaded wireless connection can take 90
> seconds or longer, with CPU usage hovering at 40-80%.

Does the problem occur only with SSL (POP3/IMAP/SMTP over SSL)? Is there no problem without SSL?

(In reply to Ryan Johnson from comment #1)
> I just noticed: while submitting this bug over the same slow/overloaded
> wireless connection mentioned above, Firefox displayed the same behavior
> (40-80% cpu util while waiting for a response from bugzilla).

Submitting a bug to bugzilla.mozilla.org is a data upload by POST over HTTPS (HTTP with SSL; during the POST, Fx is the TCP sender and the server is the TCP receiver). Is there no problem with a non-SSL site (HTTP)?

The cause may be frequent packet loss followed by retransmission.
When did your problem start to occur (which build of Tb/Fx, a specific date, after an upgrade, ...)?

Can a small network.tcp.sendbuffer value (default=131072), such as 4096 or 8192, serve as a workaround for bug submission or comment posting at bugzilla.mozilla.org?
Reporter
Comment 4•13 years ago
(In reply to WADA from comment #2)
> (In reply to Ryan Johnson from comment #0)
> > Checking email over a slow/overloaded wireless connection can take 90
> > seconds or longer, with CPU usage hovering at 40-80%.
>
> SSL(pop3/imap/smtp with SSL) only problem? No problem if non-SSL?

All four of my accounts are SSL, but I can't disable it to verify that non-SSL works better.

> (In reply to Ryan Johnson from comment #1)
> > I just noticed: while submitting this bug over the same slow/overloaded
> > wireless connection mentioned above, Firefox displayed the same behavior
> > (40-80% cpu util while waiting for a response from bugzilla).
>
> Submitting of bug to bugzilla.mozilla.org is upload of data by POST via
> HTTPS:(HTTP with SSL, sender of TCP=Fx, receiver of TCP=server when POST
> process).
> No problem if non-SSL site(HTTP:)?

I'll have to get back to you if I see a non-SSL upload taking a long time -- it doesn't happen to me that much.

> It may be frequent packet loss followed by re-transmission.
> When(with which build of Tb/Fx, specific date, after upgrade, ...) did your
> problem start to occur?

I noticed just during the last month or so when my normally cool laptop started cooking my legs. Whether the problem existed before in less-blistering form I couldn't say.

> Can small network.tcp.sendbuffer(default=131072) such as 4096, 8192 be a
> workaround for bug submitting or comment posting at bugzilla.mozilla.org?

I changed it and will let you know after posting this. Do I need to restart FF first?
Reporter
Comment 5•13 years ago
Update: I still saw 40% cpu util during the previous upload. Trying again after restarting FF.
Reporter
Comment 6•13 years ago
Update: It looks like the setting didn't stick until I restarted. With buffer size 4096 CPU util during the comment upload was no more than 20%, a significant improvement. Is there some similar setting in TB that would have an equivalent effect?
Comment 7•13 years ago
(In reply to Ryan Johnson from comment #6)
> Is there some similar setting in TB that would have an equivalent effect?

Tb uses the same setting. Go to Tools/Options/Advanced/General, Config Editor.

> With buffer size 4096 CPU util during the comment upload was no more than 20%,

How about network.tcp.sendbuffer=65536? (Please see Bug 541367 and the bugs listed in the Dependency tree for that bug, with "Show Resolved".)

Since network.tcp.sendbuffer=4096 is usually too small (it causes inefficient use of network resources), data transmission will take longer than usual. Please find the most appropriate value for your environment.
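Editorial sketch, not part of the original comment: for readers wondering what the pref above changes, here is a minimal Python illustration. It assumes network.tcp.sendbuffer ends up as the socket-level send buffer size (SO_SNDBUF), which this thread does not confirm. The helper name set_send_buffer is made up for illustration. It requests a buffer size and reads back what the operating system actually granted, which may differ from the request.

```python
import socket


def set_send_buffer(sock: socket.socket, requested_bytes: int) -> int:
    """Request a send buffer size and return the size the OS reports back.

    Assumption: this mirrors what a pref such as network.tcp.sendbuffer
    does inside the application; the thread does not confirm that.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    # The three values discussed in the thread: 4 KB, 64 KB, and the default.
    for requested in (4096, 65536, 131072):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            granted = set_send_buffer(s, requested)
            print(f"requested {requested} bytes, OS reports {granted} bytes")
```

The reported value is OS-dependent (Linux, for example, typically reports about double the requested size), so no expected output is shown.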
Reporter
Comment 8•13 years ago
(In reply to WADA from comment #7)
> (In reply to Ryan Johnson from comment #6)
> > Is there some similar setting in TB that would have an equivalent effect?
>
> Tb also uses same setting. Go Tools/Options/Advanced/General, Config Editor.

Yes, but it's downloading, not uploading...

> > With buffer size 4096 CPU util during the comment upload was no more than 20%,
>
> How about network.tcp.sendbuffer=65536?
> (See Bug 541367 and bugs listed in Dependency tree for that bug, with "Show
> Resolved", please)
> As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> network resource use), data transmission will take longer than usual. Please
> find most appropriate value in your environment, please.

It's really starting to sound like this is just a hack to work around the real (still unknown?) problem, which doesn't make me very enthusiastic about going further in this direction. Perhaps you could explain why tcp buffer size should impact CPU utilization so strongly?
Reporter
Comment 9•13 years ago
For comparison, I just tried loading this web page from IE9, and it has exactly *zero* cpu util during the wait, then spikes just long enough to render the page.
Comment 10•13 years ago
(In reply to Ryan Johnson from comment #8)
> (In reply to WADA from comment #7)
> > As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> > network resource use), data transmission will take longer than usual. Please
> > find most appropriate value in your environment, please.
> It's really starting to sound like this is just a hack to work around the
> real (still unknown?) problem, which doesn't make me very enthusiastic about
> going further in this direction. Perhaps you could explain why tcp buffer
> size should impact CPU utilization so strongly?

Because if the network is flaky (which includes faulty hardware), packets will be retransmitted, which drives up CPU.

What make and model of wireless router are you running?

Also, how are you determining that you have a "slow/overloaded wireless connection"?
Comment 11•13 years ago
(In reply to Wayne Mery (:wsmwk) from comment #10)
> because if the network is flaky (which includes faulty hardware) it will
> cause packets to be retransmitted, and thus drive up CPU

Bug 475603 - Lots of timeouts for DNS requests with Netgear Router WGR614 - is one example of a hardware problem (I don't recall whether it drove up CPU).
Reporter
Comment 12•13 years ago
(In reply to Wayne Mery (:wsmwk) from comment #10)
> (In reply to Ryan Johnson from comment #8)
> > (In reply to WADA from comment #7)
> > > As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> > > network resource use), data transmission will take longer than usual. Please
> > > find most appropriate value in your environment, please.
> > It's really starting to sound like this is just a hack to work around the
> > real (still unknown?) problem, which doesn't make me very enthusiastic about
> > going further in this direction. Perhaps you could explain why tcp buffer
> > size should impact CPU utilization so strongly?
>
> because if the network is flakey (which includes faulty hardware) it will
> cause packets to be retransmitted, and thus drive up CPU

Somehow IE9 manages to avoid the problem (see comment 9) while using the exact same hardware, network connection, and web page (this one), which suggests that the problem lies closer to home.

> what make and model wireless router are you running?

My home router is a Cisco (at work, can't remember the model), but the exact same thing happens at school and with at least two conference hotel wireless setups on different continents. Besides, the same thing occurs when there's no router at all (wireless switched off on the bus) and the connection is just plain timing out.

> Also, how are you determining that you have "slow/overloaded wireless
> connection"?

At the time of reporting, there were 300 people in the same conference session as me, all trying to read their email at the same time; one WIFI router sat on a stand in the corner. It took some tries to even get an IP address (192.168.0.0/24).
Comment 13•13 years ago
This also happens on slow ADSL networks. I have the same problem with Thunderbird 8.0 sending a 4-megabyte attachment over an ADSL (order of 100 kb/s uplink). It takes 20 minutes to send, and during these 20 minutes, the computer seems hogged up. The computer should not be hogged while simply waiting for bytes to be sent over the network card. Thanks.
Comment 14•13 years ago
(In reply to Ryan Johnson from comment #8)
> It's really starting to sound like this is just a hack to work around the
> real (still unknown?) problem, which doesn't make me very enthusiastic about
> going further in this direction. Perhaps you could explain why tcp buffer
> size should impact CPU utilization so strongly?

No. It's for problem determination.
- If 100% CPU still occurs even with network.tcp.sendbuffer=4096, and only with SSL => Bug 538283
- If the 100% CPU problem or connection loss is resolved with network.tcp.sendbuffer<=64KB, regardless of SSL or non-SSL => router bug

With SSL, even when the cause of the 100% CPU was a router bug that network.tcp.sendbuffer<=64KB resolved, CPU consumption may still be higher than expected due to Bug 538283. On a slow network, CPU utilization may be higher with network.tcp.sendbuffer=64KB than with network.tcp.sendbuffer=4KB.

On a wireless network, searching for an appropriate network.tcp.sendbuffer value is not a workaround. If the probability of packet loss is high, the send buffer size is better reduced. It is performance tuning. "Do such things or not" is all up to you.
Comment 15•13 years ago
Responding to comment 2:
<< When(with which build of Tb/Fx, specific date, after upgrade, ...) did your problem start to occur? >>
using the set-up of comment 13:
<< I have the same problem with Thunderbird 8.0 sending a 4-megabyte attachment over an ADSL (order of 100 kb/s uplink). >>

It may have started as far back as version 2.X or 3.X, ever since I started to send large attachments and found that my computer gets hogged, because I had not updated my Thunderbird (I stayed at version 3.X) until recently. And Thunderbird and Firefox have picked up a bad habit of using up version numbers quickly.

Thanks.
Reporter
Comment 16•13 years ago
(In reply to WADA from comment #14)
> (In reply to Ryan Johnson from comment #8)
> > It's really starting to sound like this is just a hack to work around the
> > real (still unknown?) problem, which doesn't make me very enthusiastic about
> > going further in this direction. Perhaps you could explain why tcp buffer
> > size should impact CPU utilization so strongly?
>
> No.
> It's for problem determiation.

[snipped lots of text about knob-turning]

I'm using a very simple problem determination process here:
- Problem occurs on a wide variety of networks (home WLAN, .edu LAN, .edu WLAN, overloaded hotel WLAN) ==> probably not a router config issue (they can't all be wrong)
- Problem occurs when not connected to *any* network (if TB thinks there's connectivity) ==> probably not a buffer size issue (buffer should fill "immediately" and then stop since nothing is draining it)
- Problem also occurs in FF (CPU usage while loading from a slow web site) ==> probably an issue with shared infrastructure (xul.dll?)
- Problem does *not* occur in IE ==> not an OS config problem

Maybe it is a "simple matter of tuning" but it's not a game users should have to play... for each network... for each changing situation. Especially not when other products seem able to handle the issue without user intervention. Emphatically not if it turns out the Windows profiler is right and UI redrawing overhead is [part of] the problem (see bug #686495 for details).

> "Do such things or not" is all up to you.

"Compete or do not" is all up to you. The bar is set, and exposing this sort of knob-foolery to users is below it.
Comment 17•12 years ago
reporter, do you still see this problem when using a current version?
Reporter
Comment 18•12 years ago
(In reply to Wayne Mery (:wsmwk) from comment #17)
> reporter, do you still see this problem when using a current version?

Using 16.0.2, the problem persists. Disabling all virtual network adapters used by virtual machines on my computer helps a little (they fooled TB into thinking there was always connectivity), and doing so cuts CPU usage by almost half, but I still have to commute with TB closed to conserve battery.

Steps tried:
1. Open TB
2. Connect to internet
3. Check email
4. Disable wireless
5. Check email again
6. CPU usage jumps (split between thunderbird.exe and dwm.exe) as multiple "unable to connect" windows slide across the bottom of the screen.
Reporter
Comment 19•12 years ago
(In reply to Wayne Mery (:wsmwk) from comment #17)
> reporter, do you still see this problem when using a current version?

After updating to 17.0, the situation seems to be markedly improved. I'll try this out for a week or two to be sure, but this version may have fixed it for me.
Reporter
Comment 20•12 years ago
(In reply to Ryan Johnson from comment #19)
> (In reply to Wayne Mery (:wsmwk) from comment #17)
> > reporter, do you still see this problem when using a current version?
>
> After updating to 17.0, the situation seems to be markedly improved. I'll
> try this out for a week or two to be sure, but this version may have fixed
> it for me.

Still no CPU hogging troubles since the upgrade. Thanks for the fix!
Comment 21•12 years ago
Thanks for the update.
Comment 22•11 years ago
FWIW this would have been partly helped by changes ensuring that Gmail messages are no longer each downloaded at least twice. Do you find version 24 to be the same or better?
Reporter
Comment 23•11 years ago
I hadn't been paying attention lately, usually I just close TB if I'm going off grid (and FF as well, if I really want to maximize battery life). A cursory test says it's back to the original (bad) behavior. Downloading a 27MB email from the in-laws over a decent DSL connection keeps the CPU at 30-60% during the entire download. However, it does seem to do better at handling dropped connections: turning off the wifi while an email was downloading only spiked the CPU to 100% for about 10 seconds before it gave up (instead of 60-90 seconds like before). Note that I'm on a new laptop, using a new WIFI router, and all TB knobs are at defaults unless they were attached to the user profile I imported from the old machine. (I still don't understand why waiting on the network should use more than single-digit %CPU, *especially* if no bytes are coming down the pipe. I don't have the Windows profiler handy to see where those CPU cycles are going, though, and no time to install it right now).
Reporter
Updated•11 years ago
Comment 24•11 years ago
I can confirm this bug for Thunderbird 24.3.0. When using Thunderbird with a slow or intermittent connection, or behind a firewall that requires a (SOCKS) proxy to connect to the mail server when the proxy is not properly configured, there is a high CPU load until the connection times out. Obviously there is some busy loop waiting for the connection. Please remove the "TB5" part of the bug's subject.
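Editorial sketch, not part of the original comment: the "busy loop" suspicion can be made concrete with a small Python comparison. This is not Thunderbird's code, and the thread does not establish where the CPU time actually goes. The function names busy_wait and blocking_wait are hypothetical. Both wait on a socket whose peer never sends anything (a stand-in for an unresponsive server): the first polls in a tight loop, the second lets the kernel put the process to sleep until data arrives or the timeout expires.

```python
import select
import socket
import time


def busy_wait(sock, timeout_s):
    """Poll a non-blocking socket in a tight loop.

    Returns (number of polls, CPU seconds consumed while waiting).
    """
    sock.setblocking(False)
    start_cpu = time.process_time()
    deadline = time.monotonic() + timeout_s
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        try:
            if sock.recv(1):
                break  # data arrived
        except BlockingIOError:
            pass  # nothing yet; immediately try again
    return polls, time.process_time() - start_cpu


def blocking_wait(sock, timeout_s):
    """Sleep in the kernel until the socket is readable or the timeout expires.

    Returns (whether data became readable, CPU seconds consumed while waiting).
    """
    start_cpu = time.process_time()
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable), time.process_time() - start_cpu


if __name__ == "__main__":
    a, b = socket.socketpair()  # b never writes anything
    with a, b:
        polls, busy_cpu = busy_wait(a, 0.2)
        got_data, idle_cpu = blocking_wait(a, 0.2)
        print(f"busy loop: {polls} polls, {busy_cpu:.3f} s of CPU")
        print(f"blocking wait: data={got_data}, {idle_cpu:.3f} s of CPU")
```

Both variants wait the same wall-clock time; the difference is that the polling version keeps a core busy for the whole wait, which is the kind of behavior the comment is guessing at.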
Comment 25•10 years ago
According to comment 24
Updated•10 years ago
Comment 26•9 years ago
A profile will be helpful https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G Unfortunately the symbols aren't working just now.
Comment 27•9 years ago
David, and/or Ryan?

(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #26)
> A profile will be helpful
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G
Reporter
Comment 28•9 years ago
My Thunderbird experience is currently dominated by Bug #1249945, so I'm not able to tell whether this bug is still a problem.
Comment 29•9 years ago
(In reply to Ryan Johnson from comment #28)
> My Thunderbird experience is currently dominated by Bug #1249945, so I'm not
> able to tell whether this bug is still a problem.

Ryan, thanks for the update. Please add the profile URL in your Bug #1249945 so we can view it.
Comment 30•9 years ago
Thanks Wayne for the link on how to produce a profile. I still had some trouble with Cleopatra, but eventually it worked out:
http://people.mozilla.org/~bgirard/cleopatra/?1457530381985#report=ee827feb030825119dc2230f1109b1ad88bafd0b
http://people.mozilla.org/~bgirard/cleopatra/?1457530740188#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
http://people.mozilla.org/~bgirard/cleopatra/?1457530381985#report=ee827feb030825119dc2230f1109b1ad88bafd0b

For current Thunderbird 38.6.0 on my Windows 7 machine, the problem persists: when using Thunderbird with a slow or intermittent connection, or when the SOCKS proxy is not reachable or not properly configured, there is a high CPU load (some 30-40% on one of my CPU cores) until the connection times out. Obviously there is some busy loop waiting for the connection.

As mentioned already for several related Mozilla issues, such problems should be easily reproducible: set a non-existing proxy name and port as the SOCKS server, say: 1.2.3.4 and port 5, and then try to get/receive emails (e.g., by clicking on an IMAP folder).

I wonder why this bug is still considered unconfirmed, since this bug is open since 2011.
Comment 31•9 years ago
Here's another profile, which may be related or not, witnessing needless high CPU load for current TB. http://people.mozilla.org/~bgirard/cleopatra/?1457531717548#report=12f8c4d34482d37a5184eace3ad40ded9135cf69
Comment 32•8 years ago
Ryan, what AV and firewall were you running with Windows 7? And now with Windows 10?

> I wonder why this bug is still considered unconfirmed, since this bug is open since 2011.

Because of comment 10, and because we don't know the source of CPU usage and what's happening in networking. Plus, given the time difference between your comments in Ryan's initial bug report (plus the conditions to reproduce), I'm not convinced your network issue is the same as Ryan's. Only further analysis will tell. But you certainly should be good with bug 1107251 and bug 919485.

That said,
a) there is bug 76473 (filed roughly in same time frame) and
b) further analysis is needed, and what you and Ryan see with version 50 (beta) and newer would be most helpful (but I don't expect bad proxy to be any better)

Ryan's 3 performance bugs, with differing network conditions:
* v6, bug 686495, win7, no/disconnected network - no profiler run, but xperf shows CPU in graphics code
* bug 1249945, win10, good network, 800MB? - Ryan's profile run is 50% wait for NtWaitForMultipleObjects, ~50% CC (cycle collect), almost no painting CPU
* v6, this bug, win7, slow/bad network - no profile run from Ryan. David's v24 "proxy" profiles [1] are similar to bug 1249956 only in the high Nt waiting - high painting CPU, ~40% wait for NtWaitForMultipleObjects, plus a high percentage of the sequence openOptionsDialog in mailcore.js, gadvancedpane.showconnections, opensubdialog

Other network performance bugs: https://mzl.la/2iXYLiS

[1] Ryan's profiles
https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b
https://cleopatra.io/#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b

Note: I'm doubting bug 1249945 is calendar specific and I almost duped it, but presumably it's with a good network so keeping it open for now.
Comment 33•8 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> Ryan, what AV and firewall were you running with Windows 7? And now with
> windows 10?
>
> > I wonder why this bug is still considered unconfirmed, since this bug is open since 2011.
>
> Because of comment 10, and because we don't know the source of CPU usage and
> what's happening in networking. Plus the time difference between your
> comments in Ryan's initial bug report (plus conditions to reproduce) I'm not
> convinced your netowrk issue is the same as Ryan's. Only further analysis
> will tell. But you certainly should be good with bug 1107251 and bug 919485
>
> That said,
> a) there is bug 76473 (filed roughly in same time frame) and

correction, bug 764731
Comment 34•8 years ago
Thanks Wayne for answering my question regarding UNCONFIRMED.

I just noticed that in my list of profiles (given in comment 30), which you quoted in comment 32 as [1], was a duplicate, while another one (given in comment 31) was missing, so in my view the list actually is:
https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b
https://cleopatra.io/#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
https://cleopatra.io/#report=12f8c4d34482d37a5184eace3ad40ded9135cf69
Comment 35•8 years ago
When confirming that the issues I mentioned in comment 30 still hold for TB 45.5.1, I found that the high CPU load (around 40% of one core) can not only be reproduced by setting a non-existing proxy IP address as the SOCKS host, say: 1.2.3.4, but also more directly by setting the mail server name to an unreachable IP address such as 1.2.3.4. Then try receiving new emails, (e.g., by clicking on an IMAP folder, or using the Get All New Messages button). The load is high as long as the green wheel rotates. I also found that when a configured SOCKS proxy is unreachable, the timeout (after which CPU load drops again) is some 60 seconds, while the value I set for mail.server.server[n].timeout, namely 30 seconds, is not respected. On the other hand, when the IMAP server itself is unreachable (regardless whether TLS is enabled or not), the timeout occurs after some 125 seconds.
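Editorial sketch, not part of the original comment: the timeout observations above (about 60 seconds for an unreachable proxy, about 125 seconds for an unreachable IMAP server, versus a configured 30 seconds) could be cross-checked outside Thunderbird with a small Python timer. The function name time_connect is hypothetical, and this measures only a plain TCP connect from Python, not Thunderbird's own network stack.

```python
import socket
import time


def time_connect(host, port, timeout_s):
    """Attempt a TCP connection and report how it ended and how long it took.

    Returns (outcome, elapsed seconds), where outcome is "connected",
    "timed out", or "failed: <exception class name>".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "connected"
    except socket.timeout:
        outcome = "timed out"
    except OSError as exc:
        outcome = f"failed: {exc.__class__.__name__}"
    return outcome, time.monotonic() - start
```

Calling, for example, time_connect("1.2.3.4", 143, 30.0) would try the unreachable address used in this thread (port 143, the standard IMAP port, is an assumption here). If the elapsed time tracks the timeout passed in, a much longer wait seen in Thunderbird would point at how the application applies its own timeout rather than at the OS.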
Comment 36•8 years ago
The different timeouts observed not only indicate that there is even more than one nasty busy loop somewhere down in the network layer, but also should be of good help spotting them. I presume that they even use hard-coded timeout values.
Reporter
Comment 37•8 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> Ryan, what AV and firewall were you running with Windows 7? And now with
> windows 10?

My Windows 7 setup had neither installed (they broke cygwin). My Windows 10 setup had Defender for quite a while, until it started breaking my backup software a couple months ago. Now it's disabled as well. I occasionally fire up AV software to check for problems and have not found any, so I don't think that's the cause.

In case it's relevant, my Windows 7 setup had a bog-standard Windows VPN connection that I used occasionally, and my Windows 10 setup has an OpenVPN, also used occasionally. A few months ago a Cisco VPN joined the mix, but I very rarely use it.
Comment 38•6 years ago
This bug was reported 7 years ago and is still marked as unconfirmed, let alone fixed, even though it has been discussed and confirmed by several people and I even made the effort to provide profiles. Since it is my typical, and pretty frustrating, experience with Mozilla Thunderbird that (older) bugs get neglected after a while, I have just done further experiments with the latest TB version 52.9.1 and filed the still-existing issue as a new bug report: Bug 1488092.
Comment 39•6 years ago
(In reply to Ryan Johnson from comment #37)
> (In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> > Ryan, what AV and firewall were you running with Windows 7? And now with
> > windows 10?
>
> My Windows 7 setup had neither installed

All imap accounts?

Does this reproduce with Windows started in safe mode? (I'm surprised I didn't ask before)
https://support.microsoft.com/en-us/help/12376/windows-10-start-your-pc-in-safe-mode
Comment 40•6 years ago
Also, to what extent does CPU usage change if you disable (hide) the status bar using View > toolbar > status bar? (preferably with version 60)
Reporter
Comment 41•6 years ago
I just put my laptop in airplane mode and hit "Get Messages", which put CPU util at ~40% of one CPU. That might be an improvement--I think it used to be more like 80%--but it's certainly not remotely good. Then again, Thunderbird often sucks down 10-20% CPU at any given moment even when seemingly sitting idle, so yeah... CPU hog all around. Tho of course now that I'm typing this it decided to drop to 0% CPU for once. It seems to use the most CPU when any part of the window is visible (even if not in the foreground). Some of the drop vs. before might be due to me dropping down to just two accounts (instead of four).

And yes, all imap accounts. I haven't tried safe mode yet, it would be rather disruptive to my daily workflow.

I only have 52.9.1, which "Help -> about" reports as latest version for release channel?

You know, hidden status bar might actually reduce CPU util a fair bit, for both idle and bad-network scenarios. Happy to leave that off, I don't think I ever use it...
Comment 42•6 years ago
(In reply to Ryan Johnson from comment #41)
> I just put my laptop in airplane mode and hit "Get Messages", which put CPU
> util at ~40% of one CPU. That might be an improvement--I think it used to be
> more like 80%--but it's certainly not remotely good.

That's good to hear

> Then again, Thunderbird often sucks down 10-20% CPU at any given moment even when seemingly sitting
> idle, so yeah... CPU hog all around. Tho of course now that I'm typing this
> it decided to drop to 0% CPU for once. It seems to use the most CPU when any
> part of the window is visible (even if not in the foreground). Some of the
> drop vs. before might be due to me dropping down to just two accounts (instead of four).
>
> And yes, all imap accounts. I haven't tried safe mode yet, it would be
> rather disruptive to my daily workflow.

If the Thunderbird issue is highly reproducible, it should only take a few minutes, and would help eliminate external factors that are currently unknowable.

> I only have 52.9.1,

You can download 60 from https://www.thunderbird.net/en-US/
If you have add-ons, list them first here so we can assess whether you would be impacted.

> You know, hidden status bar might actually reduce CPU util a fair bit, for
> both idle and bad-network scenarios. Happy to leave that off,

Please quantify the difference with it on, and with it off. I suggest setting Windows Task Manager's View > Update Speed to Low, which will flatten your performance graph.
Comment 43•6 years ago
(In reply to Ryan Johnson from comment #41)
> I just put my laptop in airplane mode and hit "Get Messages", which put CPU
> util at ~40% of one CPU. That might be an improvement--I think it used to be
> more like 80%--but it's certainly not remotely good. [...] Some of the
> drop vs. before might be due to me dropping down to just two accounts
> (instead of four).

I suspect that the reduction of TB's CPU misuse by 50% in your case is not due to a bug having been fixed in the meantime but simply because you reduced the number of accounts by 50%.

> And yes, all imap accounts. I haven't tried safe mode yet, it would be
> rather disruptive to my daily workflow.
>
> I only have 52.9.1, which "Help -> about" reports as latest version for
> release channel?
>
> You know, hidden status bar might actually reduce CPU util a fair bit, for
> both idle and bad-network scenarios. Happy to leave that off, I don't think
> I ever use it...

As mentioned several times on Bugzilla for this and related bugs, Wayne Mery and anyone else could easily reproduce the issue themselves to run these (and any further) tests of interest in order to narrow down the search space: just use a non-existent IP address like 1.2.3.4 as the server address.
Comment 44•6 years ago
Why does this bug still have status UNCONFIRMED?
Comment 45•6 years ago
BTW, my workaround (when my laptop is on battery) is: pssuspend thunderbird
Comment 46•6 years ago
I cannot reproduce with nobody@1.2.3.4 using windows 7, thinkpad, i7, onboard wireless 6300 AGN, thunderbird 60.0b11 with status bar visible. CPU varies between 1% and 3% for 15 seconds.
Comment 47•6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #46)
> I cannot reproduce with nobody@1.2.3.4 using windows 7, thinkpad, i7,
> onboard wireless 6300 AGN, thunderbird 60.0b11 with status bar visible. CPU
> varies between 1% and 3% for 15 seconds.

I was surprised that in your case the increase in CPU load is pretty moderate. Still, you also get an undue extra load until the connection attempt times out, though it is less prominent than in my case.

Do you use 32- or 64-bit Thunderbird? How many cores does your i7 have? It looks like the more cores are available, the lower the percentage reported by Microsoft's task manager, which apparently normalizes the total load of all cores to max. 100%, while on Linux the load of each core is reported up to 100%.

I've just tried again on my Win10 laptop with 4 cores, this time with a new TB profile and 32- and 64-bit TB 60.0.b11. Switching back to the latest current release 52.9.1 did not change anything. So the TB version and 32 vs. 64 bit make no difference. In all these cases I get some 3% extra load - still too much.

With my normal profile (having two accounts and storing some 4 GB of emails) the extra CPU load was higher (as I wrote, up to 10%) but currently I get less extra load: 5-6%. Interesting that the extra load is higher than with a (nearly) empty profile.

To sum up, the undue extra load varies depending on various factors. In my case it is between 12 and 40% per core. For Wayne Mery it appears to be much less (which might be explained by having more cores and the Windows way of normalizing CPU load figures), while for Ryan Johnson it is around 40% per core.
Comment 48•6 years ago
Oops, my formulation "per core" was misleading. What I meant is: "for one core".
Comment 49•6 years ago
I just found a good potential explanation for why the effect of this bug is less noticeable for Wayne Mery: when the "Main Toolbar" is disabled, and the rotating blue circle is therefore not visible, the undue extra CPU load is much less, about half. Wayne, can you confirm this?
Comment 50•6 years ago
Here are two related, but certainly different bugs: Under certain circumstances the undue extra CPU load is not terminated when the (configured) timeout passes, and when the "Main Toolbar" is visible, the blue circle keeps rotating indefinitely. Moreover, most times the "Stop the current transfer" button has no effect (even with the latest TB 60.0.b11).
Comment 51•6 years ago
Ryan Johnson, can you confirm that the visibility of the Main Toolbar (View -> Toolbar -> Main Toolbar) makes a big difference to the undue CPU load?
Reporter
Comment 52•6 years ago
(In reply to David von Oheimb from comment #51)
> Ryan Johnson, can you confirm that the visibility of the (View -> Toolbar ->
> Main Toolbar makes a big difference on the undue CPU load?

Yes, it seems to cut it in half, give or take. I didn't turn it back on after you suggested it a few days ago.

BTW, my reported task manager CPU load is probably higher because my laptop is several years old and only has two cores.
Comment 53•6 years ago
(In reply to Ryan Johnson from comment #52)
> (In reply to David von Oheimb from comment #51)
> > can you confirm that the visibility of the (View -> Toolbar -> Main Toolbar makes a big difference on the undue CPU load?
>
> Yes, it seems to cut it in half, give or take. I didn't turn it back on after you suggested it a few days ago.
>
> BTW, my reported task manager CPU load is probably higher because my laptop is several years old and only has two cores.
Thanks Ryan - this confirms my conjectures that
* the CPU usage figures reported on a Windows system need to be multiplied by the number of cores in order to determine the actual load (for the core assigned to Thunderbird), and that
* 50% of the undue CPU load is caused by rendering the rotating blue circle.
As mentioned, the extra CPU load can be quite a waste of battery capacity in case of frequent connection attempts and/or long connection timeouts (or in case the timeout is ignored under some circumstances, which must be due to the related bug I mentioned above).
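The normalization conjecture in this comment can be written down as a small sketch. It assumes, as conjectured here and not verified, that the Windows task manager reports a process's load as a share of all cores combined, while Linux tools report it relative to a single core. The function name `per_core_load` is hypothetical.

```python
def per_core_load(normalized_percent: float, core_count: int) -> float:
    """Convert a load figure normalized over all cores into the load on one core.

    Assumption: the input is the percentage shown by a tool that scales the
    total of all cores to 100% (as conjectured for the Windows task manager).
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    # One fully busy core out of N shows up as 100/N percent after normalization,
    # so multiplying by N recovers the single-core figure.
    return normalized_percent * core_count


if __name__ == "__main__":
    # Figures from this thread: 3% and 10% extra load on a 4-core laptop.
    print(per_core_load(3, 4), per_core_load(10, 4))
```

Under that assumption, the 3% and 10% observed on the 4-core Windows laptop correspond to the "between 12 and 40%" of one core quoted in comment 47.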
Comment 54•6 years ago
(Qiyao is no longer with us)
Please try the following. Put
.progressmeter-statusbar { display: none !important; }
.tab-throbber { display: none !important; }
into <profile>\chrome\userChrome.css
If the folder and/or file does not exist, create it.
How does this affect your CPU usage?
Comment 55•6 years ago
workaround
Thanks Wayne for this very helpful hint!
Disabling display of these two items via CSS did not yet remove (all) the extra CPU load, but in conjunction with removing the Activity Indicator (the rotating light blue circle) from the Mail Toolbar (right-clicking on it -> Customize -> dragging the icon down into the window that just opened) it does :)
In fact, this combination of workarounds makes Thunderbird usable again on my laptop. Since I no longer had any hope that the various TB bugs I had reported, or whose discussion I had contributed to, would ever be taken seriously and fixed, I was already considering giving up on Thunderbird (at least on my laptop) and had already started using eM Client.
Comment 57•6 years ago
Could you please mark this bug as confirmed.
Comment 58•6 years ago
Ryan, To what extent does comment 54 change your CPU usage for this bug and bug 1249945?
Comment 59•6 years ago
@Ryan, maybe you overlooked the new question to you of Sep 9th?
Reporter | ||
Comment 60•6 years ago
(In reply to Wayne Mery (:wsmwk) from comment #58)
> Ryan,
> To what extent does comment 54 change your CPU usage for this bug and bug 1249945?
(In reply to David von Oheimb from comment #59)
> @Ryan, maybe you overlooked the new question to you of Sep 9th?
Sorry, I did miss this. Getting rid of the main toolbar made CPU usage much more manageable, and then Teh Busy hit. I just tried out the css fix as well, not sure the CPU situation changed much. Basically, any time TB is in the foreground, it takes 50-100% of a CPU... but at least now its CPU usage drops to ~0% after a few seconds when it's not in the foreground. I still close TB when on an airplane that lacks in-seat power (no internet anyway), but it's definitely nowhere near as bad as it used to be.
Comment 61•6 years ago
I just did one more experiment with the TB installation on my Linux box, where my profile contains 10 email accounts. After hitting the "Get All New Messages" button, the CPU load briefly spiked to 100% and then was stuck at 30% (of one core). After choosing "Work Offline", the load went down to some 3%. When re-enabling online state, CPU load stayed low. After hitting the "Get All New Messages" button again, there was high CPU load just for a few seconds before returning to low.
Comment 62•6 years ago
important
Here is one more experiment clearly indicating that it is in fact the DOM element with Id #statusbar-icon of class .progressmeter-statusbar (full path: window#messengerWindow statusbar#status-bar.chromeclass-status hbox#statusTextBox statusbarpanel#statusbar-progresspanel.statusbarpanel-progress progressmeter#statusbar-icon.progressmeter-statusbar), and not an element of class .tab-throbber, that has a pretty bad effect on CPU load:
1. Set the "Connection security" settings of a Gmail account to "STARTTLS" (rather than the correct value "SSL/TLS").
2. Try getting emails from this account.
3. Until the connection times out, the status bar moves back and forth and CPU load is some 30% (of one core).
4. Add to <profile>/chrome/userChrome.css (similarly to what Wayne suggested above):
#statusbar-icon { display: none; }
or do this modification via the Developer Toolbox.
5. Try getting emails from this account again.
6. No moving status bar appears and CPU load remains low.
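For anyone who wants to script step 4 of this experiment, here is a minimal sketch. It assumes the profile directory is passed in by the caller (its location is not stated in this thread), and the helper name `add_user_chrome_rule` is hypothetical. It creates the chrome folder and userChrome.css if they are missing, as comment 54 asks, and appends the rule only once.

```python
from pathlib import Path


def add_user_chrome_rule(profile_dir, rule="#statusbar-icon { display: none; }"):
    """Append a CSS rule to <profile>/chrome/userChrome.css.

    Returns True if the rule was written, False if it was already present.
    profile_dir is supplied by the caller; nothing is guessed here.
    """
    chrome_dir = Path(profile_dir) / "chrome"
    # Create the folder if it does not exist yet.
    chrome_dir.mkdir(parents=True, exist_ok=True)
    css_file = chrome_dir / "userChrome.css"
    existing = css_file.read_text(encoding="utf-8") if css_file.exists() else ""
    if rule in existing:
        return False
    # Keep any existing content and make sure the new rule starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    css_file.write_text(existing + separator + rule + "\n", encoding="utf-8")
    return True
```

Thunderbird has to be restarted for the change to take effect. Note that the selector comes from the experiment above and may not match in later versions.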
Comment 63•6 years ago
Again, why is this bug still marked UNCONFIRMED?!?
Updated•5 years ago
Comment 64•5 years ago
The profiler is working again. Please run the performance profiler:
- Use Thunderbird 68 or newer - release or beta
- Install profiler add-on into thunderbird - get the add-on file from https://github.com/firefox-devtools/Gecko-Profiler-Addon/blob/master/gecko_profiler.xpi?raw=true and in Tools > add-ons click the gear to install add-on from file
- Follow instructions at https://profiler.firefox.com/ Also see videos based on Firefox, but applicable to Thunderbird.
- Create a profiler URL and post it here, along with a description of what/how you tested.
Comment 65•4 years ago
Still needs profile, but let's start fresh when version 78 comes out
Comment 66•4 years ago
You can use the version 78 profile directions at https://github.com/thunderbird-conversations/thunderbird-conversations/wiki/Profiling-Conversation's-Performance
Comment 67•4 years ago
no one using version 78 where the profiler works?
Comment 68•4 years ago
Thomas, can you devise a solution by correlating comment 55 and David's work in comment 62?
Comment 69•4 years ago
Wayne, pleased to see that according to the last comment here (of two months ago) you went after this issue recently.
Comment 70•4 years ago
(In reply to David von Oheimb from comment #69)
Wayne, pleased to see that according to the last comment here (of two months ago) you went after this issue recently.
yes, I am aware
Comment 72•3 years ago
(In reply to David von Oheimb from comment #62)
Here is one more experiment clearly indicating that in fact the DOM element
with Id #statusbar-icon of class .progressmeter-statusbar [...]
and not an element of class .tab-throbber has a pretty bad effect on CPU
load:
It's worth noting that XUL <progressmeter> has since been replaced with HTML <progress> in bug 1499593, which may have a positive effect here.
When I tested this a while back, iirc the CPU load was more around 5%, and hiding the status bar <progress> did not seem to change that significantly, maybe 2% down (well, in relative terms that would be a decrease of about 40%...).
- No moving status bar appears and CPU load remains low.
Maybe moving as in "oscillating green" is the key here. That status bar <progress> thing gets updated very frequently in the weirdest of ways, where percentage-based progress of some actions is combined with other general progress actions, and things which don't have measurable progress cause the back and forth motion, and the overall progress of all those different actions is supposed to end up in a single <progress> element. Not really surprising if that goes haywire and drives up CPU.
Comment 73•3 years ago
The issue persists. Here is a new profile confirming it: https://share.firefox.dev/3dILA3K
This profile has been recorded with TB 91.4.0 where 1.2.3.4 has been set up as IMAP "server name"
and pressing "Get Messages" about 1 second after starting the recording.
As mentioned already years ago at several places on Bugzilla, there is a simple way of reproducing the issue,
not only for IMAP but for various types of connections: let TB or FF connect to a "non-existing" IP address, such as 1.2.3.4.
Comment 74•3 years ago
BTW, the issue is of course not only with IP address 1.2.3.4.
It occurs also with real servers that currently are not reachable.
In my case it is the mail server of my company, while my machine is not connected to the company's intranet;
which results in this case in a DNS resolution failure for the mail server host name.
Here is the performance profile: https://share.firefox.dev/3oJHezo
Comment 75•3 years ago
Argh, it turns out that most of the performance profiles I shared recently were not actually recorded in safe mode,
and the reason is that if I start Thunderbird with "--ProfileManager --safe-mode" via my usual shell script, the "--safe-mode" option gets ignored :-(
Yet another TB bug 1745570.
Comment 76•3 years ago
Anyway, the observed misbehavior is clearly independent of safe mode.
In case you don't believe my judgment, here is a new profile showing this: https://share.firefox.dev/3GAPp7u
Here is one more weird thing (again independent of safe mode):
When I disable automatic fetching mails from the unreachable IMAP server, the TB idle CPU load reduces to "just" some 12% (which is still too much).
As soon as I start recording a performance profile, the CPU load rises again to some 70% while TB should be idle.
This seems to defeat the very purpose of performance profiling.
Anyway, here is the profile obtained this way: https://share.firefox.dev/3oHOyM8
Comment 77•3 years ago
BTW, it turns out that Evolution has the same bug. https://gitlab.gnome.org/GNOME/evolution/-/issues/1741
Comment 78•3 years ago
Come on guys.
This bug report is meanwhile 10+ years old and (correctly) marked a critical defect.
But still I don't see even an attempt to actually fix it.
Comment 79•3 years ago
Things have worsened. I currently get 110% CPU usage while TB should be just idle.
Here is a fresh 'performance' profile for this, again on Linux with TB 91.4.0 : https://share.firefox.dev/3e8K6Qj
Reporter | ||
Comment 80•3 years ago
I can confirm the problem still exists in TB 91.5.1 (64-bit, Mac). Same symptom -- an endless progress bar at the bottom of the window (sitting at apparently 100%), continuously consuming 20-40% of a core. The network is up, I'm using a remote connection continuously while this happens.
Reporter | ||
Updated•3 years ago
Reporter | ||
Comment 81•3 years ago
(sorry for the close/reopen -- bugzilla closed the bug automatically when I saved my previous comment)
Comment 82•1 year ago
Shame that this bug, which meanwhile is 12+ years old, is still not properly tackled and fixed.
Bugs are not fixed by neglecting them or managing/discussing them back and forth,
but by understanding what the actual issue is (in this case, pretty sure a busy waiting loop)
and some developer getting his/her hands dirty and doing something about it.
As I mentioned many times, both within this bug report and several related ones, the issue is easily reproducible
by configuring the mail server name to an unreachable IP address such as 1.2.3.4 and then trying to connect.
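The busy-waiting hypothesis in this comment is not confirmed anywhere in this thread, but the difference it refers to can be shown with a small sketch that is independent of Thunderbird. It uses only Python's standard `select`, `socket` and `time` modules; the function name `cpu_seconds_while_waiting` is hypothetical. Both variants wait the same wall-clock time on a socket that never becomes readable, one by polling with a zero timeout in a loop, the other by blocking in a single call.

```python
import select
import socket
import time


def cpu_seconds_while_waiting(timeout_s, busy):
    """Return the CPU time this process spends waiting timeout_s seconds.

    busy=True  polls with a zero timeout in a tight loop (busy waiting).
    busy=False makes one blocking call with the full timeout.
    """
    # Nothing is ever written to `b`, so `a` never becomes readable.
    a, b = socket.socketpair()
    try:
        cpu_start = time.process_time()
        deadline = time.monotonic() + timeout_s
        if busy:
            while time.monotonic() < deadline:
                select.select([a], [], [], 0)  # returns immediately
        else:
            select.select([a], [], [], timeout_s)  # sleeps in the kernel
        return time.process_time() - cpu_start
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("busy waiting:", cpu_seconds_while_waiting(1.0, busy=True))
    print("blocking    :", cpu_seconds_while_waiting(1.0, busy=False))
```

The busy variant keeps one core occupied for the whole wait, while the blocking variant uses almost no CPU time. Which of the two patterns, if either, applies to Thunderbird is exactly what the profiles requested in this bug are meant to show.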
Comment 83•1 year ago
Related bugs include https://bugzilla.mozilla.org/show_bug.cgi?id=1107251
and https://bugzilla.mozilla.org/show_bug.cgi?id=1830641
Comment 84•1 year ago
My CPU usage is only 30-40%, but I see the same function (PollWrapper) being called again and again.
- My Firefox profile: https://share.firefox.dev/45LuGv0
We overwrite the poll function with our own PollWrapper via g_main_context_set_poll_func in widget/gtk/nsAppShell.cpp#317.
Comment 85•1 year ago
More observations:
The call stack getting repeated:
#0 PollWrapper(_GPollFD*, unsigned int, int) (aUfds=0x7f3ae50145e0, aNfsd=5, aTimeout=0) at /home/user/dev/gecko-thunderbird/widget/gtk/nsAppShell.cpp:60
#1 0x00007f3b07e74a9f in () at /usr/lib/libglib-2.0.so.0
#2 0x00007f3b07e15032 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#3 0x00007f3aff62f3af in nsAppShell::ProcessNextNativeEvent(bool) (this=<optimized out>, mayWait=<optimized out>) at /home/user/dev/gecko-thunderbird/widget/gtk/nsAppShell.cpp:422
#4 0x00007f3aff5a68a7 in nsBaseAppShell::DoProcessNextNativeEvent(bool) (this=this@entry=0x7f3af4fee880, mayWait=false) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:131
#5 0x00007f3aff5a6b10 in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) (this=0x7f3af4fee880, thr=0x7f3b0af3c3c0, mayWait=<optimized out>) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:250
#6 0x00007f3aff5a6ca1 in non-virtual thunk to nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) () at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:287
#7 0x00007f3afbdd8907 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7f3b0af3c3c0, aMayWait=false, aResult=0x7fffcdf17347) at /home/user/dev/gecko-thunderbird/xpcom/threads/nsThread.cpp:1154
#8 0x00007f3afbddc894 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7f3ae50145e0, aThread@entry=0x7f3b0af3c3c0, aMayWait=false) at /home/user/dev/gecko-thunderbird/xpcom/threads/nsThreadUtils.cpp:479
#9 0x00007f3afc6570df in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7f3af1e29340, aDelegate=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/glue/MessagePump.cpp:85
#10 0x00007f3afc5cce35 in MessageLoop::RunHandler() (this=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/chromium/src/base/message_loop.cc:361
#11 MessageLoop::Run() (this=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/chromium/src/base/message_loop.cc:343
#12 0x00007f3aff5a68f3 in nsBaseAppShell::Run() (this=0x7f3af4fee880) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:148
#13 0x00007f3b00dd2c5a in nsAppStartup::Run() (this=0x7f3af1e63ba0) at /home/user/dev/gecko-thunderbird/toolkit/components/startup/nsAppStartup.cpp:295
#14 0x00007f3b00ee1233 in XREMain::XRE_mainRun() (this=this@entry=0x7fffcdf17668) at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5659
#15 0x00007f3b00ee1da2 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=this@entry=0x7fffcdf17668, argc=argc@entry=4, argv=argv@entry=0x7fffcdf18968, aConfig=...)
at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5859
#16 0x00007f3b00ee2232 in XRE_main(int, char**, mozilla::BootstrapConfig const&) (argc=4, argv=0x7fffcdf18968, aConfig=...) at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5915
#17 0x000056024249eb41 in do_main(int, char**, char**) (argc=4, argv=0x7fffcdf18968, envp=<optimized out>) at /home/user/dev/gecko-thunderbird/comm/mail/app/nsMailApp.cpp:229
#18 main(int, char**, char**) (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/user/dev/gecko-thunderbird/comm/mail/app/nsMailApp.cpp:386
The hot loop is at #5 https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/widget/nsBaseAppShell.cpp#248-251
do {
  mLastNativeEventTime = now;
  keepGoing = DoProcessNextNativeEvent(false);
} while (keepGoing && ((now = PR_IntervalNow()) - start) < limit);
Pernosco: https://pernos.co/debug/_ddLRlZf7oD5hXhB0Wlfjg/index.html
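To make the control flow of the quoted do/while loop easier to reason about, here is a Python rendering of it. This is a sketch only: `drain_native_events`, `process_next` and `now` are hypothetical stand-ins for the loop, `DoProcessNextNativeEvent(false)` and `PR_IntervalNow()`, and the clock is injected so that the behavior can be checked deterministically.

```python
def drain_native_events(process_next, now, limit):
    """Mimic the quoted loop and return how many times it iterated.

    process_next() stands in for DoProcessNextNativeEvent(false): it returns
    True while native events were processed. now() stands in for
    PR_IntervalNow(). Both are assumptions made for illustration.
    """
    start = now()
    iterations = 0
    while True:  # do { ... } while (...)
        iterations += 1
        keep_going = process_next()
        # Same short-circuit as the C++ condition: the clock is only read
        # when an event was actually processed.
        if not (keep_going and (now() - start) < limit):
            break
    return iterations


if __name__ == "__main__":
    ticks = iter(range(1000))
    clock = lambda: next(ticks)
    # With a source that always reports more events, the loop runs until the
    # time limit is reached instead of returning after one pass.
    print(drain_native_events(lambda: True, clock, 10))
```

The loop itself is bounded by `limit`, so it only keeps the thread busy if it is re-entered continuously and native events keep arriving. Later comments in this thread question whether that is what happens.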
Comment 86•1 year ago
It was worked on 10 years ago: Bug 930793, but backed out in the end for causing performance regressions: https://searchfox.org/mozilla-central/diff/c3dca4be1e54b90f31c44755a984a9c4a9458a23/widget/nsBaseAppShell.cpp#278. And worked on again in Bug 1260070.
Olli, since you've worked twice on that code, do you have any insights on how best to resolve this? We probably need to remove the busy loop without causing perf regressions.
Comment 87•1 year ago
What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
Does it trigger something on the OS side which then triggers appshell to run all the time?
https://bugzilla.mozilla.org/show_bug.cgi?id=1804295 is where the performance mode was removed.
Comment 88•1 year ago
But do you have a performance profile for this? Appshell is supposed to be high up there, if there are lots of tasks.
Thunderbird does have that one issue where it keeps re-styling something all the time. Switching tab to a calendar tab and back fixes that.
Comment 89•1 year ago
This is a performance profile on the latest commit: https://share.firefox.dev/3MQJiR6. It does have Appshell in the top.
@kershaw Can you answer Olli's question regarding Necko?
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #87)
What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
Does it trigger something on the OS side which then triggers appshell to run all the time?
Comment 90•1 year ago
The main thread is mostly idle in that profile. The poll just tells us that the thread is waiting for more work, but if you zoom in to the main thread you can see that the refreshdriver is ticking all the time, with some idle time in between. So this might be a Thunderbird frontend issue. Why is it triggering a refreshdriver tick all the time?
This is the type of issue I mention in my comment and what I see every now and then.
Comment 91•1 year ago
Will try to capture a profile where this happens.
Comment 92•1 year ago
(In reply to Manuel Bucher [:manuel] from comment #89)
This is a performance profile on the latest commit: https://share.firefox.dev/3MQJiR6. It does have Appshell in the top.
@kershaw Can you answer Olli's question regarding Necko?
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #87)
What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
Does it trigger something on the OS side which then triggers appshell to run all the time?
From this profile, I don't see any sign of networking problem.
We should also include socket thread in the profiler if we suspect this is a networking issue.
Comment 93•1 year ago
(In reply to Andrew Creskey [:acreskey] from comment #91)
Will try to capture a profile where this happens.
reminder ...
(In reply to Kershaw Chang [:kershaw] from comment #92)
...
We should also include socket thread in the profiler if we suspect this is a networking issue.
Updated•11 months ago
Comment 94•6 months ago
Need a profile with Socket Thread to act on it. I will create one next week.
Comment 95•6 months ago
Created a profile with socket thread. Kershaw, can you take a look? https://share.firefox.dev/3Tc3VvA
Maybe cleaner profile (containing only the error state): https://share.firefox.dev/3uVtsQ7
Using the Thunderbird preset: https://share.firefox.dev/3wxnZ2u
Comment 96•6 months ago
The profiles seem to suggest that most of CPU resource is used in DOM or JS code, rather than network activities.
This makes me believe that comment #90 might be still valid - this looks like a Thunderbird front end issue.
Andrew, if you have time, please also look at the profiles in comment #95 and see if my conclusion is correct. Thanks.
Comment 97•6 months ago
(In reply to Kershaw Chang [:kershaw] from comment #96)
The profiles seem to suggest that most of CPU resource is used in DOM or JS code, rather than network activities.
This makes me believe that comment #90 might be still valid - this looks like a Thunderbird front end issue.
Andrew, if you have time, please also look profiles in comment #95 and see if my conclusion is correct. Thanks.
Absolutely agree with that conclusion.
The socket thread is just waiting while the Thunderbird front end is perpetually hard at work on JS.
Updated•5 months ago
Comment 98•6 days ago
Another performance profile, recorded with a new TB profile containing no messages: https://share.firefox.dev/3WJDpu1 For some reason, much time is spent on message indexing.