Open Bug 683651 Opened 13 years ago Updated 6 days ago

High CPU utilization while sending or receiving over slow network. (WFM in version 17? Bad again in TB24?) - may involve oscillating <progress> element in status bar

Categories

(MailNews Core :: Networking, defect)

x86_64
Windows 7
defect
Not set
critical

Tracking

(Not tracked)

REOPENED

People

(Reporter: scovich, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [needs profile][needs protocol log][workaround: comment 55])

Checking email over a slow/overloaded wireless connection can take 90 seconds or longer, with CPU usage hovering at 40-80%. This destroys my laptop's battery life, especially if a connection times out; sometimes I have to close TB to make it stop burning CPU. The same can happen if an email server is down or not responding properly.
I just noticed: while submitting this bug over the same slow/overloaded wireless connection mentioned above, Firefox displayed the same behavior (40-80% cpu util while waiting for a response from bugzilla). Perhaps this bug is common to both?
(In reply to Ryan Johnson from comment #0)
> Checking email over a slow/overloaded wireless connection can take 90
> seconds or longer, with CPU usage hovering at 40-80%.

Does the problem occur only with SSL (POP3/IMAP/SMTP over SSL)? Is there no problem without SSL?

(In reply to Ryan Johnson from comment #1)
> I just noticed: while submitting this bug over the same slow/overloaded
> wireless connection mentioned above, Firefox displayed the same behavior
> (40-80% cpu util while waiting for a response from bugzilla).

Submitting a bug to bugzilla.mozilla.org is an upload of data by POST via HTTPS (HTTP with SSL; during the POST, Fx is the TCP sender and the server is the TCP receiver).
Is there no problem with a non-SSL site (HTTP)?

It may be frequent packet loss followed by retransmission.
When did your problem start to occur (with which build of Tb/Fx, on a specific date, after an upgrade, ...)?
Can a small network.tcp.sendbuffer value (default=131072), such as 4096 or 8192, be a workaround for bug submitting or comment posting at bugzilla.mozilla.org?
Keywords: perf
Ryan, please reply to comment 2. Thanks
Whiteboard: [closeme 2011-10-01]
(In reply to WADA from comment #2)
> (In reply to Ryan Johnson from comment #0)
> > Checking email over a slow/overloaded wireless connection can take 90
> > seconds or longer, with CPU usage hovering at 40-80%.
> 
> SSL(pop3/imap/smtp with SSL) only problem? No problem if non-SSL?
All four of my accounts are SSL, but I can't disable it to verify whether non-SSL works better.

> (In reply to Ryan Johnson from comment #1)
> > I just noticed: while submitting this bug over the same slow/overloaded
> > wireless connection mentioned above, Firefox displayed the same behavior
> > (40-80% cpu util while waiting for a response from bugzilla).
> 
> Submitting of bug to bugzilla.mozilla.org is upload of data by POST via
> HTTPS:(HTTP with SSL, sender of TCP=Fx, receiver of TCP=server when POST
> process).
> No problem if non-SSL site(HTTP:)?
I'll have to get back to you if I see a non-SSL upload taking a long time -- it doesn't happen to me that much.
 
> It may be frequent packet loss followed by re-transmission.
> When(with which build of Tb/Fx, specific date, after upgrade, ...) did your
> problem start to occur?
I noticed just during the last month or so when my normally cool laptop started cooking my legs. Whether the problem existed before in less-blistering form I couldn't say.

> Can small network.tcp.sendbuffer(default=131072) such as 4096, 8192 be a
> workaround for bug submitting or comment posting at bugzilla.mozilla.org?
I changed it and will let you know after posting this. Do I need to restart FF first?
Update: I still saw 40% cpu util during the previous upload. Trying again after restarting FF.
Update: It looks like the setting didn't stick until I restarted. With buffer size 4096 CPU util during the comment upload was no more than 20%, a significant improvement. Is there some similar setting in TB that would have an equivalent effect?
(In reply to Ryan Johnson from comment #6)
> Is there some similar setting in TB that would have an equivalent effect?

Tb uses the same setting. Go to Tools/Options/Advanced/General, Config Editor.

> With buffer size 4096 CPU util during the comment upload was no more than 20%,

How about network.tcp.sendbuffer=65536?
(Please see Bug 541367 and the bugs listed in the dependency tree for that bug, with "Show Resolved".)
As network.tcp.sendbuffer=4096 is usually too small (it causes inefficient use of network resources), data transmission will take longer than usual. Please find the most appropriate value for your environment.
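
For readers unfamiliar with what a send-buffer setting controls, here is a minimal Python sketch of the underlying socket option. It assumes that network.tcp.sendbuffer ends up as the socket's SO_SNDBUF size, which this thread does not confirm; the helper name set_send_buffer is hypothetical and is not part of Thunderbird or Firefox.

```python
import socket


def set_send_buffer(sock: socket.socket, size: int) -> int:
    """Request a send buffer of `size` bytes and return what the OS granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    # The OS may round or scale the request (Linux, for example, doubles it),
    # so read the value back instead of trusting the requested number.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # The sizes discussed in this bug: 4 KB, 8 KB and 64 KB.
        for requested in (4096, 8192, 65536):
            print(requested, "->", set_send_buffer(s, requested))
```

A smaller buffer means less unacknowledged data is queued at once, which is why it can trade throughput for less retransmission work on a lossy link.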
(In reply to WADA from comment #7)
> (In reply to Ryan Johnson from comment #6)
> > Is there some similar setting in TB that would have an equivalent effect?
> 
> Tb also uses same setting. Go Tools/Options/Advanced/General, Config Editor.
Yes, but it's downloading, not uploading... 

> 
> > With buffer size 4096 CPU util during the comment upload was no more than 20%,
> 
> How about network.tcp.sendbuffer=65536?
> (See Bug 541367 and bugs listed in Dependency tree for that bug, with "Show
> Resolved", please)
> As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> network resource use), data transmission will take longer than usual. Please
> find most appropriate value in your environment, please.
It's really starting to sound like this is just a hack to work around the real (still unknown?) problem, which doesn't make me very enthusiastic about going further in this direction. Perhaps you could explain why tcp buffer size should impact CPU utilization so strongly?
For comparison, I just tried loading this web page from IE9, and it has exactly *zero* cpu util during the wait, then spikes just long enough to render the page.
(In reply to Ryan Johnson from comment #8)
> (In reply to WADA from comment #7)
> > As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> > network resource use), data transmission will take longer than usual. Please
> > find most appropriate value in your environment, please.
> It's really starting to sound like this is just a hack to work around the
> real (still unknown?) problem, which doesn't make me very enthusiastic about
> going further in this direction. Perhaps you could explain why tcp buffer
> size should impact CPU utilization so strongly?

because if the network is flakey (which includes faulty hardware) it will cause packets to be retransmitted, and thus drive up CPU

what make and model wireless router are you running?

Also, how are you determining that you have "slow/overloaded wireless connection"?
See Also: → 686495
Whiteboard: [closeme 2011-10-01]
(In reply to Wayne Mery (:wsmwk) from comment #10)
> because if the network is flaky (which includes faulty hardware) it will
> cause packets to be retransmitted, and thus drive up CPU

Bug 475603 - Lots of timeouts for DNS requests with Netgear Router WGR614 - is one example of hardware problem (I don't recall whether it drove up CPU)
(In reply to Wayne Mery (:wsmwk) from comment #10)
> (In reply to Ryan Johnson from comment #8)
> > (In reply to WADA from comment #7)
> > > As network.tcp.sendbuffer=4096 is too small usually(causes inefficient
> > > network resource use), data transmission will take longer than usual. Please
> > > find most appropriate value in your environment, please.
> > It's really starting to sound like this is just a hack to work around the
> > real (still unknown?) problem, which doesn't make me very enthusiastic about
> > going further in this direction. Perhaps you could explain why tcp buffer
> > size should impact CPU utilization so strongly?
> 
> because if the network is flakey (which includes faulty hardware) it will
> cause packets to be retransmitted, and thus drive up CPU
Somehow IE9 manages to avoid the problem (see comment 9) while using the exact same hardware, network connection, and web page (this one), which suggests that the problem lies closer to home.

> 
> what make and model wireless router are you running?
My home router is a Cisco (at work, can't remember the model), but the exact same thing happens at school and with at least two conference hotel wireless setups on different continents. Besides, the same thing occurs when there's no router at all (wireless switched off on the bus) and the connection is just plain timing out. 

> 
> Also, how are you determining that you have "slow/overloaded wireless
> connection"?
At the time of reporting, there were 300 people in the same conference session as me, all trying to read their email at the same time; one WIFI router sat on a stand in the corner. It took some tries to even get an IP address (192.168.0.0/24).
This also happens on slow ADSL networks.

I have the same problem with Thunderbird 8.0 sending a 4-megabyte attachment
over an ADSL (order of 100 kb/s uplink).  It takes 20 minutes to
send, and during these 20 minutes, the computer seems hogged up.

The computer should not be hogged while simply waiting for bytes to
be sent over the network card.

Thanks.
(In reply to Ryan Johnson from comment #8)
> It's really starting to sound like this is just a hack to work around the
> real (still unknown?) problem, which doesn't make me very enthusiastic about
> going further in this direction. Perhaps you could explain why tcp buffer
> size should impact CPU utilization so strongly?

No. 
It's for problem determination.
- If 100% CPU still occurs even with network.tcp.sendbuffer=4096,
  and only with SSL => Bug 538283
- If the 100% CPU problem or connection loss is resolved with
  network.tcp.sendbuffer<=64KB, regardless of SSL or non-SSL => router bug
With SSL, even when the 100% CPU was caused by a router bug and resolved by network.tcp.sendbuffer<=64KB, CPU consumption may still be higher than expected due to Bug 538283. On a slow network, CPU utilization may be higher with network.tcp.sendbuffer=64KB than with network.tcp.sendbuffer=4KB.
On a wireless network, searching for an appropriate network.tcp.sendbuffer value is not a workaround. If the probability of packet loss is high, the send buffer size is better reduced. It's performance tuning.

"Do such things or not" is all up to you.
Responding to comment 2:
<<
When(with which build of Tb/Fx, specific date, after upgrade, ...) did your problem start to occur?
>>

using the set-up of comment 13:
<<
I have the same problem with Thunderbird 8.0 sending a 4-megabyte attachment
over an ADSL (order of 100 kb/s uplink).
>>

It may have started far back in version 2.X or 3.X, even since I started to send large attachments, and found that my computer is hogged up, because I have not updated my Thunderbird (stayed at version 3.X) until recently.  And Thunderbird and Firefox picked up a bad habit of using up version numbers quickly.

Thanks.
(In reply to WADA from comment #14)
> (In reply to Ryan Johnson from comment #8)
> > It's really starting to sound like this is just a hack to work around the
> > real (still unknown?) problem, which doesn't make me very enthusiastic about
> > going further in this direction. Perhaps you could explain why tcp buffer
> > size should impact CPU utilization so strongly?
> 
> No. 
> It's for problem determination.

[snipped lots of text about knob-turning]

I'm using a very simple problem determination process here:
- Problem occurs on a wide variety of networks (home WLAN, .edu LAN, .edu WLAN, overloaded hotel WLAN) ==> probably not a router config issue (they can't all be wrong)
- Problem occurs when not connected to *any* network (if TB thinks there's connectivity) ==> probably not a buffer size issue (buffer should fill "immediately" and then stop since nothing is draining it)
- Problem also occurs in FF (CPU usage while loading from a slow web site) ==> probably an issue with shared infrastructure (xul.dll?)
- Problem does *not* occur in IE ==> not an OS config problem

Maybe it is a "simple matter of tuning" but it's not a game users should have to play... for each network... for each changing situation. Especially not when other products seem able to handle the issue without user intervention. Emphatically not if it turns out the Windows profiler is right and UI redrawing overhead is [part of] the problem (see bug #686495 for details). 

> "Do such things or not" is all up to you.
"Compete or do not" is all up to you. The bar is set, and exposing this sort of knob-foolery to users is below it.
reporter, do you still see this problem when using a current version?
Component: General → Untriaged
Whiteboard: [closeme 2012-12-10]
(In reply to Wayne Mery (:wsmwk) from comment #17)
> reporter, do you still see this problem when using a current version?

Using 16.0.2, the problem persists. Disabling all virtual network adapters used by virtual machines on my computer helps a little (they fooled TB into thinking there was always connectivity), and doing so cuts CPU usage by almost half, but I still have to commute with TB closed to conserve battery.

Steps tried:
1. Open TB
2. Connect to internet
3. Check email
4. Disable wireless
5. Check email again
6. CPU usage jumps (split between thunderbird.exe and dwm.exe) as multiple "unable to connect" windows slide across the bottom of the screen.
(In reply to Wayne Mery (:wsmwk) from comment #17)
> reporter, do you still see this problem when using a current version?

After updating to 17.0, the situation seems to be markedly improved. I'll try this out for a week or two to be sure, but this version may have fixed it for me.
(In reply to Ryan Johnson from comment #19)
> (In reply to Wayne Mery (:wsmwk) from comment #17)
> > reporter, do you still see this problem when using a current version?
> 
> After updating to 17.0, the situation seems to be markedly improved. I'll
> try this out for a week or two to be sure, but this version may have fixed
> it for me.

Still no CPU hogging troubles since the upgrade. Thanks for the fix!
Thanks for the update.
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [closeme 2012-12-10]
FWIW this would have been partly helped by changes such that every Gmail message is no longer downloaded at least twice.

Do you find version 24 to be the same or better?
Flags: needinfo?(scovich)
Summary: High CPU utilization while sending or receiving over slow network → TB5 High CPU utilization while sending or receiving over slow network. WFM in version 17
I hadn't been paying attention lately, usually I just close TB if I'm going off grid (and FF as well, if I really want to maximize battery life). 

A cursory test says it's back to the original (bad) behavior. Downloading a 27MB email from the in-laws over a decent DSL connection keeps the CPU at 30-60% during the entire download.

However, it does seem to do better at handling dropped connections: turning off the wifi while an email was downloading only spiked the CPU to 100% for about 10 seconds before it gave up (instead of 60-90 seconds like before). 

Note that I'm on a new laptop, using a new WIFI router, and all TB knobs are at defaults unless they were attached to the user profile I imported from the old machine. 

(I still don't understand why waiting on the network should use more than single-digit %CPU, *especially* if no bytes are coming down the pipe. I don't have the Windows profiler handy to see where those CPU cycles are going, though, and no time to install it right now).
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Flags: needinfo?(scovich)
I can confirm this bug for Thunderbird 24.3.0.
When using Thunderbird with a slow or intermittent connection, or behind a firewall that requires a (SOCKS) proxy to connect to the mail server and the proxy is not properly configured,
there is a high CPU load until the connection times out. Obviously there is some busy loop waiting for the connection.

Please remove the "TB5" part of the bug's subject.
According to comment 24
Summary: TB5 High CPU utilization while sending or receiving over slow network. WFM in version 17 → High CPU utilization while sending or receiving over slow network. WFM in version 17
Severity: normal → major
A profile will be helpful https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G
Unfortunately the symbols aren't working just now.
Whiteboard: [needs profile][needs protocol log]
David, and/or Ryan?

(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #26)
> A profile will be helpful
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/
> Reporting_a_Thunderbird_Performance_Problem_with_G
Flags: needinfo?(scovich)
Flags: needinfo?(mueller8)
My Thunderbird experience is currently dominated by Bug #1249945, so I'm not able to tell whether this bug is still a problem.
Flags: needinfo?(scovich)
(In reply to Ryan Johnson from comment #28)
> My Thunderbird experience is currently dominated by Bug #1249945, so I'm not
> able to tell whether this bug is still a problem.

Ryan, thanks for the update.  Please add the profile URL in your Bug #1249945 so we can view it.
Blocks: 554898
Thanks Wayne for the link on how to produce a profile. I still had some trouble with Cleopatra, but eventually it worked out: 
http://people.mozilla.org/~bgirard/cleopatra/?1457530381985#report=ee827feb030825119dc2230f1109b1ad88bafd0b
http://people.mozilla.org/~bgirard/cleopatra/?1457530740188#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
http://people.mozilla.org/~bgirard/cleopatra/?1457530381985#report=ee827feb030825119dc2230f1109b1ad88bafd0b
For current Thunderbird 38.6.0 on my Windows 7 machine, the problem persists:

When using Thunderbird with a slow or intermittent connection or the SOCKS proxy is not reachable or not properly configured, there is a high CPU load (some 30-40% on one of my CPU cores) until the connection times out. Obviously there is some busy loop waiting for the connection.
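
To illustrate why an idle wait should cost almost nothing, here is a minimal Python sketch (not Thunderbird code; the function names are hypothetical) contrasting a blocking wait with a polling loop on a socket that never receives data. It does not show where Thunderbird's CPU time actually goes, only what the two patterns cost.

```python
import select
import socket
import time


def cpu_seconds_blocking(wait_s: float) -> float:
    """CPU time used while blocked in select() on a socket that stays idle."""
    a, b = socket.socketpair()
    try:
        start = time.process_time()
        # The OS parks the thread until data arrives or the timeout expires.
        select.select([a], [], [], wait_s)
        return time.process_time() - start
    finally:
        a.close()
        b.close()


def cpu_seconds_busy(wait_s: float) -> float:
    """CPU time used while polling the same idle socket in a tight loop."""
    a, b = socket.socketpair()
    try:
        start = time.process_time()
        deadline = time.monotonic() + wait_s
        while time.monotonic() < deadline:
            # Zero timeout: select() returns at once, so the loop spins.
            select.select([a], [], [], 0)
        return time.process_time() - start
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("blocking wait:", cpu_seconds_blocking(1.0), "CPU seconds")
    print("busy loop:    ", cpu_seconds_busy(1.0), "CPU seconds")
```

Both functions wait the same wall-clock time; only the busy loop burns CPU for the whole interval, which is the kind of signature a profile of the real problem would need to confirm or rule out.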

As mentioned already for several related Mozilla issues, such problems should be easily reproducible: set a non-existing proxy name and port as the SOCKS server, say: 1.2.3.4 and port 5, and then try to get/receive emails, (e.g., by clicking on an IMAP folder).

I wonder why this bug is still considered unconfirmed, given that it has been open since 2011.
Flags: needinfo?(mueller8)
Here's another profile, which may be related or not, witnessing needless high CPU load for current TB.
http://people.mozilla.org/~bgirard/cleopatra/?1457531717548#report=12f8c4d34482d37a5184eace3ad40ded9135cf69
Ryan, what AV and firewall were you running with Windows 7?  And now with windows 10?


> I wonder why this bug is still considered unconfirmed, since this bug is open since 2011.

Because of comment 10, and because we don't know the source of CPU usage and what's happening in networking. Plus, given the time difference between your comments and Ryan's initial bug report (plus the conditions to reproduce), I'm not convinced your network issue is the same as Ryan's. Only further analysis will tell. But you certainly should be good with bug 1107251 and bug 919485 

That said, 
a) there is bug 76473 (filed roughly in same time frame) and 
b) further analysis is needed, and what you and Ryan see with version 50 (beta) and newer would be most helpful (but I don't expect bad proxy to be any better)

Ryan's 3 performance bugs, with differing network conditions :
* v6, bug 686495, win7, no/disconnected network - no profiler run, but xperf shows CPU in graphics code
* bug 1249945, win10, good network, 800MB? - Ryan's profile run is 50% wait for NtWaitForMultipleObjects, ~50% CC (cycle collect), almost no painting CPU
* v6, this bug, win7, slow/bad network - no profile run from Ryan. David's v24 "proxy" profiles [1] are similar to bug 1249956 only in the high Nt waiting -  high painting CPU, ~40% wait for NtWaitForMultipleObjects, plus a high percentage of the sequence openOptionsDialog in mailcore.js, gadvancedpane.showconnections, opensubdialog

Other network performance bugs: https://mzl.la/2iXYLiS

[1] Ryan's profiles
https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b
https://cleopatra.io/#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b


Note: I'm doubting bug 1249945 is calendar specific and I almost duped it, but presumably it's with a good network so keeping it open for now.
Blocks: 1249945
Component: Untriaged → Networking
Depends on: 686495
Flags: needinfo?(scovich)
Product: Thunderbird → MailNews Core
See Also: 686495
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> Ryan, what AV and firewall were you running with Windows 7?  And now with
> windows 10?
> 
> 
> > I wonder why this bug is still considered unconfirmed, since this bug is open since 2011.
> 
> Because of comment 10, and because we don't know the source of CPU usage and
> what's happening in networking. Plus the time difference between your
> comments in Ryan's initial bug report (plus conditions to reproduce) I'm not
> convinced your network issue is the same as Ryan's. Only further analysis
> will tell. But you certainly should be good with bug 1107251 and bug 919485 
> 
> That said, 
> a) there is bug 76473 (filed roughly in same time frame) and 

correction, bug 764731
See Also: → 764731
Thanks Wayne for answering my question regarding UNCONFIRMED.

I just noticed that in my list of profiles (given in comment 30), which you quoted in comment 32 as [1], was a duplicate, while another one (given in comment 31) was missing, so in my view the list actually is:

https://cleopatra.io/#report=ee827feb030825119dc2230f1109b1ad88bafd0b
https://cleopatra.io/#report=9e5a3847d8585a5a23c606c37ed24f43e2a36ac7
https://cleopatra.io/#report=12f8c4d34482d37a5184eace3ad40ded9135cf69
When confirming that the issues I mentioned in comment 30 still hold for TB 45.5.1, I found that the high CPU load (around 40% of one core) can not only be reproduced by setting a non-existing proxy IP address as the SOCKS host, say: 1.2.3.4, but also more directly by setting the mail server name to an unreachable IP address such as 1.2.3.4.
Then try receiving new emails, (e.g., by clicking on an IMAP folder, or using the Get All New Messages button). The load is high as long as the green wheel rotates. 

I also found that when a configured SOCKS proxy is unreachable, the timeout (after which CPU load drops again) is some 60 seconds, while the value I set for mail.server.server[n].timeout, namely 30 seconds, is not respected. 
On the other hand, when the IMAP server itself is unreachable (regardless whether TLS is enabled or not), the timeout occurs after some 125 seconds.
The different timeouts observed not only indicate that there is even more than one nasty busy loop somewhere down in the network layer, but also should be of good help spotting them. I presume that they even use hard-coded timeout values.
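
The timing measurements above can be reproduced outside Thunderbird with a small probe. This is a minimal Python sketch under stated assumptions: the function name probe is hypothetical, it makes a single plain TCP connect (no SOCKS, no TLS, no IMAP), and it only shows what one blocking connect attempt with an explicit timeout costs in wall-clock and CPU time.

```python
import socket
import time


def probe(host: str, port: int, timeout_s: float) -> dict:
    """Try one TCP connect and report outcome, wall time and CPU time."""
    wall0, cpu0 = time.monotonic(), time.process_time()
    try:
        # create_connection blocks in the OS until success, error or timeout.
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "connected"
    except socket.timeout:
        outcome = "timeout"
    except OSError as exc:
        outcome = f"error: {exc}"
    return {
        "outcome": outcome,
        "wall_s": time.monotonic() - wall0,
        "cpu_s": time.process_time() - cpu0,
    }
```

For example, probe("1.2.3.4", 143, 30.0) uses the unreachable address from this bug; port 143 (plain IMAP) is an assumption. On a network that silently drops those packets, wall_s would be expected to land near the requested 30 seconds while cpu_s stays close to zero, which gives a baseline to compare Thunderbird's 60 and 125 second timeouts and its CPU load against.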
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> Ryan, what AV and firewall were you running with Windows 7?  And now with
> windows 10?

My Windows 7 setup had neither installed (they broke cygwin). My Windows 10 setup had Defender for quite a while, until it started breaking my backup software a couple months ago. Now it's disabled as well. I occasionally fire up AV software to check for problems and have not found any, so I don't think that's the cause. 

In case it's relevant, my Windows 7 setup had a bog-standard Windows VPN connection that I used occasionally, and my Windows 10 setup has an OpenVPN, also used occasionally. A few months ago a Cisco VPN joined the mix, but I very rarely use it.
Flags: needinfo?(scovich)
This bug was reported 7 years ago and is still marked as unconfirmed, let alone fixed,
even though it has been discussed and confirmed by several people and I even took the trouble to provide profiles.

Since it is my typical (and pretty frustrating) experience with Mozilla Thunderbird that older bugs get neglected after a while, I've just done further experiments with the latest TB version 52.9.1 and filed the still-existing issue as a new bug report: Bug 1488092.
(In reply to Ryan Johnson from comment #37)
> (In reply to Wayne Mery (:wsmwk, NI for questions) from comment #32)
> > Ryan, what AV and firewall were you running with Windows 7?  And now with
> > windows 10?
> 
> My Windows 7 setup had neither installed 

All imap accounts?
Does this reproduce with Windows started in safe mode? (I'm surprised I didn't ask before)
  https://support.microsoft.com/en-us/help/12376/windows-10-start-your-pc-in-safe-mode
Flags: needinfo?(scovich)
Summary: High CPU utilization while sending or receiving over slow network. WFM in version 17 → High CPU utilization while sending or receiving over slow network. (WFM in version 17? Bad again in TB24?)
Also, to what extent does CPU usage change if you disable (hide) the status bar using View > toolbar > status bar?   (preferably with version 60)
I just put my laptop in airplane mode and hit "Get Messages", which put CPU util at ~40% of one CPU. That might be an improvement--I think it used to be more like 80%--but it's certainly not remotely good. Then again, Thunderbird often sucks down 10-20% CPU at any given moment even when seemingly sitting idle, so yeah... CPU hog all around. Tho of course now that I'm typing this it decided to drop to 0% CPU for once. It seems to use the most CPU when any part of the window is visible (even if not in the foreground). Some of the drop vs. before might be due to me dropping down to just two accounts (instead of four).

And yes, all imap accounts. I haven't tried safe mode yet, it would be rather disruptive to my daily workflow.

I only have 52.9.1, which "Help -> about" reports as latest version for release channel?

You know, hidden status bar might actually reduce CPU util a fair bit, for both idle and bad-network scenarios. Happy to leave that off, I don't think I ever use it...
Flags: needinfo?(scovich)
(In reply to Ryan Johnson from comment #41)
> I just put my laptop in airplane mode and hit "Get Messages", which put CPU
> util at ~40% of one CPU. That might be an improvement--I think it used to be
> more like 80%--but it's certainly not remotely good. 

That's good to hear

> Then again, Thunderbird often sucks down 10-20% CPU at any given moment even when seemingly sitting
> idle, so yeah... CPU hog all around. Tho of course now that I'm typing this
> it decided to drop to 0% CPU for once. It seems to use the most CPU when any
> part of the window is visible (even if not in the foreground). Some of the
> drop vs. before might be due to me dropping down to just two accounts (instead of four).
> 
> And yes, all imap accounts. I haven't tried safe mode yet, it would be
> rather disruptive to my daily workflow.

If the Thunderbird issue is highly reproducible, it should only take a few minutes, and would help eliminate external factors that are currently unknowable.


> I only have 52.9.1,

You can download 60 from https://www.thunderbird.net/en-US/  
If you have add-ons, list them first here so we can assess whether you would be impacted.


> You know, hidden status bar might actually reduce CPU util a fair bit, for
> both idle and bad-network scenarios. Happy to leave that off,

Please quantify the difference with it on, and with it off. 
I suggest setting Windows Task Manager's View > Update Speed to Low, which will flatten your performance graph.
Flags: needinfo?(scovich)
(In reply to Ryan Johnson from comment #41)
> I just put my laptop in airplane mode and hit "Get Messages", which put CPU
> util at ~40% of one CPU. That might be an improvement--I think it used to be
> more like 80%--but it's certainly not remotely good. [...] Some of the
> drop vs. before might be due to me dropping down to just two accounts
> (instead of four).

I suspect that the reduction of TB's CPU misuse by 50% in your case is not due to a bug having been fixed meanwhile but simply because you reduced the number of accounts by 50%.

> And yes, all imap accounts. I haven't tried safe mode yet, it would be
> rather disruptive to my daily workflow.
> 
> I only have 52.9.1, which "Help -> about" reports as latest version for
> release channel?
> 
> You know, hidden status bar might actually reduce CPU util a fair bit, for
> both idle and bad-network scenarios. Happy to leave that off, I don't think
> I ever use it...

As mentioned several times on Bugzilla for this and related bugs, Wayne Mery and anyone else could easily reproduce the issue themselves to run these (and any further) tests of interest in order to narrow down the search space: just use a non-existent IP address like 1.2.3.4 as the server address.
Why does this bug still have status UNCONFIRMED?
BTW, my workaround (when my laptop is on battery) is: 
pssuspend thunderbird
I cannot reproduce with nobody@1.2.3.4 using windows 7, thinkpad, i7, onboard wireless 6300 AGN, thunderbird 60.0b11 with status bar visible.  CPU varies between 1% and 3% for 15 seconds.
(In reply to Wayne Mery (:wsmwk) from comment #46)
> I cannot reproduce with nobody@1.2.3.4 using windows 7, thinkpad, i7,
> onboard wireless 6300 AGN, thunderbird 60.0b11 with status bar visible.  CPU
> varies between 1% and 3% for 15 seconds.

I was surprised that in your case the increase in CPU load is pretty moderate.
Still you also get an undue extra load until the connection attempt times out, though it is less prominent than in my case. 
Do you use 32- or 64-bit Thunderbird? How many cores does your i7 have?

It looks like the more cores are available, the lower the percentage reported by Microsoft's Task Manager, 
which apparently normalizes the total load of all cores to max. 100%, while on Linux the load of each core is reported up to 100%.

I've just tried again on my Win10 laptop with 4 cores, this time with a new TB profile and 32- and 64-bit TB 60.0.b11. 
Switching back to the latest current release 52.9.1 did not change anything. So the TB version and 32 vs. 64 bit makes no difference.
In all these cases I get some 3% extra load - still too much.

With my normal profile (having two accounts and storing some 4 GB of emails) the extra CPU load was higher (as I wrote, up to 10%) but currently I get less extra load: 5-6%. Interesting that the extra load is higher than with a (nearly) empty profile.

To sum up, the undue extra load varies depending on various factors. In my case it is between 12 and 40% per core.
For Wayne Mery it appears to be much less (which might be explained by having more cores and the Windows way of normalizing CPU load figures), while for Ryan Johnson it is around 40% per core.
Oops, my formulation "per core" was misleading. What I meant is: "for one core".
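
The conversion behind these figures can be written down as a minimal Python sketch. It assumes that Task Manager reports load as a share of all logical cores and that the extra load comes from a single busy thread; the function name share_of_one_core is hypothetical.

```python
def share_of_one_core(task_manager_pct: float, logical_cores: int) -> float:
    """Convert a whole-machine CPU percentage into the share of one core.

    Assumes a single busy thread, so the result is capped at 100.
    """
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    return min(100.0, float(task_manager_pct) * logical_cores)


if __name__ == "__main__":
    # The 4-core laptop figures from the comment above: 3% and 10% overall.
    print(share_of_one_core(3, 4))   # 12.0
    print(share_of_one_core(10, 4))  # 40.0
```

If the load is spread over several threads, the result is only an upper bound for any single core.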
I just found a good potential explanation why the effect of this bug is less noticeable for Wayne Mery:
when the "Main Toolbar" is disabled, the undue extra CPU load is much less, about half compared to the situation where the rotating blue circle is visible. Wayne, can you confirm this?
Flags: needinfo?(vseerror)
Here are two related, but certainly different bugs:

Under certain circumstances the undue extra CPU load is not terminated when the (configured) timeout passes, and when the "Main Toolbar" is visible, the blue circle keeps rotating indefinitely.

Moreover, most times the "Stop the current transfer" button has no effect (even with the latest TB 60.0.b11).
Ryan Johnson, can you confirm that the visibility of the Main Toolbar (View -> Toolbar -> Main Toolbar) makes a big difference to the undue CPU load?
(In reply to David von Oheimb from comment #51)
> Ryan Johnson, can you confirm that the visibility of the (View -> Toolbar ->
> Main Toolbar makes a big difference on the undue CPU load?

Yes, it seems to cut it in half, give or take. I didn't turn it back on after you suggested it a few days ago.

BTW, my reported task manager CPU load is probably higher because my laptop is several years old and only has two cores.
Flags: needinfo?(scovich)
(In reply to Ryan Johnson from comment #52)
> (In reply to David von Oheimb from comment #51)
> > can you confirm that the visibility of the (View -> Toolbar -> Main Toolbar makes a big difference on the undue CPU load?
> 
> Yes, it seems to cut it in half, give or take. I didn't turn it back on after you suggested it a few days ago.
> 
> BTW, my reported task manager CPU load is probably higher because my laptop is several years old and only has two cores.

Thanks Ryan - this confirms my conjectures that
* the CPU usage figures reported on a Windows system need to be multiplied by the number of cores in order to determine the actual load (for the core assigned to Thunderbird) and that
* about 50% of the undue CPU load is caused by rendering the rotating blue circle.

As mentioned, the extra CPU load can be quite a waste of battery capacity in case of frequent connection attempts and/or long connection timeouts (or in case the timeout is ignored under some circumstances, which must be due to the related bug I mentioned above).
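
To make the normalization described above concrete, here is a minimal sketch of the arithmetic. It assumes that the task manager reports a process's load as a share of the total capacity of all logical cores and that Thunderbird's work runs mostly on a single core. The function name and the sample values are illustrative only.

```python
def single_core_load(reported_percent: float, cores: int) -> float:
    """Convert a load figure normalized over all cores into the load on one core.

    Assumes the work is effectively single-threaded, so the result is
    capped at 100% of one core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return min(float(reported_percent) * cores, 100.0)


# On a 4-core machine, 3% reported corresponds to about 12% of one core,
# and 10% reported corresponds to about 40% of one core.
for reported in (3, 10):
    print(reported, "->", single_core_load(reported, 4))
```

This is why the same bug can look milder on a machine with more cores even though one core is equally busy.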
(Qiyao is no longer with us)

Please try the following.  Put

.progressmeter-statusbar {
  display: none !important;
}
.tab-throbber {
  display: none !important;
}

into <profile>\chrome\userChrome.css. If the folder and/or file does not exist, create it.

How does this affect your cpu usage?
Flags: needinfo?(vseerror)
Thanks Wayne for this very helpful hint!

Disabling display of these two items via CSS did not yet remove (all) the extra CPU load,
but in conjunction with removing the Activity Indicator (the rotating light blue circle) from the Mail Toolbar (right-clicking on it -> Customize -> dragging the icon down into the window that just opened) it does :)

In fact, this combination of workarounds makes Thunderbird usable again on my laptop. I had lost hope that the various TB bugs I reported or commented on would ever be taken seriously and fixed, so I was already considering giving up on Thunderbird (at least on my laptop) and had started using eM Client.
Could you please mark this bug as confirmed.
Ryan, 
To what extent does comment 54 change your CPU usage for this bug and bug 1249945?
Flags: needinfo?(scovich)
@Ryan, maybe you overlooked the new question to you of Sep 9th?
Depends on: 1507709
(In reply to Wayne Mery (:wsmwk) from comment #58)
> Ryan, 
> To what extent does comment 54 change your CPU usage for this bug and bug 1249945?

(In reply to David von Oheimb from comment #59)
> @Ryan, maybe you overlooked the new question to you of Sep 9th?

Sorry, I did miss this. Getting rid of the main toolbar made CPU usage much more manageable, and then Teh Busy hit.

I just tried out the css fix as well, not sure the CPU situation changed much. Basically, any time TB is in the foreground, it takes 50-100% of a CPU... but at least now its CPU usage drops to ~0% after a few seconds when it's not in the foreground.

I still close TB when on an airplane that lacks in-seat power (no internet anyway), but it's definitely nowhere near as bad as it used to be.
Flags: needinfo?(scovich)
I just did one more experiment with the TB installation on my Linux box where my profile contains 10 email accounts.

After hitting the "Get All New Messages" button, the CPU load briefly spiked to 100% and then was stuck at 30% (of one core).
After choosing "Work Offline", the load went down to some 3%. When re-enabling online state, CPU load stayed low.

After hitting the "Get All New Messages" button again, there was high CPU load just for a few seconds before returning to low.
Here is one more experiment clearly indicating that in fact the DOM element with Id #statusbar-icon of class .progressmeter-statusbar (full path: window#messengerWindow statusbar#status-bar.chromeclass-status hbox#statusTextBox statusbarpanel#statusbar-progresspanel.statusbarpanel-progress progressmeter#statusbar-icon.progressmeter-statusbar)
and not an element of class .tab-throbber has a pretty bad effect on CPU load:

1. Set the "Connection security" settings of a Gmail account to "STARTTLS" (rather than the correct value "SSL/TLS").
2. Try getting emails from this account.
3. Until the connection times out, the status bar moves back and forth and CPU load is some 30% (of one core).

4. Add to <profile>/chrome/userChrome.css (similarly to what Wayne suggested above):
   #statusbar-icon {
      display: none;
   }
   or do this modification via the Developer Toolbox.
5. Try getting emails from this account again.
6. No moving status bar appears and CPU load remains low.
Again, why is this bug still marked UNCONFIRMED?!?
Severity: major → critical

The profiler is working again. Please run the performance profiler:

  1. Use Thunderbird 68 or newer - release or beta
  2. Install profiler add-on into thunderbird - get the add-on file from https://github.com/firefox-devtools/Gecko-Profiler-Addon/blob/master/gecko_profiler.xpi?raw=true and in Tools > add-ons click the gear to install add-on from file
  3. Follow instructions at https://profiler.firefox.com/ Also see videos based on Firefox, but applicable to Thunderbird.
  4. Create a profiler URL and post it here, along with a description of what/how you tested.
Flags: needinfo?(scovich)
Flags: needinfo?(mueller8)

Still needs profile, but let's start fresh when version 78 comes out

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(vseerror)
Flags: needinfo?(scovich)
Flags: needinfo?(nl0)
See Also: → 562977, 1107251
Flags: needinfo?(vseerror)
Flags: needinfo?(scovich)
Flags: needinfo?(nl0)

Is no one using version 78, where the profiler works?

Flags: needinfo?(scovich)
Flags: needinfo?(nl0)

Thomas, can you devise a solution by correlating comment 55 and David's work in comment 62?

Flags: needinfo?(bugzilla2007)
Whiteboard: [needs profile][needs protocol log] → [needs profile][needs protocol log][workaround: comment 55]

Wayne, pleased to see that according to the last comment here (of two months ago) you went after this issue recently.

(In reply to David von Oheimb from comment #69)

> Wayne, pleased to see that according to the last comment here (of two months ago) you went after this issue recently.

yes, I am aware

See Also: → 1679324

(In reply to David von Oheimb from comment #62)

> Here is one more experiment clearly indicating that in fact the DOM element
> with Id #statusbar-icon of class .progressmeter-statusbar [...]
> and not an element of class .tab-throbber has a pretty bad effect on CPU
> load:

It's worth noting that XUL <progressmeter> has since been replaced with HTML <progress> in bug 1499593, which may have a positive effect here.
When I tested this a while back, IIRC the CPU load was closer to 5%, and hiding the status bar <progress> did not seem to change that significantly: maybe 2 percentage points down (well, that would be roughly a 40% decrease in relative terms...).

> 6. No moving status bar appears and CPU load remains low.

Maybe moving as in "oscillating green" is the key here. That status bar <progress> thing gets updated very frequently in the weirdest of ways, where percentage-based progress of some actions is combined with other general progress actions, and things which don't have measurable progress cause the back and forth motion, and the overall progress of all those different actions is supposed to end up in a single <progress> element. Not really surprising if that goes haywire and drives up CPU.
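
As a toy model of that description (not Thunderbird's actual status feedback code), the sketch below folds several actions into one value for a single progress element. It assumes that an action without measurable progress is represented as None and that one such action is enough to force the element into its indeterminate, back-and-forth state. The function name and this folding rule are hypothetical.

```python
from typing import Optional, Sequence


def combined_progress(actions: Sequence[Optional[float]]) -> Optional[float]:
    """Fold several actions into one value for a single progress element.

    Each entry is a percentage (0-100), or None for an action without
    measurable progress. Returns None when the element would have to be
    shown as indeterminate (the oscillating animation).
    """
    if not actions:
        raise ValueError("at least one action is required")
    if any(p is None for p in actions):
        return None  # one unmeasurable action makes the whole bar oscillate
    return sum(actions) / len(actions)


print(combined_progress([50.0, 100.0]))  # determinate: average of both actions
print(combined_progress([50.0, None]))   # indeterminate: a pending connect, say
```

Under these assumptions, a connection attempt that never completes keeps the element in the animated state until the timeout fires, which would match the observation that hiding the element removes the load.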

See Also: → 1499593
Summary: High CPU utilization while sending or receiving over slow network. (WFM in version 17? Bad again in TB24?) → High CPU utilization while sending or receiving over slow network. (WFM in version 17? Bad again in TB24?) - may involve oscillating <progress> element in status bar

The issue persists. Here is a new profile confirming it: https://share.firefox.dev/3dILA3K
This profile has been recorded with TB 91.4.0 where 1.2.3.4 has been set up as IMAP "server name"
and pressing "Get Messages" about 1 second after starting the recording.

As mentioned already years ago at several places on Bugzilla, there is a simple way of reproducing the issue,
not only for IMAP but for various types of connections: let TB or FF connect to a "non-existing" IP address, such as 1.2.3.4.

BTW, the issue is of course not only with IP address 1.2.3.4.
It occurs also with real servers that currently are not reachable.
In my case it is the mail server of my company, while my machine is not connected to the company's intranet;
which results in this case in a DNS resolution failure for the mail server host name.
Here is the performance profile: https://share.firefox.dev/3oJHezo

Argh, it turns out that most of the performance profiles I shared recently were not actually recorded in safe mode,
and the reason is that if I start Thunderbird with "--ProfileManager --safe-mode" via my usual shell script, the "--safe-mode" option gets ignored :-(
Yet another TB bug 1745570.

Anyway, the observed misbehavior is clearly independent of safe mode.
In case you don't trust my judgment, here is a new profile demonstrating this: https://share.firefox.dev/3GAPp7u

Here is one more weird thing (again independent of safe mode):
When I disable automatic fetching of mail from the unreachable IMAP server, the TB idle CPU load drops to "just" some 12% (which is still too much).
As soon as I start recording a performance profile, the CPU load rises again to some 70% while TB should be idle.
This seems to defeat the very purpose of performance profiling.
Anyway, here is the profile obtained this way: https://share.firefox.dev/3oHOyM8

BTW, it turns out that Evolution has the same bug. https://gitlab.gnome.org/GNOME/evolution/-/issues/1741

Come on guys.
This bug report is by now 10+ years old and (correctly) marked as a critical defect.
But still I don't see even an attempt to actually fix it.

Things have worsened. I currently get 110% CPU usage while TB should be just idle.
Here is a fresh 'performance' profile for this, again on Linux with TB 91.4.0 : https://share.firefox.dev/3e8K6Qj

Depends on: 1752641
See Also: → 1753195

I can confirm the problem still exists in TB 91.5.1 (64-bit, Mac). Same symptom -- an endless progress bar at the bottom of the window (sitting at apparently 100%), continuously consuming 20-40% of a core. The network is up, I'm using a remote connection continuously while this happens.

Status: NEW → RESOLVED
Closed: 12 years ago → 3 years ago
Resolution: --- → INVALID
Status: RESOLVED → REOPENED
Resolution: INVALID → ---

(sorry for the close/reopen -- bugzilla closed the bug automatically when I saved my previous comment)

See Also: → 1754158
See Also: 1754158

It is a shame that this bug, which is by now 12+ years old, has still not been properly tackled and fixed.

Bugs are not fixed by neglecting them or by managing and discussing them back and forth,
but by understanding what the actual issue is (in this case, pretty sure a busy-waiting loop)
and some developer getting his/her hands dirty and doing something about it.

As I mentioned many times, both within this bug report and several related ones, the issue is easily reproducible
by configuring the mail server name to an unreachable IP address such as 1.2.3.4 and then trying to connect.

My CPU usage is only 30-40%, but I see the same function (PollWrapper) being called again and again.

We overwrite the poll function with our own PollWrapper via g_main_context_set_poll_func in widget/gtk/nsAppShell.cpp#317.

More observations:

The call stack getting repeated:

#0  PollWrapper(_GPollFD*, unsigned int, int) (aUfds=0x7f3ae50145e0, aNfsd=5, aTimeout=0) at /home/user/dev/gecko-thunderbird/widget/gtk/nsAppShell.cpp:60
#1  0x00007f3b07e74a9f in  () at /usr/lib/libglib-2.0.so.0
#2  0x00007f3b07e15032 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#3  0x00007f3aff62f3af in nsAppShell::ProcessNextNativeEvent(bool) (this=<optimized out>, mayWait=<optimized out>) at /home/user/dev/gecko-thunderbird/widget/gtk/nsAppShell.cpp:422
#4  0x00007f3aff5a68a7 in nsBaseAppShell::DoProcessNextNativeEvent(bool) (this=this@entry=0x7f3af4fee880, mayWait=false) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:131
#5  0x00007f3aff5a6b10 in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) (this=0x7f3af4fee880, thr=0x7f3b0af3c3c0, mayWait=<optimized out>) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:250
#6  0x00007f3aff5a6ca1 in non-virtual thunk to nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) () at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:287
#7  0x00007f3afbdd8907 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7f3b0af3c3c0, aMayWait=false, aResult=0x7fffcdf17347) at /home/user/dev/gecko-thunderbird/xpcom/threads/nsThread.cpp:1154
#8  0x00007f3afbddc894 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7f3ae50145e0, aThread@entry=0x7f3b0af3c3c0, aMayWait=false) at /home/user/dev/gecko-thunderbird/xpcom/threads/nsThreadUtils.cpp:479
#9  0x00007f3afc6570df in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7f3af1e29340, aDelegate=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/glue/MessagePump.cpp:85
#10 0x00007f3afc5cce35 in MessageLoop::RunHandler() (this=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/chromium/src/base/message_loop.cc:361
#11 MessageLoop::Run() (this=0x7f3b0af1bd40) at /home/user/dev/gecko-thunderbird/ipc/chromium/src/base/message_loop.cc:343
#12 0x00007f3aff5a68f3 in nsBaseAppShell::Run() (this=0x7f3af4fee880) at /home/user/dev/gecko-thunderbird/widget/nsBaseAppShell.cpp:148
#13 0x00007f3b00dd2c5a in nsAppStartup::Run() (this=0x7f3af1e63ba0) at /home/user/dev/gecko-thunderbird/toolkit/components/startup/nsAppStartup.cpp:295
#14 0x00007f3b00ee1233 in XREMain::XRE_mainRun() (this=this@entry=0x7fffcdf17668) at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5659
#15 0x00007f3b00ee1da2 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=this@entry=0x7fffcdf17668, argc=argc@entry=4, argv=argv@entry=0x7fffcdf18968, aConfig=...)
    at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5859
#16 0x00007f3b00ee2232 in XRE_main(int, char**, mozilla::BootstrapConfig const&) (argc=4, argv=0x7fffcdf18968, aConfig=...) at /home/user/dev/gecko-thunderbird/toolkit/xre/nsAppRunner.cpp:5915
#17 0x000056024249eb41 in do_main(int, char**, char**) (argc=4, argv=0x7fffcdf18968, envp=<optimized out>) at /home/user/dev/gecko-thunderbird/comm/mail/app/nsMailApp.cpp:229
#18 main(int, char**, char**) (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/user/dev/gecko-thunderbird/comm/mail/app/nsMailApp.cpp:386

The hot loop is at #5 https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/widget/nsBaseAppShell.cpp#248-251

do {
  mLastNativeEventTime = now;
  keepGoing = DoProcessNextNativeEvent(false);
} while (keepGoing && ((now = PR_IntervalNow()) - start) < limit);
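
To illustrate why a loop of this shape can keep a core busy, here is a small Python simulation (not Gecko code). It assumes, as aTimeout=0 in frame #0 suggests, that each native-event poll returns immediately instead of waiting, and that the hypothetical process_next_native_event callback keeps reporting more work. The function names and the 10 ms budget are invented for the illustration.

```python
import time


def drain_native_events(process_next_native_event, limit_seconds):
    """Mimic the do/while above and return the number of iterations.

    Polls without waiting until either no event is pending or the time
    budget is used up.
    """
    start = time.monotonic()
    iterations = 0
    while True:
        iterations += 1
        keep_going = process_next_native_event()  # never blocks (mayWait=false)
        if not (keep_going and (time.monotonic() - start) < limit_seconds):
            break
    return iterations


# Nothing pending: the loop body runs exactly once.
print(drain_native_events(lambda: False, 0.01))
# Always something pending: the loop spins for the whole 10 ms budget.
print(drain_native_events(lambda: True, 0.01))
```

If whatever drives the native event queue always has something ready, every pass through the outer event loop burns its full budget, which would match the repeated PollWrapper calls in the backtrace.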

Pernosco: https://pernos.co/debug/_ddLRlZf7oD5hXhB0Wlfjg/index.html

It was worked on 10 years ago in Bug 930793, but that work was backed out in the end for causing performance regressions: https://searchfox.org/mozilla-central/diff/c3dca4be1e54b90f31c44755a984a9c4a9458a23/widget/nsBaseAppShell.cpp#278. It was worked on again in Bug 1260070.

Olli, since you've worked twice on that code, do you have any insights on how best to resolve this? We probably need to remove the busy loop without causing perf regressions.

Flags: needinfo?(smaug)

What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
Does it trigger something on the OS side which then triggers appshell to run all the time?

https://bugzilla.mozilla.org/show_bug.cgi?id=1804295 is where the performance mode was removed.

Flags: needinfo?(smaug) → needinfo?(manuel)

But do you have a performance profile for this? Appshell is supposed to be high up there, if there are lots of tasks.
Thunderbird does have that one issue where it keeps re-styling something all the time. Switching tab to a calendar tab and back fixes that.

This is a performance profile on the latest commit: https://share.firefox.dev/3MQJiR6. It does have Appshell in the top.

@kershaw Can you answer Olli's question regarding Necko?
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #87)

> What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
> Does it trigger something on the OS side which then triggers appshell to run all the time?

Flags: needinfo?(manuel) → needinfo?(kershaw)

The main thread is mostly idle in that profile. The poll just indicates that the thread is waiting for more work, but if you zoom in on the main thread you can see that the refresh driver is ticking all the time, with some idle time in between. So this might be a Thunderbird frontend issue. Why is it triggering a refresh driver tick all the time?
This is the type of issue I mentioned in my earlier comment and that I see every now and then.

Will try to capture a profile where this happens.

Flags: needinfo?(acreskey)

(In reply to Manuel Bucher [:manuel] from comment #89)

> This is a performance profile on the latest commit: https://share.firefox.dev/3MQJiR6. It does have Appshell in the top.
>
> @kershaw Can you answer Olli's question regarding Necko?
> (In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #87)
>
> > What is Necko (assuming this is a necko issue) doing to keep the main thread of the parent process so busy? Or rather, not the main thread of Gecko, but the OS level event queue/loop.
> > Does it trigger something on the OS side which then triggers appshell to run all the time?

From this profile, I don't see any sign of a networking problem.
We should also include the socket thread in the profile if we suspect this is a networking issue.

Flags: needinfo?(kershaw)

(In reply to Andrew Creskey [:acreskey] from comment #91)

> Will try to capture a profile where this happens.

reminder ...

(In reply to Kershaw Chang [:kershaw] from comment #92)

> ...
> We should also include socket thread in the profiler if we suspect this is a networking issue.

Flags: needinfo?(bugzilla2007)
Blocks: 1830641
Flags: needinfo?(vseerror)

We need a profile that includes the Socket Thread before we can act on this. I will create one next week.

Flags: needinfo?(manuel)

Created a profile with socket thread. Kershaw, can you take a look? https://share.firefox.dev/3Tc3VvA
A possibly cleaner profile (containing only the error state): https://share.firefox.dev/3uVtsQ7
Using the Thunderbird preset: https://share.firefox.dev/3wxnZ2u

Flags: needinfo?(manuel) → needinfo?(kershaw)

The profiles suggest that most of the CPU time is spent in DOM or JS code rather than in network activity.
This makes me believe that comment #90 might still be valid - this looks like a Thunderbird front end issue.

Andrew, if you have time, please also look at the profiles in comment #95 and see if my conclusion is correct. Thanks.

Flags: needinfo?(kershaw)

(In reply to Kershaw Chang [:kershaw] from comment #96)

> The profiles seem to suggest that most of CPU resource is used in DOM or JS code, rather than network activities.
> This makes me believe that comment #90 might be still valid - this looks like a Thunderbird front end issue.
>
> Andrew, if you have time, please also look profiles in comment #95 and see if my conclusion is correct. Thanks.

Absolutely agree with that conclusion.
The socket thread is just waiting while the Thunderbird front end is perpetually hard at work on JS.

Flags: needinfo?(acreskey)
Flags: needinfo?(vseerror)
See Also: → 1875103
Blocks: 1875103
See Also: 1875103

Another profile, recorded on a new profile with no messages: https://share.firefox.dev/3WJDpu1 For some reason much time is spent on message indexing.
