Open Bug 982678 Opened 11 years ago Updated 2 years ago

NFS4 Home and thunderbird-bin futex hang

Categories

(MailNews Core :: Networking, defect)

x86_64
Linux
defect

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: nickjon, Unassigned, NeedInfo)

References

Details

(Keywords: hang, stackwanted)

Attachments

(1 file)

71.45 KB, text/plain
Details
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36

Steps to reproduce:

Use an NFSv4-mounted Thunderbird profile / home directory.  Thunderbird works fine at first, but eventually I run into this problem.  It happens 3-4 times a week.  It usually shows up as a message header that is displayed while the message itself fails to download and never recovers.  After I close the Thunderbird GUI, thunderbird-bin keeps waiting on a futex.  Version 24.3.0-61.39.2-x86_64 from openSUSE.  Sometimes it happens more frequently; it is not 100% reproducible.


Actual results:

Usually shows up as a new message header appearing in bold, but the message fails to download.  After I close the Thunderbird GUI, thunderbird-bin keeps waiting on a futex.  I've tried waiting very long periods and it never recovers.
The strace covers the point where the message download hangs (the message never appears), through closing Thunderbird, to killing thunderbird-bin.


Expected results:

Message loads, thunderbird-bin doesn't hang waiting.
Some more info, very generic NFS mount options.
OpenSuse 12.3 client / fstab:
host.name:/sharename /mntpoint nfs4 defaults 0 0 

OpenSolaris ZFS / NFS exports:
sec=sys,rw=IP,root=IP
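For anyone comparing setups, the fstab entry above can be split into its six standard fields to confirm the filesystem type and options actually in use. This is only a minimal sketch: `parse_fstab_line` and `NETWORK_FS_TYPES` are hypothetical names made up for illustration, and the set of "network" filesystem types is an assumption, not something Thunderbird itself checks.

```python
# Hypothetical helper: split one fstab line into its standard fields.
# The set below is an assumption chosen for this thread (NFS and CIFS mounts).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs"}


def parse_fstab_line(line):
    """Return a dict for one fstab entry, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields: %r" % line)
    return {
        "spec": fields[0],                  # device or host:/export
        "mountpoint": fields[1],
        "fstype": fields[2],
        "options": fields[3].split(","),
        # dump and pass are optional and default to 0
        "dump": int(fields[4]) if len(fields) > 4 else 0,
        "passno": int(fields[5]) if len(fields) > 5 else 0,
    }


def is_network_fs(entry):
    """True if the entry's filesystem type is in the assumed network set."""
    return entry["fstype"] in NETWORK_FS_TYPES


entry = parse_fstab_line("host.name:/sharename /mntpoint nfs4 defaults 0 0")
print(entry["fstype"], entry["options"], is_network_fs(entry))
```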

Will get a gdb stacktrace with proper debuginfo, if I can reproduce it.  Is the strace output no good?  I don't see that it was included, would that be helpful?
I also have this problem.  Mint Linux 13 MATE.  

  Application Basics

    Name: Thunderbird
    Version: 24.3.0
    User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
    Profile Directory: Open Directory (Network drive)
    Application Build ID: 20140210221328
    Enabled Plugins: about:plugins
    Build Configuration: about:buildconfig
    Crash Reports: about:crashes
    Memory Use: about:memory

  Extensions
    CompactHeader, 2.0.8, true, {58D4392A-842E-11DE-B51A-C7B855D89593}
    EDS Contact Integration, 0.6, true, edsintegration@mozilla.com
    Lightning, 2.6.4, true, {e2fda1a4-762b-4020-b5ad-a41df1933103}
    Messaging Menu and Unity Launcher integration, 1.3.1, true, messagingmenu@mozilla.com

I first noticed this when I changed from using gvfs for mounting my Windows shares to autofs, but I can't be sure that it didn't exist before.  In any case, Thunderbird now takes 100% of one CPU core while running (which it seems to do correctly).  When closed, it disappears from the screen in the usual way, but continues running, still using 100% of one CPU core, apparently with a futex_wait_queue_me blocking it from shutting down completely.  I have to end the process with System Monitor.  I have not yet observed any crashes, but I just changed the way in which I mount the Windows shares today, so I don't have much experience to go by.

I note that my profile/home directory is on one of the Windows shares so that I can run Thunderbird either from Linux Mint MATE or from WinXP.  I have not yet had problems running this configuration on WinXP.  But, I'd like to migrate to Linux for obvious reasons.
Attached file gdb.txt
Hi John,

Yours sounds like a different problem, I have no CPU usage when the problem occurs.

I'm wondering whether the issue has been resolved in an update, or whether it was related to some feature of our network that has changed.

I'll continue watching for it a while longer, but may close this bug if it continues to work.

Nick
Hi Nick,

I'm not sure, but that's why I went through the trouble of adding my comments.  We both have the futex wait issue.  It could be that this causes different code to run on our different systems, depending upon the software that we use to connect our shares.  One of the things that I just now noticed: we're both running zfs based shares.  Mine is from Nexenta, essentially identical to Open Solaris.  I'm not sure what the significance of this is, but it's another data point.

BTW, do I understand that your problem has disappeared?

John
This is an update.  I've changed the way in which I mount the Windows shares from autofs to gvfs using Gigolo.  I guess this is the preferred way, but, because they weren't visible to everything, I didn't like it.  I solved this problem by making symlinks.

In any case, now Thunderbird works as it should.  No more problems with closing the app.  Also no more problems with high CPU usage (and lots of network traffic to the server too!).

So, to summarize, the interaction between Thunderbird and autofs-mounted shares is problematic when Thunderbird's files are on one of the shares.  I don't know if this is worth fixing since there is a reasonable work-around using gvfs.
Hi John,

I only mentioned that it might be a different problem because in previous similar bug reports, the cpu usage seemed important in differentiating bugs of this kind.

I have not seen the problem, but that doesn't mean it's disappeared.  I have yet to determine a way to reproduce it, so it's questionable whether it's a bug at all.  So I'm still in a holding pattern.  Hasn't recurred again so I'm feeling more confident it was related to another issue that's been fixed in an update.

Thanks for your input.
Have you seen this more recently?
Flags: needinfo?(nickjon)
Flags: needinfo?(jhbowers)
(In reply to Wayne Mery (:wsmwk) from comment #10)
> Have you seen this more recently?

No, but I haven't tried to reproduce it.  The approach that I outlined in comment 8 above works and I haven't had a need to change it.  So, I have no new info for you.
Flags: needinfo?(jhbowers)
I still see this problem using NFSv4 mounted home directories with default settings (or any settings for that matter).  I get around it by symlinking the .Thunderbird profile to a local directory, because I only check mail from one computer.  I believe the root problem is related to file locking under Linux.
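The symlink workaround described above could be scripted roughly as follows. This is a sketch under assumptions: `relocate_with_symlink` is a hypothetical helper, the paths passed to it are whatever your profile and local directories happen to be, and Thunderbird should be fully closed before anything is moved.

```python
import os
import shutil


def relocate_with_symlink(profile_dir, local_dir):
    """Move profile_dir to local_dir and leave a symlink at the old location.

    Hypothetical helper for illustration; run it only while Thunderbird
    is not running, and keep a backup of the profile first.
    """
    if os.path.islink(profile_dir):
        raise ValueError("%s is already a symlink" % profile_dir)
    if os.path.lexists(local_dir):
        raise FileExistsError("%s already exists" % local_dir)
    # Copies across filesystems if needed, then removes the original.
    shutil.move(profile_dir, local_dir)
    # The old path on the network home now points at the local copy.
    os.symlink(local_dir, profile_dir)
```

The trade-off is the one already mentioned: the mail is then only available on the machine that holds the local copy.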

This blog has some interesting details.
0pointer.de/blog/projects/locking.html

I think it would be really nice if Thunderbird could not rely on file locking in $HOME on Linux, or if it could be made to work with NFS, but I think both of those solutions would be a lot of work if they are possible at all.  For now, I will try to play around with GVFS over SFTP/SSH or continue using a local profile.
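To check whether locking works at all on a given mount, a small probe like the one below can be pointed at the profile directory. It is only a sketch: `can_lock` is a hypothetical name, it tests POSIX advisory locks through Python's `fcntl.lockf`, and this thread has not confirmed that this is the exact locking path on which Thunderbird hangs. Note also that even a non-blocking lock request can stall if the NFS server or its lock service stops responding, so a hang of the probe itself is a data point too.

```python
import fcntl
import os
import tempfile


def can_lock(directory):
    """Try to take and release an exclusive advisory lock on a scratch file.

    Returns True if the lock was granted, False if the filesystem refused
    it. The scratch file is removed afterwards.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest-")
    try:
        try:
            # LOCK_NB asks for an immediate answer instead of waiting.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Replace with the directory that holds the Thunderbird profile.
    print(can_lock(tempfile.gettempdir()))
```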

Thanks for looking into it.
Flags: needinfo?(nickjon)
I accidentally fell into this problem again today.  It occurred because I was trying out mounting the network share where the Thunderbird profile.default directory exists via the fstab method instead of using gigolo.  I wanted to do this because I noted that the data transfer rate via gigolo mount seems slow.  

I did some additional testing with the same (cifs) network share mounted by the fstab method and simultaneously mounted by gigolo at a different mount point.  When I set the path in profiles.ini to the fstab mount point, the problem manifested itself again, including the failure of Thunderbird to completely shut down when I close the window, still taking nearly 100% of 1 core.  After killing the Thunderbird process and setting the path in profiles.ini to the gigolo mount point, everything is fine.  This is completely reversible in both directions, and no reboots are required between changes to demonstrate the issue.
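Switching the profile path back and forth for this kind of A/B test can be done by hand in profiles.ini, or with a sketch like the following. `set_profile_path` is a hypothetical helper and the mount points in the example are made up; the only assumption about the file is the usual `[ProfileN]` sections with `Name`, `IsRelative` and `Path` keys.

```python
import configparser
import io


def set_profile_path(ini_text, profile_name, new_path):
    """Return profiles.ini text with the named profile set to an absolute path."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep key case (IsRelative, Path) as written
    parser.read_string(ini_text)
    for section in parser.sections():
        if (section.startswith("Profile")
                and parser.get(section, "Name", fallback=None) == profile_name):
            parser.set(section, "Path", new_path)
            parser.set(section, "IsRelative", "0")  # new_path is absolute
            break
    else:
        raise KeyError("no profile named %r" % profile_name)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Example with hypothetical mount points:
sample = """[General]
StartWithLastProfile=1

[Profile0]
Name=default
IsRelative=0
Path=/mnt/fstab-share/profile.default
"""
print(set_profile_path(sample, "default", "/mnt/gigolo-share/profile.default"))
```

Close Thunderbird before changing the file, and keep a copy of the original so the change can be reverted.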

I'd love to get rid of this issue without having to mount a network share twice, but for now it seems the only workable solution for me.
(In reply to John from comment #11)
> (In reply to Wayne Mery (:wsmwk) from comment #10)
> > Have you seen this more recently?
> 
> No, but I haven't tried to reproduce it.  The approach that I outlined in
> comment 8 above works and I haven't had a need to change it.  So, I have no
> new info for you.

Could you try as a test?
Component: Untriaged → Networking
Flags: needinfo?(nickjon)
Product: Thunderbird → MailNews Core
Version: 24 Branch → 24
Still seeing the problem with NFS (problem is NFS related).  GVFS with CIFS is not suitable as it creates a performance issue, and does not align with the permission mode bits we use for access control.  Does the documentation state that NFS is not supported / recommended for profile storage because NFS does not reliably support file locking (required by Thunderbird)?
(In reply to Wayne from comment #14):

Yes.  

BTW, an upgrade from Lubuntu 14.04 to 16.04 nixed my ability to use GVFS via gigolo.  See trashcan4junk's comments above.  So, the problem now exists all of the time.  It's a rather large PITA when I'm using a single core processor (I know, I know!).  Tried increasing mail.db.idle.limit by a few orders of magnitude, but this simply (sometimes) delays the problem.  The workaround, for now, is to run Thunderbird when I want to see if I have any e-mail messages and then exit.  If I leave it running, it goes into CPU hog mode and exits (eventually) with errors (and apparently a messed up index file, since it needs to re-index upon the next invocation).

FYI, all files, including the profile, still are on a CIFS server mounted through fstab.

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --