
Open Bug 74424 Opened 24 years ago Updated 2 years ago

Save as html file always saves the header in utf-8

Categories

(MailNews Core :: Internationalization, defect)

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: ji, Assigned: smontagu)

References

(Depends on 1 open bug)

Details

(Keywords: intl, regression, testcase-wanted, Whiteboard: [See comment 28] [needs owner])

Attachments

(2 files, 3 obsolete files)

***Observed with win32 04/02 trunk build***

With 6.0/6.01, Save As html file saves the header and body in the charset that
the charset label indicates. But with the current build, the header is always
saved in utf-8 regardless of the charset indicated in MIME.

Steps to reproduce:
1. Go to the smoketest folder.
2. Select the 2nd message, which is an iso-2022-jp mail.
3. Select File | Save as | File and save it as an html file.
   Open the file in the browser; you'll see that the mail body is saved in
   iso-2022-jp while the header is saved in utf-8. (If you switch the charset
   menu to utf-8, you can view the header but not the body. If you switch it
   to iso-2022-jp, you can view the body but not the header.)
4. Send yourself a shift-jis mail and save it as an html file; the header is
   also saved in utf-8, while the body is saved in shift-jis.
Keywords: intl, regression
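
To make the symptom in the steps above concrete, here is a minimal Python sketch (not Mozilla code). It builds a byte string shaped like the saved file, with a utf-8 header line followed by an iso-2022-jp body, and checks which part survives when the whole file is read under each charset. The sample strings and the helper name `readable_parts` are made up for illustration.

```python
# Hypothetical sample data standing in for the saved .html file.
original_header = "Subject: テスト"
original_body = "こんにちは"

# Header bytes in utf-8, body bytes in iso-2022-jp, as described in the report.
saved = original_header.encode("utf-8") + b"\n" + original_body.encode("iso2022_jp")


def readable_parts(data, charset):
    """Decode the whole file with one charset and report which parts survive."""
    text = data.decode(charset, errors="replace")
    return {"header": original_header in text, "body": original_body in text}


as_utf8 = readable_parts(saved, "utf-8")
as_jis = readable_parts(saved, "iso2022_jp")
print("viewed as utf-8:      ", as_utf8)
print("viewed as iso-2022-jp:", as_jis)
```

Neither charset menu setting can show both parts, because a browser applies one encoding to the whole document.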
I noticed this problem today when I was sending a Cyrillic page out. The
default charset for viewing in the browser is set to UTF-8 in today's build.
That might be a different problem. This is not related to the view default 
charset of the browser, although you can use the browser to check the charset 
that the mail is saved in.
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimehdrs.cpp#48

 48 static char *
 49 MimeHeaders_convert_header_value(MimeDisplayOptions *opt, char **value)
 50 {
 51   char        *converted;
 52 
 53   if (!*value)
 54     return *value;
 55 
 56   if (opt && opt->rfc1522_conversion_p)
 57   {
 58     converted = MIME_DecodeMimeHeader(*value, opt->default_charset, 
 59                                       opt->override_charset, PR_TRUE);

Reassign to jgmyers. In case of saving, it should not convert to UTF-8.
Assignee: nhotta → jgmyers
My solution would be to check in a fix to bug 68982.

Reassign to Ben Bucksch, since he wants to keep the feature of saving to html.
Assignee: jgmyers → ben.bucksch
John, if people file bugs about this code, I guess they care that it works, no?
Removing the code altogether because of i18n bugs surely is no solution.

That's all I'm saying. I didn't volunteer to maintain that code. It's marked a
regression, so whoever broke it has to fix it.

Back to default owner for triaging.
Assignee: ben.bucksch → nhotta
I've been thinking about why we would need .html format
for saving. The .txt format is easy to guess -- we want 
a plain text content only kind of data saved. The .eml
format is simply RFC 822 data themselves and clearly needed.

But what about the .html format? If we want to share some
msgs on the net with others with confidence that they
will display on any browsers and look like mail msgs, 
I think this is the format that will be most useful. 

BTW, the .eml format file (from Mozilla) displays with
Outlook Express rather than with IE if you try to open
such a file with IE5.x. Both Comm 4.x and Mozilla can
display such a file in a browser window.
Mark as future for now, revisit after the decision of bug 68982.
Depends on: 68982
Target Milestone: --- → Future
Status: NEW → ASSIGNED
With 07/24 branch build, mail body is also saved in utf-8.
Modified the summary accordingly.
Summary: Save as html file always saves the header as a utf-8 string → Save as html file always saves the header and mail body in utf-8
my mistake, only the header is saved in utf-8. 
Summary: Save as html file always saves the header and mail body in utf-8 → Save as html file always saves the header in utf-8
> 56   if (opt && opt->rfc1522_conversion_p)
> 57   {
> 58     converted = MIME_DecodeMimeHeader(*value, opt->default_charset, 
> 59                                       opt->override_charset, PR_TRUE);
>
> In case of saving, it should not convert to UTF-8.

But it still needs to decode, doesn't it?
Attached patch patch, v1 (obsolete)
This patch converts mime headers from utf8 to message's charset, if we're
saving as html. (note that we still mime-decode headers!).
Perhaps, there is better place to apply charset conversion, but I failed
to find it :-)
(Tested with koi8-r To: and Subject: headers)
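
The idea behind this patch can be sketched in Python as follows. This is an illustration only, not the C++ change itself: the header is still MIME-decoded, and the decoded text is then written out in the message's charset rather than in utf-8. The helper names `encoded_word` and `header_in_message_charset` are hypothetical, and the koi8-r sample mirrors the test mentioned above.

```python
import base64
from email.header import decode_header


def encoded_word(text, charset):
    """Build an RFC 2047 base64 encoded-word (hypothetical test helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)


def header_in_message_charset(raw_value, message_charset):
    """MIME-decode a header value, then re-encode it in the message charset."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            # Unlabelled chunks are treated as ASCII in this sketch.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    # Characters outside the message charset are replaced with '?'.
    return "".join(parts).encode(message_charset, errors="replace")


raw_subject = encoded_word("Привет", "koi8-r")
saved_subject = header_in_message_charset(raw_subject, "koi8-r")
print(raw_subject, "->", saved_subject)
```

Note the lossy `errors="replace"` step: it only matters when a header contains characters the message charset cannot represent.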
Looks good, if 'charset' is only used for the SaveAs case then it does not have
to be allocated for other cases.
Target Milestone: Future → ---
Attached patch patch, v2 (obsolete)
Allocate charset string only when saving as.
Also replaced
      for (colon = head; colon < end; colon++)
	if (*colon == ':') break;
       
	if (colon >= end) continue;   /* junk */
with
      for (colon = head; colon < end && *colon != ':'; colon++)
	;


and removed a few TABs
1) Please initialize this.
+  char *charset;

2) Is this change also related to the SaveAs header problem? If you remove them
there, you can remove the function itself.

-      status = mimeEmitterAddAttachmentField(opt, name, 
-                                MimeHeaders_convert_header_value(opt, &c2));
+      status = mimeEmitterAddAttachmentField(opt, name, c2);
     else
-      status = mimeEmitterAddHeaderField(opt, name, 
-                                MimeHeaders_convert_header_value(opt, &c2));
+      status = mimeEmitterAddHeaderField(opt, name, c2);

Attached patch patch, v3 (obsolete)
1) Done
2) Sorry, I don't quite understand what you mean here...
Are you talking about the MimeHeaders_convert_header_value() function?
But I still use it a few lines above...
Okay, you moved that up, I thought you removed them so I asked that question.
Comment on attachment 65881
patch, v3

r=nhotta

please get sr
Attachment #65881 - Flags: review+
Comment on attachment 65881
patch, v3

>+  char *charset = NULL;
you want nsnull, right?

while you are in this code, can you rename "c2" to something better?

sr=sspitzer
Attachment #65881 - Flags: superreview+
NULL -> nsnull
c2 -> hdr_value

Also moved declaration of c2 (hdr_value) inside of for loop
(hdr_value used only inside that loop)
checked in

Thank you for the contribution.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla0.9.9
Reopened the bug, I still see the problem with 06/17 branch build.
QA contact to marina. Thanks.
Status: RESOLVED → REOPENED
QA Contact: ji → marina
Resolution: FIXED → ---
Still reproducible in 07-01-1.0.branch build on Windows and Linux.
*** Bug 122517 has been marked as a duplicate of this bug. ***
Status: REOPENED → ASSIGNED
Target Milestone: mozilla0.9.9 → ---
What if headers have characters not covered by charset of the body? 

Subject: =?iso-8859-2?q?.......?=
From: =?iso-2022-jp?b?......?= <....@....jp>
To: =?euc-kr?b?.......?= <.....@.....kr>
Content-Type: text/plain; charset=iso-8859-2

Saving everything (header and body) in UTF-8 would always work, even for rare
cases like the above.

Anyway, at least we need to be consistent for body and header. 
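
The mixed-charset case above can be checked with a short Python sketch. The names and addresses are invented stand-ins for the elided values in the example headers, and `encoded_word`, `decode_mime_header` and `representable` are hypothetical helpers. It decodes each header and tests whether the result fits the body charset (iso-8859-2) or only utf-8.

```python
import base64
from email.header import decode_header


def encoded_word(text, charset):
    """Build an RFC 2047 base64 encoded-word (hypothetical test helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)


def decode_mime_header(raw_value):
    """Decode every encoded-word in a header value to Unicode text."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def representable(text, charset):
    """True if text can be encoded in charset without loss."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Invented sample values; the original example elides the real ones.
headers = {
    "Subject": encoded_word("Zażółć", "iso-8859-2"),
    "From": encoded_word("山田", "iso-2022-jp") + " <sender@example.jp>",
    "To": encoded_word("김", "euc-kr") + " <rcpt@example.kr>",
}
body_charset = "iso-8859-2"

fits_body = {n: representable(decode_mime_header(v), body_charset) for n, v in headers.items()}
fits_utf8 = {n: representable(decode_mime_header(v), "utf-8") for n, v in headers.items()}
print("fits body charset:", fits_body)
print("fits utf-8:       ", fits_utf8)
```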
 

Assignee: nhottanscp → smontagu
Status: ASSIGNED → NEW
OS: Windows 98 → All
QA Contact: marina
Hardware: PC → All
Product: MailNews → Core
Product: Core → MailNews Core
Patch author's (Denis Antrushin's) address is no longer valid.
set patchlove
reset QA
QA Contact: i18n
Whiteboard: [patchlove][has patch][needs owner]
(In reply to comment #19)
> Created an attachment (id=66753)
> patch, v3.1
> 
> NULL -> nsnull
> c2 -> hdr_value
> 
> Also moved declaration of c2 (hdr_value) inside of for loop
> (hdr_value used only inside that loop)

(In reply to comment #20)
> checked in
> 
> Thank you for the contribution.

This patch seemed to be checked in in the past, but it slipped my mind on how to check for past checkins. Maybe Serge would know, cc'ing.
Attachment #64516 - Attachment is obsolete: true
Attachment #64808 - Attachment is obsolete: true
Attachment #65881 - Attachment is obsolete: true
Comment on attachment 66753
patch, v3.1
[Checkin: Comment 20]


(In reply to comment #26)
> it slipped my mind on how to check for past checkins.

Ftr,
http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/mailnews/mime/src/mimehdrs.cpp&rev=HEAD&mark=1.56#1.59
Attachment #66753 - Attachment description: patch, v3.1 → patch, v3.1 [Checkin: Comment 20]
(In reply to comment #5)
> It's marked a regression, so whoever broke it has to fix it.

Sadly, it seems too late now to look for a regression timeframe (is it not?) :-/


(In reply to comment #21)
> Reopened the bug, I still see the problem with 06/17 branch build.

(In reply to comment #22)
> Still reproducible in 07-01-1.0.branch build on Windows and Linux.

Someone has to (re)confirm whether this bug still exists or if it was kind of "1.0 branch only".
Testcase(s) would be ideal.


(In reply to comment #24)
> What if headers have characters not covered by charset of the body? 
> Saving everything(header and body) in UTF-8 would always work for rare cases
> like the above. 

I think we have a feature which asks "lose some data or use utf-8" when composing emails.
We should probably reuse it, if this behavior applies.

> Anyway, at least we need to be consistent for body and header. 

Agreed!
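
As a rough illustration of that idea applied to saving (a sketch, not how the compose code works), the charset decision could be: keep the message charset when the header and body text all fit in it, otherwise fall back to utf-8 so that both are written consistently. The function name `choose_save_charset` is hypothetical.

```python
def choose_save_charset(texts, preferred):
    """Return `preferred` if every text encodes in it, otherwise 'utf-8'."""
    for text in texts:
        try:
            text.encode(preferred)
        except UnicodeEncodeError:
            # At least one string would lose data; use utf-8 for everything.
            return "utf-8"
    return preferred


# Decoded header and body text (invented samples).
same_charset = choose_save_charset(["Zażółć", "plain body"], "iso-8859-2")
mixed_charset = choose_save_charset(["Zażółć", "山田"], "iso-8859-2")
print(same_charset, mixed_charset)
```

A real implementation would presumably prompt the user at the fallback point rather than decide silently.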
Flags: wanted-thunderbird3?
Keywords: qawanted
Whiteboard: [patchlove][has patch][needs owner] → [See comment 28] [needs owner]
Flags: wanted-thunderbird3?

Dossy, can you formulate a testcase and reproduce?

Status: NEW → UNCONFIRMED
Ever confirmed: false
Flags: needinfo?(dossy)

I think the bug here is if "File > Save As > File > Format: HTML Files" ever generates anything other than UTF-8.

Thanks to Jungshik Shin for comment #24, which illustrates that RFC 2047 allows mixing various character encodings in the mail headers, and that MIME parts can also be in different encodings, whereas a single HTML document must be encoded in one single encoding.

Therefore, the only way to guarantee that the HTML output properly reproduces the email being saved is to always transcode the entire message, with all its parts in their different encodings, to a common encoding such as UTF-8. Otherwise the saved output will be lossy or will not render properly as an HTML document, which is not the correct behavior.
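
A minimal Python sketch of this approach, under the assumption that headers arrive as raw RFC 2047 strings and the body as bytes with a known charset: everything is decoded to Unicode first and the page is then written once, in UTF-8, with a matching meta charset label. The function names and the HTML layout are invented for illustration and do not reflect the actual Save As output.

```python
import base64
import html
from email.header import decode_header


def encoded_word(text, charset):
    """Build an RFC 2047 base64 encoded-word (hypothetical test helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)


def decode_mime_header(raw_value):
    """Decode every encoded-word in a header value to Unicode text."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def message_to_utf8_html(raw_headers, body_bytes, body_charset):
    """Transcode headers and body to one UTF-8 encoded HTML document."""
    rows = "".join(
        "<b>%s:</b> %s<br>" % (html.escape(name), html.escape(decode_mime_header(value)))
        for name, value in raw_headers
    )
    body_text = body_bytes.decode(body_charset, errors="replace")
    page = (
        '<html><head><meta charset="utf-8"></head>'
        "<body>%s<pre>%s</pre></body></html>" % (rows, html.escape(body_text))
    )
    return page.encode("utf-8")


# Invented sample: KOI8-R subject, ISO-2022-JP body.
page_bytes = message_to_utf8_html(
    [("Subject", encoded_word("Привет", "koi8-r"))],
    "こんにちは".encode("iso2022_jp"),
    "iso2022_jp",
)
print(page_bytes.decode("utf-8"))
```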

I'm attaching a shell script that works for me on OSX 10.13 to send myself an email with multiple encodings. You can save that email as HTML, open the resulting HTML file in a browser, and see how the ISO-2022-JP and KOI8-R sections (which contain the literal bytes from the corresponding encodings) fail to render properly in the browser.

Flags: needinfo?(dossy)
Attached file bug-74424-email.sh
Severity: normal → S3