
Open Bug 56908 Opened 24 years ago Updated 2 years ago

Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format

Categories

(MailNews Core :: Internationalization, defect)

defect

Tracking

(Not tracked)

People

(Reporter: marina, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl, Whiteboard: [patchlove][needs updated patch?])

Attachments

(2 files, 1 obsolete file)

Steps to reproduce:
- invoke a new mail composition;
- attach a file with non-ascii name;
- send and get message;
- now select message and save it in eml format;
- open the saved file in the browser via File|Open (select the file)
//now note: in the body, the non-ASCII name of the attached file is displayed
with each non-ASCII character shown as two characters
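Seeing each non-ASCII character as two characters is the classic sign of UTF-8 bytes being read as a single-byte charset. A minimal Python sketch, assuming the saved HTML holds UTF-8 and the browser falls back to ISO-8859-1 (the file name is made up for illustration):

```python
# Sketch: why one non-ASCII character shows up as two.
# Assumption: the saved HTML holds UTF-8 bytes, but the browser decodes
# them with a single-byte charset such as ISO-8859-1.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode those same bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


name = "Bär.txt"                 # hypothetical attachment name
shown = misread_utf8_as_latin1(name)
print(shown)                     # BÃ¤r.txt

# "ä" (U+00E4) is two bytes in UTF-8 (C3 A4), and ISO-8859-1 maps
# each byte to its own character, so the string grows by one.
assert len(shown) == len(name) + 1
```

Manually switching the view to UTF-8 reverses the mistake, which matches what later comments in this bug report.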
Attached image: screenshot
Reassign to rhp.
Assignee: nhotta → rhp
Sorry, this isn't happening for rtm.

- rhp
Status: NEW → ASSIGNED
Target Milestone: --- → Future
I don't think that that is a good way to solve this problem. META charset forces
the HTML engine to restart with the new charset. It would be better to start
with the right charset in the first place. In the HTTP world, this is done by
an HTTP Content-Type header. In this case, the mail engine is generating some
HTML, but HTTP might not be involved, so we probably have to pass something to
the HTML engine to make it believe that the HTTP charset has been set. I don't
know the details, but I believe the architecture would be better that way.
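The two strategies being contrasted can be sketched in Python. This is only an illustration, not Gecko's parser: `decode_html`, the regex "prescan" and the windows-1252 default are assumptions. With an out-of-band label the bytes are decoded once; without one, the decoder starts with a guess and has to restart when a `<meta>` charset disagrees.

```python
import codecs
import re
from typing import Optional, Tuple

# Simplified stand-in for the HTML charset prescan (hypothetical; a real
# prescan handles comments, attribute order, quoting, and more).
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def decode_html(data: bytes,
                transport_charset: Optional[str] = None,
                default: str = "windows-1252") -> Tuple[str, bool]:
    """Return (text, restarted).

    transport_charset models an out-of-band label, such as an HTTP
    Content-Type header or whatever the producer attaches to its channel.
    """
    if transport_charset:
        # Charset known up front: decode exactly once.
        return data.decode(transport_charset), False

    # No label: start decoding with a default guess...
    text = data.decode(default, errors="replace")
    match = META_CHARSET.search(data)
    if match:
        declared = match.group(1).decode("ascii")
        # ...and throw that work away if <meta> names a different charset.
        if codecs.lookup(declared).name != codecs.lookup(default).name:
            return data.decode(declared, errors="replace"), True
    return text, False
```

In a real parser the restart is more expensive than in this sketch, because a partially built document is discarded rather than just a decoded string.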
I understand what you are saying. But if Mozilla's auto-detect engine were smart
enough, this issue wouldn't occur. Is there a better way to determine whether the encoding is UTF-8?
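One common heuristic for "is this UTF-8?" is a strict decode: non-ASCII bytes that happen to form valid UTF-8 are probably UTF-8. A minimal Python sketch of that idea (an assumption for illustration, not Mozilla's detector), which also shows a blind spot: 7-bit encodings such as ISO-2022-JP give it nothing to work with.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic guess: True if data has non-ASCII bytes forming valid UTF-8."""
    if data.isascii():
        # Pure 7-bit data (including ISO-2022-JP) gives no evidence either way.
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Any such check is still a guess about data the mail code produced itself, which is the objection raised in the reply.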
No no no. Auto-detect is even worse than META charset, architecturally. Take a
look at the APIs for the HTML engine, and see if there is some way to make it
believe that the HTTP charset has been set. We are generating the HTML, so we
don't want to rely on any auto-detect heuristics when consuming that HTML.
If it helps any, the HTML engine should be getting the content via the channel
(nsIChannel::GetContentType). Our MIME engine sets the content type on the channel
it presents to the HTML parser. The parser should be using this information
(the same way the parser gets the content type from the HTTP channel).
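As a Python analogy only (the class below is a hypothetical toy, not the real nsIChannel interface), a consumer that trusts the producer's content type reads the charset parameter from it instead of guessing. The standard library's `email.message.Message` can parse such a value:

```python
from dataclasses import dataclass
from email.message import Message
from typing import Tuple


@dataclass
class HypotheticalChannel:
    """Toy stand-in for a channel; NOT Mozilla's nsIChannel interface."""
    content_type: str   # e.g. "text/html; charset=UTF-8", set by the producer
    body: bytes


def read_channel(channel: HypotheticalChannel,
                 fallback: str = "windows-1252") -> Tuple[str, str]:
    """Return (mime_type, text), trusting the charset the producer declared."""
    header = Message()
    header["Content-Type"] = channel.content_type
    # get_content_charset() returns the lowercased charset parameter, or the
    # fallback when the producer did not declare one.
    charset = header.get_content_charset(fallback)
    return header.get_content_type(), channel.body.decode(charset, errors="replace")
```

When the producer omits the charset, the fallback decode reproduces the same two-characters-per-letter corruption described in this bug.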
Keywords: intl
Mass change to bugs filed by marina --> QA contact to marina.
thanks!
QA Contact: momoi → marina
reassigning to ducarroz
Assignee: rhp → ducarroz
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
I understand erik's concern, but I have a different opinion. It's not that
expensive to reset the charset, and it happens every day on the web. A lot of
people believe that the HTTP header should have been given a lower (not
higher) priority than 'meta charset'. Besides, since eml files get moved
around, including 'charset' information in them is a good thing (TM).
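The disagreement is really about precedence between an out-of-band label and an in-band declaration. A small Python sketch with a hypothetical `resolve_charset` helper makes the two orderings explicit; for reference, browsers following the HTML encoding-sniffing algorithm give a transport-layer charset precedence over `<meta>`.

```python
from typing import Optional


def resolve_charset(transport: Optional[str], meta: Optional[str],
                    fallback: str = "windows-1252",
                    prefer_meta: bool = False) -> str:
    """Pick a charset from an out-of-band label, an in-band <meta>, or a fallback.

    prefer_meta=False: the transport label wins (HTTP-style precedence).
    prefer_meta=True:  the in-band declaration wins, the order argued for above.
    """
    order = (meta, transport) if prefer_meta else (transport, meta)
    for candidate in order:
        if candidate:
            return candidate.lower()
    return fallback
```

For a file that travels on its own, such as a saved .eml or exported HTML, there is often no transport label at all, so only the in-band declaration is left to consult.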

mscott, can we assume that eml files have always been in UTF-8? What
if somebody transcodes them outside Mozilla? Well, that's her responsibility.
So, my question is whether Mozilla always uses UTF-8 for eml files. If so, we
may do something in nsIChannel.
OS: Windows NT → All
Hardware: PC → All
All my eml messages with umlauts (char encoding ISO-8859-1 etc) are wrongly
displayed in the browser window. If I manually switch the encoding to UTF-8 the
display is correct. See for example bug 206421.
Related: bug 263850
Product: MailNews → Core
Change 'eml' in summary to '.eml' for ease of search ('extremly' will hit).
Summary: Non-ascii file name is displayed incorrectly in the browser after being saved in eml format → Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format
Blocks: 269826
No longer depends on: 269826
The patch here can still be applied. If it's still a problem, we should check
that in. Given that a '.eml' file can be moved around and viewed by a program
other than Mozilla, it should carry 'in-band' information about the character
encoding.
Product: Core → MailNews Core
QA Contact: marina → i18n
Assignee: bugzilla → nobody
Status: ASSIGNED → NEW
Makoto Kato, is your patch still needed and good?
Flags: needinfo?(m_kato)
Priority: P3 → --
Whiteboard: [patchlove][needs updated patch?]
Target Milestone: Future → ---
(In reply to Wayne Mery (:wsmwk) from comment #16)
> Makoto Kato, is your patch still needed and good?

This depends on the HTML rendering engine's implementation. Gecko detects the content as UTF-8 even without a charset declaration, but IE cannot detect it as UTF-8.
Flags: needinfo?(m_kato)
Tested on Firefox 29, Chrome 34 and IE11. Chrome 34 and IE11 cannot detect the exported HTML as UTF-8, so characters are corrupted in these browsers.
Attached patch: rebased patch
Attachment #19335 - Attachment is obsolete: true
Attachment #8356940 - Flags: review?(Pidgeot18)
Comment on attachment 8356940
rebased patch

Review of attachment 8356940:
-----------------------------------------------------------------

First off, this needs a test.
Second off, this patch is wrong. I created a message with an ISO-2022-JP body text (and a non-ASCII filename), and found that the resulting HTML saved the part as ISO-2022-JP instead of UTF-8, so declaring a charset would be liable to break messages that currently work. (That's how bad our charset logic is).
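The byte-level reason a blanket UTF-8 declaration can break such a message is easy to show in Python. This does not model how the HTML is assembled; it only assumes a body part stored as ISO-2022-JP (sample text is hypothetical) and shows what happens when those bytes are labelled UTF-8.

```python
def decode_as_declared(raw: bytes, declared: str) -> str:
    """Decode bytes with whatever charset the HTML claims to be in."""
    return raw.decode(declared, errors="replace")


body = "日本語"                    # hypothetical Japanese body text
raw = body.encode("iso2022_jp")    # how an ISO-2022-JP part is stored

# ISO-2022-JP is a 7-bit encoding, so these bytes are also valid UTF-8...
assert raw.isascii()
# ...and a UTF-8 declaration decodes them "successfully" into escape
# sequences and ASCII letters instead of the Japanese text.
assert decode_as_declared(raw, "utf-8") != body
assert decode_as_declared(raw, "iso2022_jp") == body
```

Because nothing fails loudly, such a mislabel would show up only as garbled text in the browser.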

Finally, I'd prefer <meta charset=""> over the http-equiv form.
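For comparison, the two declaration forms look like this when emitted by a small Python helper. `html_head` is hypothetical, not Thunderbird's code. The declaration is placed first because HTML requires the charset declaration to fit within the first 1024 bytes of the document.

```python
import html


def html_head(title: str, charset: str = "UTF-8", http_equiv: bool = False) -> str:
    """Build a <head> that declares the charset before anything else."""
    if http_equiv:
        # Older form, equivalent in effect.
        decl = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    else:
        # Shorter form preferred in the review.
        decl = f'<meta charset="{charset}">'
    return f"<head>{decl}<title>{html.escape(title)}</title></head>"
```

Escaping the title keeps a message subject containing `<` or `&` from breaking the generated markup.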
Attachment #8356940 - Flags: review?(Pidgeot18) → review-
(In reply to Joshua Cranmer [:jcranmer] from comment #20)
> Comment on attachment 8356940
> rebased patch
> 
> Review of attachment 8356940:
> -----------------------------------------------------------------
> 
> First off, this needs a test.
> Second off, this patch is wrong. I created a message with an ISO-2022-JP
> body text (and a non-ASCII filename), and found that the resulting HTML
> saved the part as ISO-2022-JP instead of UTF-8,

When I test this, the HTML is always encoded as UTF-8, not ISO-2022-JP. How did you get it saved as ISO-2022-JP?

Steps
=====
1. Compose a message as ISO-2022-JP.
2. Send and receive it. The message is as follows:

--------------010309080004020102040601
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--------------010309080004020102040601
Content-Type: image/png;
 name="=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=ISO-2022-JP''%1B%24%42%45%44%1B%28%42%2E%70%6E%67%2E%70%6E%67

3. Save as HTML.

Result
======
HTML is encoded as UTF-8.
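The message in step 2 carries the same ISO-2022-JP file name twice: as an RFC 2047 encoded-word in `name=` and as an RFC 2231 extended value in `filename*=`. A Python sketch decoding both with the standard library (this is not how Thunderbird's MIME code does it) confirms they agree:

```python
import urllib.parse
from email.header import decode_header

# Values copied from the message in step 2.
ENCODED_WORD = "=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
RFC2231_VALUE = ("ISO-2022-JP''%1B%24%42%45%44%1B%28%42"
                 "%2E%70%6E%67%2E%70%6E%67")


def decode_rfc2047(value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


def decode_rfc2231(value: str) -> str:
    """Decode an RFC 2231 extended value: charset'language'percent-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return urllib.parse.unquote_to_bytes(encoded).decode(charset)
```

Both functions return the same nine-character name: one Japanese character followed by `.png.png`.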


> be liable to break messages that currently work. (That's how bad our charset
> logic is).
> 
> Finally, I'd prefer <meta charset=""> over the http-equiv form.

Should we add DOCTYPE for HTML5, too?
Flags: needinfo?(Pidgeot18)
(In reply to Makoto Kato (:m_kato) from comment #21)
> When I test this, HTML always is encoded as UTF-8, not ISO-2022-JP.  How do
> you save to ISO-2022-JP?

I had actual Japanese text in the body. In the saved HTML, the attachment name was UTF-8, while the body text itself was ISO-2022-JP.

> > Finally, I'd prefer <meta charset=""> over the http-equiv form.
> 
> Should we add DOCTYPE for HTML5, too?

Probably not. The email-to-HTML conversion predates HTML 4.01 and probably depends on quirks mode in a few places.
Flags: needinfo?(Pidgeot18)
Severity: normal → S3