Open Bug 56908 Opened 24 years ago Updated 2 years ago

Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format

Tracking

(Not tracked)

Status:

NEW

People

(Reporter: marina, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl, Whiteboard: [patchlove][needs updated patch?])

Attachments

(2 files, 1 obsolete file)

attached a screen shot (24 years ago, marina; 76.40 KB, image/jpeg)
a patch. I can check in. please review... (24 years ago, Makoto Kato [:m_kato]; 855 bytes, patch)
rebased patch (11 years ago, Makoto Kato [:m_kato]; 957 bytes, patch; jcranmer: review-)

marina

Reporter

Description

•

24 years ago

Steps to reproduce:
- invoke a new mail composition;
- attach a file with non-ascii name;
- send and get message;
- now select message and save it in eml format;
- open the saved file in Browser by going to File|Open (select file)
//now note: in the body, each single non-ascii char in the attached file's name
is displayed as two separate chars
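
The symptom above matches what happens when UTF-8 bytes are decoded with a single-byte encoding. The following is a minimal sketch of that effect, not the code path in the product; the helper name `misdecode` and the sample file name are made up for illustration, and Latin-1 as the wrong encoding is an assumption.

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode those bytes with a single-byte encoding.

    This simulates a viewer that does not know the data is UTF-8.
    """
    return text.encode("utf-8").decode(wrong_encoding)


# Hypothetical attachment name; "é" occupies two bytes in UTF-8.
name = "é.txt"
garbled = misdecode(name)
print(garbled)                    # Ã©.txt
print(len(name), len(garbled))    # 5 6
```

Characters that need three UTF-8 bytes (most CJK characters) would show up as three characters rather than two under the same assumption.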

marina

Reporter

Comment 1

•

24 years ago

Attached image: attached a screen shot

nhottanscp

Comment 2

•

24 years ago

Reassign to rhp.

Assignee: nhotta → rhp

rhp (gone)

Comment 3

•

24 years ago

Sorry, this isn't happening for rtm.

- rhp

Status: NEW → ASSIGNED

Target Milestone: --- → Future

Makoto Kato [:m_kato]

Comment 4

•

24 years ago

Attached patch: a patch. I can check in. please review... (obsolete)

Erik van der Poel

Comment 5

•

24 years ago

I don't think that that is a good way to solve this problem. META charset forces
the HTML engine to restart with the new charset. It would be better to start
with the right charset in the first place. In the HTTP world, this is done by
an HTTP Content-Type header. In this case, the mail engine is generating some
HTML, but HTTP might not be involved, so we probably have to pass something to
the HTML engine to make it believe that the HTTP charset has been set. I don't
know the details, but I believe the architecture would be better that way.
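
To illustrate the out-of-band approach described here, the sketch below reads the charset parameter from a Content-Type value before any bytes are decoded. It uses Python's standard `email` package purely as a stand-in; it is not the Mozilla channel API, and the helper name `declared_charset` is hypothetical.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The consumer learns the encoding from the header, so it never has to
# restart parsing or guess.
print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html"))                 # None
```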

Makoto Kato [:m_kato]

Comment 6

•

24 years ago

I understand what you are saying.  But if Mozilla's auto-detect engine were
smart enough, this issue would not occur.  Is there a best way to tell whether
the encoding is UTF-8?

Erik van der Poel

Comment 7

•

24 years ago

No no no. Auto-detect is even worse than META charset, architecturally. Take a
look at the APIs for the HTML engine, and see if there is some way to make it
believe that the HTTP charset has been set. We are generating the HTML, so we
don't want to rely on any auto-detect heuristics when consuming that HTML.

Scott MacGregor

Comment 8

•

24 years ago

If it helps any, the HTML engine should be getting the content via the channel;
see nsIChannel::GetContentType. Our MIME engine sets the content type on the
channel it presents to the HTML parser. The parser should be using this
information (the same way it gets the content type from the HTTP channel).

Katsuhiko Momoi

Updated

•

24 years ago

Keywords: intl

Katsuhiko Momoi

Comment 9

•

24 years ago

Mass change to bugs filed by marina --> QA contact to marina.
thanks!

QA Contact: momoi → marina

scottputterman

Comment 10

•

24 years ago

reassigning to ducarroz

Assignee: rhp → ducarroz

Status: ASSIGNED → NEW

Jean-Francois Ducarroz

Updated

•

23 years ago

Status: NEW → ASSIGNED

Jungshik Shin

Comment 11

•

21 years ago

I understand erik's concern, but I have a different opinion. It's not that
expensive to reset the charset, and it happens every day on the web. There are
a lot of people who believe that the HTTP header should have been given a lower
(not higher) priority than 'meta charset'.  Besides, in cases where eml files
are moved around, including 'charset' information in them is a good thing (TM).

mscott, can we assume that eml files have always been in UTF-8? What
if somebody transcodes them outside Mozilla? Well, that's her responsibility.
So, my question is whether Mozilla has always used UTF-8 for eml files. If yes,
we may do something in nsIChannel.

OS: Windows NT → All

Hardware: PC → All

OstGote!

Comment 12

•

20 years ago

All my eml messages with umlauts (char encoding ISO-8859-1 etc) are wrongly
displayed in the browser window. If I manually switch the encoding to UTF-8 the
display is correct. See for example bug 206421.

Ralf Hauser

Comment 13

•

20 years ago

related is Bug 263850

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

WADA:World Anti-bad-Duping Agency

Comment 14

•

20 years ago

Change 'eml' in summary to '.eml' for ease of search ('extremly' will hit).

Summary: Non-ascii file name is displayed incorrectly in the browser after being saved in eml format → Non-ascii file name is displayed incorrectly in the browser after being saved in .eml format

WADA:World Anti-bad-Duping Agency

Updated

•

20 years ago

Depends on: 269826

WADA:World Anti-bad-Duping Agency

Updated

•

20 years ago

Blocks: 269826

No longer depends on: 269826

Jungshik Shin

Comment 15

•

20 years ago

The patch here can still be applied. If it's still a problem, we should check
that in. Given that a '.eml' file can be moved around and viewed by a program
other than Mozilla, it should have 'in-band' information about the character
encoding.

Nobody; OK to take it and work on it

Assignee

Updated

•

16 years ago

Product: Core → MailNews Core

Phil Ringnalda (:philor)

Updated

•

16 years ago

QA Contact: marina → i18n

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Assignee: bugzilla → nobody

Status: ASSIGNED → NEW

Wayne Mery (:wsmwk)

Comment 16

•

12 years ago

Makoto Kato, is your patch still needed and good?

Flags: needinfo?(m_kato)

Priority: P3 → --

Whiteboard: [patchlove][needs updated patch?]

Target Milestone: Future → ---

Makoto Kato [:m_kato]

Comment 17

•

11 years ago

(In reply to Wayne Mery (:wsmwk) from comment #16)
> Makoto Kato, is your patch still needed and good?

This depends on the HTML rendering engine's implementation.  Gecko detects the file as UTF-8 even if no charset is declared, but IE cannot detect it as UTF-8.

Flags: needinfo?(m_kato)

Makoto Kato [:m_kato]

Comment 18

•

11 years ago

Tested on Firefox 29, Chrome 34 and IE11.  Chrome 34 and IE11 cannot detect the exported HTML as UTF-8, so characters are corrupted in these browsers.

Makoto Kato [:m_kato]

Comment 19

•

11 years ago

Attached patch: rebased patch

Attachment #19335 - Attachment is obsolete: true

Makoto Kato [:m_kato]

Updated

•

11 years ago

Attachment #8356940 - Flags: review?(Pidgeot18)

Joshua Cranmer [:jcranmer]

Comment 20

•

11 years ago

Comment on attachment 8356940
rebased patch

Review of attachment 8356940:
-----------------------------------------------------------------

First off, this needs a test.
Second off, this patch is wrong. I created a message with an ISO-2022-JP body text (and a non-ASCII filename), and found that the resulting HTML saved the part as ISO-2022-JP instead of UTF-8, so declaring a charset would be liable to break messages that currently work. (That's how bad our charset logic is).

Finally, I'd prefer <meta charset=""> over the http-equiv form.

Attachment #8356940 - Flags: review?(Pidgeot18) → review-

Makoto Kato [:m_kato]

Comment 21

•

11 years ago

(In reply to Joshua Cranmer [:jcranmer] from comment #20)
> Comment on attachment 8356940
> rebased patch
> 
> Review of attachment 8356940:
> -----------------------------------------------------------------
> 
> First off, this needs a test.
> Second off, this patch is wrong. I created a message with an ISO-2022-JP
> body text (and a non-ASCII filename), and found that the resulting HTML
> saved the part as ISO-2022-JP instead of UTF-8,

When I test this, the HTML is always encoded as UTF-8, not ISO-2022-JP.  How do you get it saved as ISO-2022-JP?

Steps
=====
1. Compose a message as ISO-2022-JP.
2. Send and receive it.  The message is as follows.

--------------010309080004020102040601
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--------------010309080004020102040601
Content-Type: image/png;
 name="=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=ISO-2022-JP''%1B%24%42%45%44%1B%28%42%2E%70%6E%67%2E%70%6E%67

3. Save as HTML

Result
======
HTML is encoded as UTF-8.
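
For reference, the two encoded file-name parameters in the message above can be decoded as in the following sketch. It uses Python's standard library rather than the Mozilla MIME code; the helper names are hypothetical, and the RFC 2231 handling is simplified to a single, non-continued `charset'language'value` parameter.

```python
from email.header import decode_header
from urllib.parse import unquote_to_bytes


def decode_encoded_word(value: str) -> str:
    """Decode an RFC 2047 encoded-word such as the name= parameter above."""
    pieces = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            pieces.append(data.decode(charset or "ascii"))
        else:
            pieces.append(data)
    return "".join(pieces)


def decode_rfc2231_value(value: str) -> str:
    """Decode an RFC 2231 extended value of the form charset'language'percent-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)


# Values copied from the message in step 2.
NAME_PARAM = "=?ISO-2022-JP?B?GyRCRUQbKEIucG5nLnBuZw==?="
FILENAME_PARAM = "ISO-2022-JP''%1B%24%42%45%44%1B%28%42%2E%70%6E%67%2E%70%6E%67"

print(decode_encoded_word(NAME_PARAM))
print(decode_rfc2231_value(FILENAME_PARAM))
```

Both parameters carry the same ISO-2022-JP bytes, so the two calls should print the same file name.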


> be liable to break messages that currently work. (That's how bad our charset
> logic is).
> 
> Finally, I'd prefer <meta charset=""> over the http-equiv form.

Should we add DOCTYPE for HTML5, too?

Flags: needinfo?(Pidgeot18)

Joshua Cranmer [:jcranmer]

Comment 22

•

11 years ago

(In reply to Makoto Kato (:m_kato) from comment #21)
> When I test this, HTML always is encoded as UTF-8, not ISO-2022-JP.  How do
> you save to ISO-2022-JP?

I had actual Japanese text in the body. The HTML attachment name is saved as UTF-8, while the body text itself was ISO-2022-JP.
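
The situation described here, one saved file containing UTF-8 bytes for the attachment name and ISO-2022-JP bytes for the body, can be sketched as below. The sample strings and the helpers `build_mixed_page` and `decode_as` are made up for illustration; this is not the actual save-as-HTML output.

```python
def build_mixed_page(name: str, body: str) -> bytes:
    """Concatenate a UTF-8 attachment name with an ISO-2022-JP body."""
    return name.encode("utf-8") + b"\n" + body.encode("iso2022_jp")


def decode_as(page: bytes, charset: str):
    """Decode the whole page with one charset; return None if strict decoding fails."""
    try:
        return page.decode(charset)
    except UnicodeDecodeError:
        return None


page = build_mixed_page("田.png", "日本語")

# ISO-2022-JP is a 7-bit encoding, so decoding as UTF-8 succeeds, but the
# body appears as escape sequences plus ASCII instead of Japanese text.
print(repr(decode_as(page, "utf-8")))

# Decoding as ISO-2022-JP cannot recover the attachment name; strict
# decoding is expected to reject its high bytes.
print(repr(decode_as(page, "iso2022_jp")))
```

Under these assumptions no single declared charset displays both parts correctly, which is the concern raised about the patch.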

> > Finally, I'd prefer <meta charset=""> over the http-equiv form.
> 
> Should we add DOCTYPE for HTML5, too?

Probably not. The email-to-html predates HTML 4.01, and probably depends on quirks mode in a few places.

Flags: needinfo?(Pidgeot18)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3
