[go: nahoru, domu]

Reland Canonicalise file:///./s: as file:///S: on Win

The original CL
https://chromium-review.googlesource.com/c/chromium/src/+/2860002
was reverted, because of a missed update in one test
expectation, which only showed on Win 7.

In the meantime, a sync from WPT resolved that failing
expectation
(https://chromium-review.googlesource.com/c/chromium/src/+/2871329)
so this CL also undoes the Win7 disabling of the
test
(https://chromium-review.googlesource.com/c/chromium/src/+/2874645).

Original CL description:

============
Canonicalise file:///./s: as file:///S: on Win

Chrome on Windows converts drive letters to upper case.
Currently it only recognises drive letters appearing
at the start of the path, optionally after a block of
slashes.

This leads to an inconsistency, where canonicalized URLs
are themselves not canonical. Example:
1. file:///./s: canonicalizes as file:///s:, because the
   initial "s:" is not recognised as a drive letter (there
   is "." in the path prefix preceding it).
2. However, file:///s: itself canonicalizes as file:///S:,
   because now the "s:" is at the beginning and thus
   recognised as a drive letter and converted to upper case.

That is bad, because canonicalzed URLs must be canonical,
following their very definition.

The associated bug https://crbug.com/1083031 discusses
various approaches to fix this, and the consensus among
domenic@, rsleevi@ and vabr@ was that the least bad solution
to ensure that canonicalizer URLs are canonical is to
treat each drive letter specification in the path as such
as long as the path segment preceding it has the canonical
form "/". This fixes the above problem with the non-canonical
results of canonicalization.

The fix comes with a cost of increasing the number of failing
web conformance tests for Chrome on Windows. However, it does
not really create a new problem:

First, already today, Chrome on Win canonicalizes
file://////s: as file:///S:, while outside of Win the result
is file:///s:.

Second, already today, Chrome on Win canonicalizes
file:///s: as file:///S:. This is, in fact, _the_ problem and
has been raised in https://github.com/whatwg/url/issues/515.
============

Bug: 1083031
Bug: 1205779
Change-Id: I40abdbf0e260fc4d9f1c908450b4eb2541b5b10a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2874649
Auto-Submit: Vaclav Brozek <vabr@chromium.org>
Commit-Queue: Mike West <mkwst@chromium.org>
Reviewed-by: Domenic Denicola <domenic@chromium.org>
Reviewed-by: Mike West <mkwst@chromium.org>
Cr-Commit-Position: refs/heads/master@{#879692}
diff --git a/third_party/blink/web_tests/TestExpectations b/third_party/blink/web_tests/TestExpectations
index c1a0889f..ad5f189 100644
--- a/third_party/blink/web_tests/TestExpectations
+++ b/third_party/blink/web_tests/TestExpectations
@@ -7024,7 +7024,6 @@
 crbug.com/1200134 [ Linux ] fast/gradients/unprefixed-repeating-gradient-color-hint.html [ Pass Failure ]
 
 # Sheriff 2021-05-05
-crbug.com/1205779 [ Win7 ] external/wpt/url/a-element.html [ Failure ]
 crbug.com/1205780 [ Win7 ] external/wpt/css/mediaqueries/test_media_queries.html [ Failure ]
 crbug.com/1205780 [ Mac10.14 ] external/wpt/css/mediaqueries/test_media_queries.html [ Failure ]
 crbug.com/1205780 [ Mac10.15 ] external/wpt/css/mediaqueries/test_media_queries.html [ Failure Pass ]
diff --git a/third_party/blink/web_tests/platform/win/external/wpt/url/a-element-xhtml-expected.txt b/third_party/blink/web_tests/platform/win/external/wpt/url/a-element-xhtml-expected.txt
index a73b57b..e98ba5a 100644
--- a/third_party/blink/web_tests/platform/win/external/wpt/url/a-element-xhtml-expected.txt
+++ b/third_party/blink/web_tests/platform/win/external/wpt/url/a-element-xhtml-expected.txt
@@ -1,5 +1,5 @@
 This is a testharness.js-based test.
-Found 604 tests; 349 PASS, 255 FAIL, 0 TIMEOUT, 0 NOTRUN.
+Found 604 tests; 348 PASS, 256 FAIL, 0 TIMEOUT, 0 NOTRUN.
 PASS Loading data…
 PASS Parsing: <http://example	.
 org> against <http://example.org/foo/bar>
@@ -493,7 +493,7 @@
 FAIL Parsing: <file:///y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <file:///y:/> against <about:blank> assert_equals: href expected "file:///y:/" but got "file:///Y:/"
 PASS Parsing: <file:///./y> against <about:blank>
-PASS Parsing: <file:///./y:> against <about:blank>
+FAIL Parsing: <file:///./y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <\\\.\y:> against <about:blank> assert_unreached: Expected URL to fail parsing Reached unreachable code
 FAIL Parsing: <file://localhost//a//../..//foo> against <about:blank> assert_equals: href expected "file://///foo" but got "file://localhost///foo"
 FAIL Parsing: <file://localhost////foo> against <about:blank> assert_equals: href expected "file://////foo" but got "file://localhost////foo"
diff --git a/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any-expected.txt b/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any-expected.txt
index 6ba861a..bb793568 100644
--- a/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any-expected.txt
+++ b/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any-expected.txt
@@ -1,5 +1,5 @@
 This is a testharness.js-based test.
-Found 604 tests; 416 PASS, 188 FAIL, 0 TIMEOUT, 0 NOTRUN.
+Found 604 tests; 415 PASS, 189 FAIL, 0 TIMEOUT, 0 NOTRUN.
 PASS Loading data…
 PASS Parsing: <http://example	.
 org> against <http://example.org/foo/bar>
@@ -557,7 +557,7 @@
 FAIL Parsing: <file:///y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <file:///y:/> against <about:blank> assert_equals: href expected "file:///y:/" but got "file:///Y:/"
 PASS Parsing: <file:///./y> against <about:blank>
-PASS Parsing: <file:///./y:> against <about:blank>
+FAIL Parsing: <file:///./y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <\\\.\y:> against <about:blank> assert_throws_js: function "function() {
           bURL(expected.input, expected.base)
         }" did not throw
diff --git a/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any.worker-expected.txt b/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any.worker-expected.txt
index 6ba861a..bb793568 100644
--- a/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any.worker-expected.txt
+++ b/third_party/blink/web_tests/platform/win/external/wpt/url/url-constructor.any.worker-expected.txt
@@ -1,5 +1,5 @@
 This is a testharness.js-based test.
-Found 604 tests; 416 PASS, 188 FAIL, 0 TIMEOUT, 0 NOTRUN.
+Found 604 tests; 415 PASS, 189 FAIL, 0 TIMEOUT, 0 NOTRUN.
 PASS Loading data…
 PASS Parsing: <http://example	.
 org> against <http://example.org/foo/bar>
@@ -557,7 +557,7 @@
 FAIL Parsing: <file:///y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <file:///y:/> against <about:blank> assert_equals: href expected "file:///y:/" but got "file:///Y:/"
 PASS Parsing: <file:///./y> against <about:blank>
-PASS Parsing: <file:///./y:> against <about:blank>
+FAIL Parsing: <file:///./y:> against <about:blank> assert_equals: href expected "file:///y:" but got "file:///Y:"
 FAIL Parsing: <\\\.\y:> against <about:blank> assert_throws_js: function "function() {
           bURL(expected.input, expected.base)
         }" did not throw
diff --git a/url/url_canon_fileurl.cc b/url/url_canon_fileurl.cc
index 2aa5824..6d6ade9 100644
--- a/url/url_canon_fileurl.cc
+++ b/url/url_canon_fileurl.cc
@@ -25,29 +25,47 @@
 int FileDoDriveSpec(const CHAR* spec, int begin, int end,
                     CanonOutput* output) {
   // The path could be one of several things: /foo/bar, c:/foo/bar, /c:/foo,
-  // (with backslashes instead of slashes as well).
-  int num_slashes = CountConsecutiveSlashes(spec, begin, end);
-  int after_slashes = begin + num_slashes;
+  // /./c:/foo, (with backslashes instead of slashes as well). The code
+  // first guesses the beginning of the drive letter, then verifies that the
+  // path up to that point can be canonicalised as "/". If it can, then the
+  // found drive letter is indeed a drive letter, otherwise the path has no
+  // drive letter in it.
+  if (begin > end)  // Nothing to search in.
+    return begin;   // Found no letter, so didn't consum any characters.
 
-  if (!DoesBeginWindowsDriveSpec(spec, after_slashes, end))
-    return begin;  // Haven't consumed any characters
+  // If there is something that looks like a drive letter in the spec between
+  // being and end, store its position in drive_letter_pos.
+  int drive_letter_pos =
+      DoesContainWindowsDriveSpecUntil(spec, begin, end, end);
+  if (drive_letter_pos < begin)
+    return begin;  // Found no letter, so didn't consum any characters.
 
-  // A drive spec is the start of a path, so we need to add a slash for the
-  // authority terminator (typically the third slash).
-  output->push_back('/');
+  // Check if the path up to the drive letter candidate can be canonicalized as
+  // "/".
+  Component sub_path = MakeRange(begin, drive_letter_pos);
+  Component output_path;
+  const int initial_length = output->length();
+  bool success = CanonicalizePath(spec, sub_path, output, &output_path);
+  if (!success || output_path.len != 1 ||
+      output->at(output_path.begin) != '/') {
+    // Undo writing the canonicalized path.
+    output->set_length(initial_length);
+    return begin;  // Found no letter, so didn't consum any characters.
+  }
 
-  // DoesBeginWindowsDriveSpec will ensure that the drive letter is valid
-  // and that it is followed by a colon/pipe.
+  // By now, "/" has been written to the output and a valid drive letter is
+  // confirmed at position drive_letter_pos, followed by a valid drive letter
+  // separator (a colon or a pipe).
 
-  // Normalize Windows drive letters to uppercase
-  if (base::IsAsciiLower(spec[after_slashes]))
-    output->push_back(static_cast<char>(spec[after_slashes] - 'a' + 'A'));
+  // Normalize Windows drive letters to uppercase.
+  if (base::IsAsciiLower(spec[drive_letter_pos]))
+    output->push_back(static_cast<char>(spec[drive_letter_pos] - 'a' + 'A'));
   else
-    output->push_back(static_cast<char>(spec[after_slashes]));
+    output->push_back(static_cast<char>(spec[drive_letter_pos]));
 
   // Normalize the character following it to a colon rather than pipe.
   output->push_back(':');
-  return after_slashes + 2;
+  return drive_letter_pos + 2;
 }
 
 #endif  // WIN32
diff --git a/url/url_canon_unittest.cc b/url/url_canon_unittest.cc
index fb27fe7..8a121c58 100644
--- a/url/url_canon_unittest.cc
+++ b/url/url_canon_unittest.cc
@@ -1851,20 +1851,28 @@
       // Busted refs shouldn't make the whole thing fail.
       {"file:///C:/asdf#\xc2", "file:///C:/asdf#%EF%BF%BD", true, Component(),
        Component(7, 8)},
+      {"file:///./s:", "file:///S:", true, Component(), Component(7, 3)},
 #else
       // Unix-style paths
-    {"file:///home/me", "file:///home/me", true, Component(), Component(7, 8)},
+      {"file:///home/me", "file:///home/me", true, Component(),
+       Component(7, 8)},
       // Windowsy ones should get still treated as Unix-style.
-    {"file:c:\\foo\\bar.html", "file:///c:/foo/bar.html", true, Component(), Component(7, 16)},
-    {"file:c|//foo\\bar.html", "file:///c%7C//foo/bar.html", true, Component(), Component(7, 19)},
+      {"file:c:\\foo\\bar.html", "file:///c:/foo/bar.html", true, Component(),
+       Component(7, 16)},
+      {"file:c|//foo\\bar.html", "file:///c%7C//foo/bar.html", true,
+       Component(), Component(7, 19)},
+      {"file:///./s:", "file:///s:", true, Component(), Component(7, 3)},
       // file: tests from WebKit (LayoutTests/fast/loader/url-parse-1.html)
-    {"//", "file:///", true, Component(), Component(7, 1)},
-    {"///", "file:///", true, Component(), Component(7, 1)},
-    {"///test", "file:///test", true, Component(), Component(7, 5)},
-    {"file://test", "file://test/", true, Component(7, 4), Component(11, 1)},
-    {"file://localhost",  "file://localhost/", true, Component(7, 9), Component(16, 1)},
-    {"file://localhost/", "file://localhost/", true, Component(7, 9), Component(16, 1)},
-    {"file://localhost/test", "file://localhost/test", true, Component(7, 9), Component(16, 5)},
+      {"//", "file:///", true, Component(), Component(7, 1)},
+      {"///", "file:///", true, Component(), Component(7, 1)},
+      {"///test", "file:///test", true, Component(), Component(7, 5)},
+      {"file://test", "file://test/", true, Component(7, 4), Component(11, 1)},
+      {"file://localhost", "file://localhost/", true, Component(7, 9),
+       Component(16, 1)},
+      {"file://localhost/", "file://localhost/", true, Component(7, 9),
+       Component(16, 1)},
+      {"file://localhost/test", "file://localhost/test", true, Component(7, 9),
+       Component(16, 5)},
 #endif  // _WIN32
   };
 
diff --git a/url/url_file.h b/url/url_file.h
index 45b8d9a..91b2ddc 100644
--- a/url/url_file.h
+++ b/url/url_file.h
@@ -34,23 +34,44 @@
 
 #ifdef WIN32
 
+// DoesContainWindowsDriveSpecUntil returns the least number between
+// start_offset and max_offset such that the spec has a valid drive
+// specification starting at that offset. Otherwise it returns -1. This function
+// gracefully handles, by returning -1, start_offset values that are equal to or
+// larger than the spec_len, and caps max_offset appropriately to simplify
+// callers. max_offset must be at least start_offset.
+template <typename CHAR>
+inline int DoesContainWindowsDriveSpecUntil(const CHAR* spec,
+                                            int start_offset,
+                                            int max_offset,
+                                            int spec_len) {
+  CHECK_LE(start_offset, max_offset);
+  if (start_offset > spec_len - 2)
+    return -1;  // Not enough room.
+  if (max_offset > spec_len - 2)
+    max_offset = spec_len - 2;
+  for (int offset = start_offset; offset <= max_offset; ++offset) {
+    if (!base::IsAsciiAlpha(spec[offset]))
+      continue;  // Doesn't contain a valid drive letter.
+    if (!IsWindowsDriveSeparator(spec[offset + 1]))
+      continue;  // Isn't followed with a drive separator.
+    return offset;
+  }
+  return -1;
+}
+
 // Returns true if the start_offset in the given spec looks like it begins a
 // drive spec, for example "c:". This function explicitly handles start_offset
 // values that are equal to or larger than the spec_len to simplify callers.
 //
 // If this returns true, the spec is guaranteed to have a valid drive letter
-// plus a colon starting at |start_offset|.
-template<typename CHAR>
-inline bool DoesBeginWindowsDriveSpec(const CHAR* spec, int start_offset,
+// plus a drive letter separator (a colon or a pipe) starting at |start_offset|.
+template <typename CHAR>
+inline bool DoesBeginWindowsDriveSpec(const CHAR* spec,
+                                      int start_offset,
                                       int spec_len) {
-  int remaining_len = spec_len - start_offset;
-  if (remaining_len < 2)
-    return false;  // Not enough room.
-  if (!base::IsAsciiAlpha(spec[start_offset]))
-    return false;  // Doesn't start with a valid drive letter.
-  if (!IsWindowsDriveSeparator(spec[start_offset + 1]))
-    return false;  // Isn't followed with a drive separator.
-  return true;
+  return DoesContainWindowsDriveSpecUntil(spec, start_offset, start_offset,
+                                          spec_len) == start_offset;
 }
 
 // Returns true if the start_offset in the given text looks like it begins a