
Request for Position: Subresource loading with Web Bundles #590

Open
hayatoito opened this issue Nov 4, 2021 · 15 comments

@hayatoito
hayatoito commented Nov 4, 2021

Request for Mozilla Position on an Emerging Web Specification

Other information

See chrome status page: https://chromestatus.com/feature/5710618575241216

Chrome is doing the origin trial (1, 2) for this feature. I'm filing this request because it might be a good time to ask.

Thanks!

@ShivanKaul
ShivanKaul commented Nov 30, 2021

We gave feedback on this proposal here: WICG/webpackage#648 (comment), in case it's of interest.

@annevk
Contributor
annevk commented Apr 25, 2022

I was going to flag annevk/orb#32 here (using "no-cors" for subresource-from-bundle fetches), but I stumbled across https://bugs.chromium.org/p/chromium/issues/detail?id=1316660 so I take it that is already being tackled?

Also filed WICG/webpackage#735 as I don't think registered protocol handlers ought to be involved.

When I last discussed this kind of feature in detail, the overarching concern I had is that you end up having to reinvent the network protocol. For an initial visit by the end user you will have improved transfer due to compression being possible across responses without leaking information, but now you've lost caching and partial invalidation. So at some point you start discussing some kind of update protocol for the bundle, at which point things really start to get complex and largely duplicative.

@hayatoito
Author

Thanks for the reply and filing an issue.

Responses inline below and in the filed issues.

annevk/orb#32

This issue is related to "Signed Exchange", which is a different feature from "Subresource Loading with Web Bundles"; the latter doesn't use Signed Exchange at all.

Unfortunately, both "Signed Exchange" and "Subresource Loading with Web Bundles" live in the same repository for historical reasons. I suspect that is the root cause of the confusion, which we want to resolve eventually... :(

Regarding "subresource loading with WebBundles", the request's mode is "cors" by default. https://github.com/WICG/webpackage/blob/main/explainers/subresource-loading.md#requests-mode-and-credentials-mode

@annevk
Contributor
annevk commented Apr 25, 2022

Okay, so even if a request for a subresource was made using "no-cors", it's okay because the bundle itself was requested using "cors". And ORB isn't impacted because the lookup inside the bundle happens way before you're about to hit the network and a bundle cannot be used in a response to a normal subresource request, only in response to a request whose destination is "webbundle".


Upon scanning the explainer (which is great by the way, thanks for writing it up!) again I found https://docs.google.com/document/d/11t4Ix2bvF1_ZCV9HKfafGfWu82zbOD7aUhZ_FyDAgmA/edit about subsequent requests and that does indeed suggest that's both a direction this might be going in and that it's a hard unsolved problem. That makes me rather hesitant to endorse bundling as a solution.

@hayatoito
Author
hayatoito commented Apr 27, 2022

Thanks! Responses inline below:

Okay, so even if a request for a subresource was made using "no-cors", it's okay because the bundle itself was requested using "cors". And ORB isn't impacted because the lookup inside the bundle happens way before you're about to hit the network and a bundle cannot be used in a response to a normal subresource request, only in response to a request whose destination is "webbundle".

Yes, that's right. We've added "webbundle" destination exactly for that reason.

Upon scanning the explainer (which is great by the way, thanks for writing it up!) again I found https://docs.google.com/document/d/11t4Ix2bvF1_ZCV9HKfafGfWu82zbOD7aUhZ_FyDAgmA/edit about subsequent requests and that does indeed suggest that's both a direction this might be going in and that it's a hard unsolved problem. That makes me rather hesitant to endorse bundling as a solution.

Thanks. That is an area which we'd like to explore together in v2.

In v1, we support efficiently fetching multiple resources in a single request with Web Bundles. This addressed some use cases (e.g. #624, which doesn't need any caching support).

That's already beneficial.

However, we are aware that this doesn't address the loading issues of large JS apps, which rely on user-land bundling solutions (e.g. webpack) today and still find it difficult to take advantage of the browser's cache. Thus, on top of what v1 has achieved, we need some kind of protocol by which we can avoid transferring a resource the browser already has cached.

There are several proposals (also listed here):

The common factor across these proposals: large JS apps need a new primitive which enables them to:

  • Fetch multiple resources in a single request efficiently
  • But avoid transferring a resource the browser already has cached.
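As a purely illustrative sketch of that primitive (all names here are hypothetical; the real proposal is an HTTP-level mechanism, and Python merely simulates it): the browser names only the resources it is missing, in one request, and the server bundles back exactly those.

```python
# Hypothetical sketch of the proposed primitive: one request names many
# resources, and the server bundles back only the ones the browser lacks.

def serve_bundle(server_resources, needed_urls):
    """Build a bundle containing exactly the requested resources."""
    return {url: server_resources[url] for url in needed_urls}

server = {"a.js": "code-a", "b.js": "code-b", "c.js": "code-c"}
cache = {"a.js": "code-a"}  # already cached from an earlier visit

# The browser asks only for what it is missing, in a single request.
needed = [url for url in ["a.js", "b.js", "c.js"] if url not in cache]
bundle = serve_bundle(server, needed)
cache.update(bundle)

print(sorted(bundle))  # ['b.js', 'c.js']
```

The point of the sketch is the shape of the exchange: a single round trip, but no redundant bytes for resources the browser already holds.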

So at some point you start discussing some kind of update protocol for the bundle, at which point things really start to get complex and largely duplicative.

We agree that this is a hard, unresolved problem, but it is worth exploring to figure out the smallest web platform primitive that user-land solutions can't achieve efficiently. We'd like to avoid duplication here.

I hope this is a good summary. Sooner or later, we'd like to write an Explainer with more details for v2. I hope we can work together there!

@annevk
Contributor
annevk commented Apr 27, 2022

I'm not convinced that more efficient ads is sufficient for adding quite a bit of complexity to the web platform. It would also make them rather easy to block, which I suspect goes counter to your goals.

And without the "v2" part it's hard to judge what the complexity of this feature might end up being and if that justifies its cost. And if it is indeed preferable to further investment in network protocol solutions.

@ShivanKaul

There is ongoing conversation on the TAG design review: w3ctag/design-reviews#616

@chrishtr

It would also make them rather easy to block, which I suspect goes counter to your goals.

I don't think it's contrary to the goals of this feature. Here is some positive feedback from Google Ads shared as part of the current Blink Intent to Ship thread:

Google Ads (use case) (origin trial participant)

Web bundle serving is a major overhaul of how GPT requests and renders ads, built on top of a new browser API which we have been designing with the Chrome team. It offers large loading performance improvements and security and privacy relative to safeframe rendering:

  • Performance improvements by fetching multiple Ads creatives in a single request.
  • Enhance privacy: Creative contents can no longer be read or modified by the publisher or others in the publisher's JS context. Creatives can no longer read or modify each other.

The points raised in the issue sound compelling to me.

@jeffkaufman

Confirming that "rather easy to block" is not a problem from an ads perspective. We're not trying to circumvent ad blockers here.

@hayatoito
Author
hayatoito commented May 20, 2022

Re this concern,

When I last discussed this kind of feature in detail, the overarching concern I had is that you end up having to reinvent the network protocol. For an initial visit by the end user you will have improved transfer due to compression being possible across responses without leaking information, but now you've lost caching and partial invalidation. So at some point you start discussing some kind of update protocol for the bundle, at which point things really start to get complex and largely duplicative.

Our intent with v2 isn’t to reinvent the network protocol. Instead, we plan to use immutable subresource URLs to deal with updates. The browser would simply share a list of subresources it has with their versionized/hashed URLs, and the server would provide the ones that are not already cached in the browser.

We don’t intend to deal with lower-level HTTP cache mechanisms, such as cache-control or if-modified-since, and there is no need to. These mechanisms would only be used on the bundle itself, not the subresources within.

Example from one of the proposals under consideration:

The first visit (cold cache):

  1. The server sends a main page which declares the required subresources as follows:
    • a.v1234.js
    • b.v12ea.js
    • c.v12ef.js
  2. The browser doesn’t have any of these resources cached, so it sends the list [“a.v1234.js”, “b.v12ea.js”, “c.v12ef.js”] to fetch them in one request.
  3. The server sends a bundle which contains:
    • a.v1234.js
    • b.v12ea.js
    • c.v12ef.js
  4. The browser caches these subresources.

The second visit (warm cache)

  1. The server sends a main page which declares the required subresources:

    • a.v1234.js
    • b.v23ea.js
    • c.v22ef.js

    Note that resources “b” and “c” are updated. They have new URLs.

  2. The browser sends a list of [“b.v23ea.js”, “c.v22ef.js”] to fetch them in one request.

  3. The server sends a bundle which contains:

    • b.v23ea.js
    • c.v22ef.js

(More details)
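The cold-cache and warm-cache visits above can be condensed into a small simulation (Python, purely illustrative; the helper names are hypothetical and the real proposal operates at the HTTP layer):

```python
# Hypothetical simulation of the versioned-URL scheme: immutable URLs mean
# a changed resource gets a new URL, so "is it cached?" is a simple lookup.

def visit(declared, server, cache):
    """Fetch, in one request, only the declared subresources not yet cached."""
    missing = [u for u in declared if u not in cache]
    cache.update({u: server[u] for u in missing})  # the server's bundle
    return missing

cache = {}
v1 = {"a.v1234.js": "a1", "b.v12ea.js": "b1", "c.v12ef.js": "c1"}
first = visit(list(v1), v1, cache)   # cold cache: all three travel

v2 = {"a.v1234.js": "a1", "b.v23ea.js": "b2", "c.v22ef.js": "c2"}
second = visit(list(v2), v2, cache)  # warm cache: only the new b and c travel

print(first)   # ['a.v1234.js', 'b.v12ea.js', 'c.v12ef.js']
print(second)  # ['b.v23ea.js', 'c.v22ef.js']
```

Because the URLs are immutable, invalidation never needs cache-control negotiation per subresource: an update is simply a new URL that misses the cache.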

@martinthomson
Member

The browser would simply share a list of subresources it has with their versionized/hashed URLs, and the server would provide the ones that are not already cached in the browser.

I don't know if you followed the story of draft-ietf-httpbis-cache-digest or not, but this is exactly the sort of thing that we spent a good amount of time on at that time. It's possible that some of the reasons we abandoned that line of investigation can be worked around by carefully constraining the problem, but this looks a lot like that framing.

  1. The browser sends a list of [“b.v23ea.js”, “c.v22ef.js”] to fetch them in one request.

I don't know how to reconcile this with an "[...] intent with v2 isn’t to reinvent the network protocol." This is exactly what is being proposed. The same outcome can be attained by sending two requests for the separate resources, modulo some minor gains in byte efficiency. Bundle-Preload is an entirely new protocol. That it has properties (advantages or drawbacks) of both cache-digest and HTTP/2 server push concern me.

Ultimately, the problem you are attempting to solve here is fundamentally hard. My understanding of bundling has always been that it offers advantages to the extent that it allows us to populate more points in the space of trade-offs between atomic and monolithic resources. However, I don't see how this particular design would ultimately be better. The introduction of resource maps might be a net gain, but I am not seeing the advantages from the bundling aspect of the design.

@hsivonen
Member

I noticed that neither the explainer nor the spec mentions anything about what implications this would have on speculative HTML parsing.

It would be good to have a paragraph explaining the implications.

Notably, under speculative fetch in the HTML spec, there's a list of elements that may affect subsequent speculative fetches. Also, at present, the speculation-sensitive information travels in HTML attributes. It seems worthwhile to at least mention the novelty of the text content of an element becoming speculation-sensitive.

@yoavweiss

I don't know if you followed the story of draft-ietf-httpbis-cache-digest or not, but this is exactly the sort of thing that we spent a good amount of time on at that time. It's possible that some of the reasons we abandoned that line of investigation can be worked around by carefully constraining the problem, but this looks a lot like that framing.

Hey @martinthomson! I kinda followed the story :)
WebBundle cache awareness is likely to differ from Cache Digests in that we're not talking about sending a digest of all the resources from a certain origin that are in the browser's cache. The various proposals on that front all have the browser send a request for the bundle along with a list of resources (or their hashes) from the ones that are in the bundle itself (either the ones needed to download, or the ones needed for subsetting).

FWIW, what killed Cache Digests is not any particular issue, but lack of implementer interest.

I don't know how to reconcile this with an "[...] intent with v2 isn’t to reinvent the network protocol." This is exactly what is being proposed. The same outcome can be attained by sending two requests for the separate resources, modulo some minor gains in byte efficiency. Bundle-Preload is an entirely new protocol. That it has properties (advantages or drawbacks) of both cache-digest and HTTP/2 server push concern me.

I don't think it creates an entirely new protocol, but provides a way for HTTP to request multiple resources from a single origin in a single bundle, and then have those resources be compressed as a single entity.
I think that what @hayatoito meant by "not reinventing the network protocol" is that as far as cache controls, content negotiation and other capabilities that HTTP has, those capabilities will remain in HTTP and applicable only to the bundle as a whole, rather than to any one of the subresources it contains.

Ultimately, the problem you are attempting to solve here is fundamentally hard. My understanding of bundling has always been that it offers advantages to the extent that it allows us to populate more points in the space of trade-offs between atomic and monolithic resources. However, I don't see how this particular design would ultimately be better. The introduction of resource maps might be a net gain, but I am not seeing the advantages from the bundling aspect of the design.

The way I see it:

  • JS apps bundle their resources today, because the current protocols and/or implementations are lacking when it comes to fetching hundreds or thousands of resources.
  • Bundles enable us to move away from JS blobs and onto a platform supported format that would enable this.
  • The advantages of bundling come both on the network (with superior (potentially-offline) compression over individual resources), as well as in the browser itself (by potentially reducing the internal processing overhead of requests)

I tried in the past to tackle some of the compression benefits as part of Compression Dictionaries but that effort was shot down as being overly broad and hence too dangerous from a security perspective. WebBundles seem like a reasonable way to move the bundling responsibility to the origin as part of its build process (and hence, have them only bundle together non-credentialed resources), avoiding the risks of cross-resource compression revealing secrets from credentialed resources.
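The cross-resource compression benefit mentioned above can be demonstrated with a small, purely illustrative experiment (hypothetical file contents; zlib stands in for whatever content-coding a real bundle would use): many small modules that share boilerplate compress far better as one stream than individually.

```python
import zlib

# Twenty hypothetical modules that share import/export boilerplate.
modules = [
    (
        "import { render } from './framework.js';\n"
        f"export function widget{i}(el) {{ render(el, 'widget{i}'); }}\n"
    ).encode()
    for i in range(20)
]

separate = sum(len(zlib.compress(m)) for m in modules)  # one stream each
bundled = len(zlib.compress(b"".join(modules)))         # one shared stream

print(f"separate: {separate} bytes, bundled: {bundled} bytes")
assert bundled < separate  # the shared boilerplate is encoded only once
```

This sharing is also exactly what makes compressing credentialed resources together risky, hence the restriction to non-credentialed resources noted above.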

@hayatoito
Author

Re:

I noticed that neither the explainer nor the spec mentions anything about what implications this would have on speculative HTML parsing.

A good point! Thanks. I've filed an issue WICG/webpackage#747.

@martinthomson
Member

After discussing this at some length, we realized that there are a number of moving parts that are hard to disentangle.

Bundles

The idea of bundling resources and delivering them as a single unit is hard to object to. You can get crude versions of that in a number of ways (HTML inlining, JS bundlers, and data: URIs are things we already support). A proper, generic format for assembling diverse content into a single resource is, on the face of it, totally reasonable. And zip or tar files are a spectacularly poor way of achieving that, so the definition of a new format is justified.

On its own, divorced from some of the other features that build on this, it isn't necessarily a compelling feature, but we can pretty easily convince ourselves that it isn't bad for the web in any way, at least in the abstract.

We haven't really spent a lot of time looking into the details of the design of the format yet, because a lot of our attention has been drawn to other more challenging aspects of the proposal (or suite of proposals, you might say).

For instance, the use of magic numbers elicited a lengthy conversation about their value relative to media types for use on the web (CORB seems to have tipped the balance toward media types). I think that I understand why magic numbers have been proposed in this specific case and it might be justified, but we haven't really sat down to understand that aspect in any detail. No doubt there are other aspects of the format that would elicit similar discussions.

Resource Identification

I still have concerns about how bundle components are identified, or whether they need to be. We've had a number of conversations about this, but haven't really resolved anything (at least from my recollection) other than that the problem is hard. That one resource can now speak for another subverts the URL resolution process as the primary means of establishing authority. The use of scoping only partially mitigates that concern.

The use of UUIDs and the definition of a new URI scheme (or is it a resource specifier; see below) introduces another concept into the mix. UUIDs are useful for managing collision risk, but they don't provide any uniqueness guarantee if you allow for adversarial content being loaded. Their use in CSP seems inadvisable in that light, particularly since the list of bundle registrations is mutable, which might allow an attacker to supplant an allowed uuid-in-package resource.

New Indirections

The notion of resource maps adds a new layer of indirection to the platform.

JS has formalized this in its narrow domain with its language around specifiers and URLs in a way that makes a fair bit of sense. Permitting the use of arbitrary specifiers that are mapped before being treated as URLs is a powerful tool in its own right, and one that probably requires its own consideration.

Here we need to consider the implications for the various security functions we have built, like CSP.

Compression

Specification-wise, this might be free if it uses content-codings. This probably doesn't need too much discussion, except to note the effect on performance of different strategies, particularly when it comes to the subset piece.

Bundle Subsets/Selective Fetch

This is the subject of the recent discussion here. It is highly speculative, and I don't think we are able to take a position on the design being proposed just yet. I stand by my statement that this is a new protocol, even if it falls short of a total reimagining of the protocol stack. Its dependence on Vary/Variants and on an understanding of bundle content for good performance gives it a difficult deployment challenge to overcome.

Combining These

Obviously, you don't realize many benefits until you put a few of these pieces together, but you don't need to solve everything before you get some useful features.
