[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect import order with code splitting and multiple entry points #399

Open
iamakulov opened this issue Sep 20, 2020 · 25 comments
Open

Comments

@iamakulov
Copy link
iamakulov commented Sep 20, 2020

For the workaround, see #399 (comment).


Consider the following code:

// entry1.js
import "./init-dep-1.js";
import "./run-dep.js";

// entry2.js
import "./init-dep-2.js";
import "./run-dep.js";

// init-dep-1.js
global.foo = {
  log: () => console.log("foo.log() (from entry 1) called"),
};

// init-dep-2.js
global.foo = {
  log: () => console.log("foo.log() (from entry 2) called"),
};

// run-dep.js
global.foo.log();

When you bundle this code with esbuild --bundle --outdir=build --format=esm --splitting --platform=node --out-extension:.js=.mjs ./entry1.js ./entry2.js, ESBuild discovers that run-dep.js is a module shared between both entry points. Because code splitting is enabled, ESBuild moves it into a separate chunk:

// build/entry1.js
import "./chunk.P2RPFHF3.js";

global.foo = {
  log: () => console.log("foo.log() (from entry 1) called")
};

However, by doing so, ESBuild puts run-dep.js above the init-dep-1.js or init-dep-2.js code. This changes the import order – and breaks the code:

$ esbuild --bundle --outdir=build --format=esm --splitting --platform=node --out-extension:.js=.mjs ./entry1.js ./entry2.js
$ node build/entry1.mjs
global.foo.log();
           ^

TypeError: Cannot read property 'log' of undefined
@evanw
Copy link
Owner
evanw commented Sep 21, 2020

As far as I know the execution order of imported ECMAScript modules is undefined (see also this question). The run-time is free to, for example, asynchronously download all imports in parallel and evaluate them in the order that their downloads finish, which could be different on different runs. The only guarantee is that all imports will be evaluated before any code in the script doing the importing is evaluated.

Using ECMAScript module imports means having to write your code to be robust to different import execution orders. The bundling algorithm in esbuild takes advantage of this flexibility to do the above transformation.

If you need deterministic order of imported code, you can either use require() instead of import or use import but defer the execution of certain code by wrapping it in a function and calling it later.

@guybedford
Copy link
Contributor
guybedford commented Sep 21, 2020

Module evaluation in ECMA-262 is synchronous from beginning to end. It is defined strictly as a post-order graph execution in https://tc39.es/ecma262/#sec-innermoduleevaluation. There isn't any room for misinterpretation of this and no JS engines violate this behaviour.

What is not synchronous is the instantiation phase which means that a top-level execution job has task queue yielding before execution begins. The confusion in eg that Stack Overflow thread is because it is not a functional execution like require() but a graph execution.

There should be no task queue operations triggered during the entire top-level execution operation once it starts.

The only case a module graph executes asynchronously is if one or more of the modules in that graph use top-level await. In such a case, the first top-level await acts as a sort of yield for that level of the dependency tree. The full graph is still executed synchronuosly up to this first top-level await in the post-order traversal, and resumes as a callback and synchronously after it with dependencies queues in parallel for this waiting.

The sync post-order execution is the start of the semantics though, and top-level await was added very much as an additional adjustment to this algorithm without loosening any of these strong constraints which was one of the key goals for its specification.

@evanw
Copy link
Owner
evanw commented Sep 21, 2020

Thanks for the additional detail. I have personally struggled to find information about how this works, and that Stack Overflow question was pretty much the only reference I could find (I personally find that part of the specification incomprehensible). To clarify: are you saying the Stack Overflow answer is simply incorrect? If so I need to do an audit of the bundler and I may need to rework more than just this, since this is the mental model I've been working with while writing the bundler.

There should be no task queue operations triggered during the entire top-level execution operation once it starts.

Does that mean transforming ECMAScript modules to AMD is an incorrect transformation? I'm not too familiar with AMD but I believe it has the semantics I originally described, not these semantics.

@guybedford
Copy link
Contributor

To clarify: are you saying the Stack Overflow answer is simply incorrect?

Yes!

The specification linked there is an implementation of the tarjan strongly connected components algorithm, which is the missing background if wanting to decipher it.

Does that mean transforming ECMAScript modules to AMD is an incorrect transformation? I'm not too familiar with AMD but I believe it has the semantics I originally described, not these semantics.

I think you might be right here that AMD does not specify timing at all and RequireJS for example does not seem like it will wait to execute everything as a single top-level job, but rather executes dependencies whenever they are ready. This is certainly not the only lossy aspect semantically though.

@guybedford
Copy link
Contributor

Also to clarify, the only reason the spec has to use Tarjan is to ensure that the module registry loading states for a cycle transition together and do not leave partial states for "strongly connected components". Apart from that it's just a standard depth-first post-order execution. Using Tarjan did turn out to be useful in detecting the strongly connected components when it came to top-level await though since top-level await and cycles actually turned out to need all this information in managing completion handling.

@evanw
Copy link
Owner
evanw commented Nov 28, 2020

I'm in the middle of working on another iteration of the code splitting feature and I just discovered what could be considered another ordering bug. Currently /* @__PURE__ */ comments are treated mostly the same as pure code (i.e. code having no side effects). This means they can end up being reordered within the same file in some code splitting situations.

However, there are definitely going to be cases where people annotate impure code with /* @__PURE__ */ and expect the impure code to continue to work. So esbuild should not consider these expressions to be pure.

I'll have to make this side effect boolean a three-state value:

  • Impure and cannot be reordered or removed
  • Pure and can be reordered or removed
  • "User marked as pure but not actually pure" meaning ok to remove but not reorder

Just documenting this here for later. Back to the drawing board.

Edit: Interesting. This could potentially lead to dead code when combined with code splitting because this means it's no longer valid to extract statements containing /* @__PURE__ */ calls out of the file they came from and inline them into another file. So all such statements in a given file will be included in all entry points if they are used by only a single entry point. Oh well. I don't see a way around this really unless you get really complicated with tracing ordering constraints across chunks but that's going too far.

Edit 2: Actually Terser does sometimes reorder function calls1 marked as /* @__PURE__ */ so maybe I can just keep treating it as having no side effects and avoid the extra complexity of order preservation. Although I bet Terser doesn't reorder whole statements marked as /* @__PURE__ */ like esbuild may be doing. One example of a problem this could cause is some function calls marked as /* @__PURE__ */ that add things to a registry on startup being reordered depending on the code splitting boundary. Arguably that could be considered a problem with the code instead of with esbuild since calls to real pure functions are fine to reorder. At least there's --tree-shaking=ignore-annotations to disable interpretation of /* @__PURE__ */ comments.

1 For example, this code will be minified to b()(a() ? c : d) by both Terser and esbuild, which means a() and b() are reordered:

if (/* @__PURE__ */ a()) return (/* @__PURE__ */ b())(c)
else return (/* @__PURE__ */ b())(d)

@evanw
Copy link
Owner
evanw commented Nov 29, 2020

I just discovered that I think Rollup also has similar ordering bugs with code splitting. Here is an example (link):

// main.js
import './lib2'
import './lib1'
if (window.lib !== 'lib1') throw 'fail'
// main2.js
import './lib1'
import './lib2'
if (window.lib !== 'lib2') throw 'fail'
// lib1.js
window.lib = 'lib1'
// lib2.js
window.lib = 'lib2'

With this input Rollup generates this code, which incorrectly causes the second entry point to throw:

// output/main.js
import './lib1-4ba6e17e.js';
if (window.lib !== 'lib1') throw 'fail'
// output/main2.js
import './lib1-4ba6e17e.js';
if (window.lib !== 'lib2') throw 'fail'
// output/lib1-4ba6e17e.js
window.lib = 'lib2';
window.lib = 'lib1';

I wonder if this is a known bug with Rollup or not. A quick search of existing issues didn't uncover anything.

My plan for handling this in esbuild is to make code in non-entry point chunks lazily-evaluated (wrapped in callbacks) and then have the entry points trigger the lazy evaluation (call the callbacks) in the right order. This has the additional benefit of avoiding conflating the binding and evaluation phases in non-esm output formats. If evaluation is deferred, then all binding among chunks happens first followed by all evaluation afterward.

Another possible approach could be to generate many more output chunks for each unique atomic piece of code. But that seems like it'd be undesirable because it would generate too many output chunks. I would also still have to deal with the binding/evaluation issues for non-esm output formats if I went with that approach.

@evanw
Copy link
Owner
evanw commented Feb 20, 2021

I'm deep in the middle of working on this at the moment. I just discovered an issue with my approach. It's a problem similar to the one I discovered in #465: external imports (imports to code that is not included in the bundle) must not be hoisted past internal code or the ordering of the code changes, which is invalid. The problem is that this requires the generation of extra chunks instead of just generating a single chunk of code.

Consider this input file:

import {a} from './some-internal-file.js';
import {b} from '@external/package';
console.log(a, b);

It would be invalid to transform that to something like this:

import {b} from '@external/package';
let a = /* ... code from ./some-internal-file.js ... */;
console.log(a, b);

That reorders the side effects of the internal file past the side effects of the package. Instead you have to do something like this for correctness:

import {a} from './chunk.HSYX7.js';
import {b} from '@external/package';
console.log(a, b);

where ./chunk.HSYX7.js is an automatically-generated chunk containing all of the code from before the external import. The external import means it is impossible to bundle this code into a single file if you care about import order. I don't think you can replace this with await import() to get around this because there could be multiple external imports involved in a cycle that need to be able to reference each other's exports.

I have been just doing what other bundlers do so I haven't considered this case before. But other bundlers such as Rollup appear to get this wrong too. This also gets massively more complicated with top-level await and different parallel execution graphs. Not sure what to do about this new level of complexity yet.

@rneogy
Copy link
rneogy commented Dec 12, 2022

@evanw I just wanted to check in regarding the status of this issue. It's been a while since we've heard any updates on it, and I'm curious if there are any plans to address this in the near future. As far as I can tell, it effectively prevents a lot of use-cases of splitting, especially when dealing with larger projects with many co-dependent imports. It also is worrying because it's hard to know while you continue to develop on a project with splitting enabled, whether a new import will break the build.

I'm also curious about workarounds in the meantime. I found this reply as the most recent method to deal with potential issues with import order, but I'm not actually quite sure why it works. Could you expand on that, or provide a more general workaround for dealing with the ordering bug?

As always, thank you so much for continuing to work on this project.

@tmcconechy
Copy link

noticed on around v0.17.7 the behavior seems to have changed and some of the imports orders seem to work now when using code splitting. Is this issue fixed/partly fixed somehow?

@iamakulov
Copy link
Author
iamakulov commented Mar 7, 2023

@tmcconechy Not sure about other cases, but

  • the reproduction above still throws the error
  • and we still can reproduce this issue in the Framer codebase

However, there’s actually a workaround for this issue (which we’re using at Framer as well). If you convince esbuild to code-split the code that lives in entrypoints, you’ll get a chance to order things correctly.

Like, let’s take the repro above. Right now, it throws an error. But if you add one more file:

// entrypoints-workaround.js
import "./init-dep-1.js";
import "./init-dep-2.js";

and make that file the first entrypoint (first is important):

$ esbuild --bundle --outdir=build --format=esm --splitting --platform=node --out-extension:.js=.mjs ./entrypoints-workaround.js ./entry1.js ./entry2.js

then esbuild will move init-dep-1.js and init-dep-2.js into separate chunks and import these chunks before the run-dep.js chunk. As a result, the bundle will start working as expected:

$ node ./build/entry1.mjs
foo.log() (from entry 1) called

Note that you don’t actually need to load the produced build/_entrypoints-workaround.mjs – it’s only needed during the build, to convince esbuild that init-dep-1.js and init-dep-2.js are used more than in one entrypoint.

@damienmortini
Copy link
damienmortini commented Mar 7, 2023

I used inject with the problematic files as a temporary workaround, not sure if it's the best way, but it's simple enough and work for my use case.

@tmcconechy
Copy link

@damienmortini whats "inject"? I do like @iamakulov workaround but for me we have hundreds of endpoints so tricky to get that right.

@aleclarson
Copy link

Slightly related: There should exist an option that disables import "./chunk-XYZXYZ.js" statements from being generated entirely if deterministic execution order isn't required. When a package is using sideEffects in its package.json, these unwanted imports cause Esbuild to warn the user when the package is being bundled.

Example warning:

▲ [WARNING] Ignoring this import because "../alloc/alien-dom/dist/chunk-UIHG724I.mjs" was marked as having no side effects [ignored-bare-import]

    ../alloc/alien-dom/dist/index.mjs:54:7:
      54 │ import "./chunk-UIHG724I.mjs";
         ╵        ~~~~~~~~~~~~~~~~~~~~~~

  It was excluded from the "sideEffects" array in
  the enclosing "package.json" file:

    ../alloc/alien-dom/package.json:17:2:
      17 │   "sideEffects": [
         ╵   ~~~~~~~~~~~~~

@bogdan-panteleev
Copy link
bogdan-panteleev commented Aug 2, 2023

Also faced with a similar problem.

I have a code:

// main.ts
import dotenv from 'dotenv';
dotenv.config();
console.log('configured', process.env.DOTENV_RESULT);

import { ENDPOINTS } from '../contants/endpoints';

console.log('check import', ENDPOINTS.USERS); // to escape tree shaking

// endpoints.ts
console.log('inside dependency');
export const ENDPOINTS = Object.freeze({
  USERS: `/users`,
  POSTS: `/posts`,
});

I build these files via command:
esbuild main.ts --bundle --format=esm --splitting --outdir=dist --platform=node --sourcemap.

I also tried esbuild main.ts --bundle --outdir=dist --platform=node --sourcemap. The result is the same, despite that resulting bundles are a little bit different.

The resulting log is:

  1. inside dependency
  2. configured
  3. check import

So the problem is that I do not have dotenv configured inside dependency, because dependency is initialized before. This is applicable to any other scenario with dependencies.

@maelp
Copy link
maelp commented Dec 8, 2023

I'm arriving at this issue also because of import order bugs, so what is the take on this issue?

  • is it that ESM does not specify import order, and we have to deal with it ourselves (eg assuming all imports are async somehow?)
  • or is that something where there is work in progress to fix the issue?

@Murtatrxx
Copy link

What is the current status and is there an ETA for a fix?

@richarddesan
Copy link

Facing the same problem, we are postponing the use of esbuild in our monorepo for the moment until there is a solution for this

@Valgrifer
Copy link

Put a comment /* @__FIXED__ */ or something else, it was not a solution to indicate to esbuild to place this code before the other import?

@zhiyan114
Copy link

Looking for a solution as well!

The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating

import sentryStuff;
init(...);
import "./index.js"

as

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument

@pfdgithub
Copy link

Looking for a solution as well!也在寻找解决方案!

The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating最新版本的 sentry.io SDK 要求在其余代码之前调用 init,目前 esbuild 仍在处理

import sentryStuff;
init(...);
import "./index.js"

as 如

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument这破坏了哨兵的自动仪表

you can import and initialize sentry in html file

@zhiyan114
Copy link

Looking for a solution as well!也在寻找解决方案!
The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating最新版本的 sentry.io SDK 要求在其余代码之前调用 init,目前 esbuild 仍在处理

import sentryStuff;
init(...);
import "./index.js"

as 如

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument这破坏了哨兵的自动仪表

you can import and initialize sentry in html file

I'm running the code in a backend environment (NodeJS). For the workaround, I just replace import with require, but I still like it to be import, mostly so that the transpiler can "optimize" the import.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests