TensorFlow binary crashes on Apple M1 in x86_64 Docker container #52845

dwyatte · 2021-10-28T23:01:20Z

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): Binary
TensorFlow version (use command below): TensorFlow 2.6.0, tf-nightly 2.8.0.dev20211028
Python version: 3.6.9, 3.7.x, 3.8.x
CUDA/cuDNN version: N/A
GPU model and memory: N/A

Describe the current behavior

dwyatte-macbookpro:~ dwyatte$ docker run tensorflow/tensorflow:latest python -c "import tensorflow as tf"    
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
2021-10-28 22:50:41.481158: F tensorflow/core/lib/monitoring/sampler.cc:42] Check failed: bucket_limits_[i] > bucket_limits_[i - 1] (0 vs. 10)
qemu: uncaught target signal 6 (Aborted) - core dumped

Describe the expected behavior
Clean exit

Standalone code to reproduce the issue
Requires an Apple M1 (arm64) host OS:
docker run tensorflow/tensorflow:latest python -c "import tensorflow as tf"

This was previously reported in #42387, but that issue was closed. When importing TensorFlow in an x86_64 Docker container on an Apple M1, TensorFlow crashes. As far as I can tell this should work, since I can import and use other Python packages (including numpy) in the same container without problems.

It's unclear whether this is something that can be avoided at the TensorFlow level or an unavoidable bug in qemu ([1], [2]), but I wanted to re-raise the issue.


mohantym · 2021-10-29T08:29:49Z

Hi @dwyatte! Could you check these threads? link1, link2

dwyatte · 2021-10-29T15:13:42Z

Thanks @mohantym

The links just reference the warning above, which I believe is innocuous since Docker can emulate the image's platform. TensorFlow doesn't publish official linux/arm64/v8 images (that would require an aarch64 TensorFlow build), but I would expect such an image to remove the warning. Note that the problem is specifically TensorFlow's assumptions about the emulated platform, not the image or other libraries, which run fine when emulating linux/amd64:

dwyatte-macbookpro:~ dwyatte$ docker run tensorflow/tensorflow:latest python -c "import numpy as np; print(np.random.rand(10))"   
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
[0.86125896 0.40657583 0.76832123 0.77205272 0.99326573 0.513298
 0.64218547 0.15977918 0.37553315 0.56692333]
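A quick way to confirm that a container is running the amd64 image under emulation (rather than a native arm64 image) is to ask Python for the machine type from inside it. This is a minimal sketch; under QEMU user-mode emulation, platform.machine() is expected to report the emulated architecture ('x86_64') rather than the host's, and describe_runtime is a hypothetical helper name.

```python
import platform
import struct


def describe_runtime() -> dict:
    """Report the architecture this Python interpreter believes it runs on."""
    return {
        # Expected 'x86_64' in an emulated linux/amd64 container,
        # 'aarch64' in a native linux/arm64/v8 container.
        "machine": platform.machine(),
        # Pointer width in bits (64 on both x86_64 and aarch64).
        "bits": struct.calcsize("P") * 8,
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    print(describe_runtime())
```

Running it with `python -c` in the same `docker run` invocation as the numpy example above would show which architecture the interpreter sees.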

I suspect "Check failed: bucket_limits_[i] > bucket_limits_[i - 1] (0 vs. 10)" comes from a sanity check that TensorFlow runs at startup and that fails under emulation. IMO this issue is about whether anything can be done on the TensorFlow side to relax or correct this check, or whether this is a critical check that is violated, e.g., by qemu (https://gitlab.com/qemu-project/qemu/-/issues/601 suggests it could be floating point inaccuracy, although that seems to just be a guess).
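The failing message states the invariant directly: each bucket limit must be strictly greater than the one before it. The sketch below reproduces only that invariant in Python to show how a single miscomputed limit (0 where something larger than 10 was expected) trips it. exponential_buckets and check_strictly_increasing are hypothetical helpers, not TensorFlow's code, and the idea that emulation produces the bad value is the unconfirmed guess discussed above.

```python
import sys


def exponential_buckets(scale: float, growth_factor: float, count: int) -> list:
    """Hypothetical helper: `count` geometrically growing limits plus a catch-all."""
    limits = []
    bound = scale
    for _ in range(count):
        limits.append(bound)
        bound *= growth_factor
    # Final overflow bucket, analogous to using DBL_MAX as the last limit.
    limits.append(sys.float_info.max)
    return limits


def check_strictly_increasing(limits: list) -> None:
    """Mirror of the failing check: bucket_limits[i] > bucket_limits[i - 1]."""
    for i in range(1, len(limits)):
        if not limits[i] > limits[i - 1]:
            raise ValueError(
                f"Check failed: bucket_limits[{i}] > bucket_limits[{i - 1}] "
                f"({limits[i]} vs. {limits[i - 1]})"
            )


if __name__ == "__main__":
    check_strictly_increasing(exponential_buckets(1.0, 10.0, 3))  # passes
    try:
        check_strictly_increasing([1.0, 10.0, 0.0])  # one bad limit
    except ValueError as err:
        print(err)
```

Because the check runs when TensorFlow sets up its monitoring samplers at import time, a failure like this aborts the process before any user code runs, which matches the crash on a bare `import tensorflow`.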

mohantym · 2021-11-02T08:53:17Z

Hi @sanatmpa1! Could you please look at this issue?

vazkir · 2021-11-02T21:27:39Z

I am taking a class where we use TensorFlow inside Docker containers, and everybody in that class with an M1 Mac, including me, had this exact same issue. Unfortunately nobody has found a fix, so I am going to subscribe to this issue as well. I hope there is some kind of workaround or solution!

alexcombessie · 2021-11-05T13:35:18Z

Hi,

I have the exact same issue. It is hindering my development process. While my app is deployed on an x86 server, I do need to use my M1 mac with emulation to develop code locally and to push it to production.

All other major data science packages work correctly under x86 Rosetta emulation: pandas, scikit-learn, torch, transformers, spacy, xgboost, lightgbm.

I appreciate the great work you are doing with TensorFlow. I would be really grateful if you could take the time to help the data scientists / ML engineers out there who are using ARM-based development laptops.

Thanks a lot,

Alex

PS: I am not interested in forks like tensorflow-macos etc as I need my work to be cross-platform.

bhack · 2021-11-07T11:37:39Z

apple/tensorflow_macos#164 (comment)

https://github.com/ARM-software/Tool-Solutions/tree/master/docker/tensorflow-aarch64

But since some people still need to use this under emulation, I suppose it could be a QEMU bug with DBL_MAX in emulation.

vazkir · 2021-11-14T22:15:57Z

Did anybody find any way to run tensorflow inside a docker container on any M1, M1 Pro or M1 Max device? Would really love to know any workaround so I can start building containers with tf. Thanks in advance for any tips!

bhack · 2021-11-14T23:14:35Z

If the point is to have a published x86 wheel without AVX, we already have an open ticket, so it is better to add a comment there instead of opening a new ticket:

#19584

If instead you want AVX support in QEMU's TCG (e.g. on M1), there is already an open ticket at:
https://gitlab.com/qemu-project/qemu/-/issues/164

dwyatte · 2021-11-15T23:55:45Z

So I do think this is due to AVX instructions. If I install an unofficial wheel (e.g., from yaroslavvb/tensorflow-community-wheels#198) and run a variant of the docker run command above, I do not get a crash on import.

dwyatte-macbookpro:~ dwyatte$ docker run -it tensorflow/tensorflow:latest bash -c 'pip uninstall -y tensorflow-cpu && pip install -U https://tf.novaal.de/barcelona/tensorflow-2.6.0-cp38-cp38-linux_x86_64.whl && python -c "import tensorflow as tf; tf.print(\"hello world\")"'
...
2021-11-15 23:44:35.660302: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
hello world

Thanks for the lead @bhack. I agree, some solutions which you mention are:
1.) Publishing non-AVX wheels (or making non-AVX code paths available within a single wheel)
2.) Handling AVX correctly in qemu (emulation/TCG/etc.)
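Whether a Linux machine advertises AVX can be checked before importing an AVX-built wheel. This is a minimal sketch, assuming an x86 Linux system where /proc/cpuinfo lists features on a "flags" line; cpu_flags and has_avx are hypothetical helper names. One caveat: under QEMU user-mode emulation, /proc/cpuinfo may describe the host CPU rather than the emulated one, so on an M1 the result is a hint at best, while on a native x86_64 Linux host it reflects the real CPU.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect CPU feature flags from the text of /proc/cpuinfo (x86 Linux layout)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    # Exact token match: 'avx2' alone does not count, but CPUs with AVX2 also list 'avx'.
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not available
    print("AVX advertised:", has_avx(text))
```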

bhack · 2021-11-16T01:43:36Z

For the first point, I don't know if anyone at @Intel-tensorflow is interested in publishing an SSE4.x-only wheel at https://pypi.org/project/intel-tensorflow/

gabac · 2021-11-18T06:26:01Z

@dwyatte Thanks a lot for the tip. With an unofficial wheel I was able to get Tensorflow running within Docker on an Apple M1 processor 🚀

janvdp · 2021-11-19T08:32:37Z

@gabac One you built or one that is available online? I'm facing the same issue...

gabac · 2021-11-19T09:22:34Z

E.g. if you use pip as your package manager, for Python 3.7 and TensorFlow 2.5.0 run: pip install -U https://tf.novaal.de/barcelona/tensorflow-2.5.0-cp37-cp37m-linux_x86_64.whl
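The wheel has to match the container's Python: the cp37/cp37m tags in that URL are for Python 3.7, while the earlier example used cp38 for Python 3.8. A minimal sketch for checking this before installing, assuming the standard PEP 427 filename layout (name-version[-build]-pythontag-abitag-platformtag.whl); wheel_tags and matches_interpreter are hypothetical helpers, and pip performs its own, more complete compatibility check.

```python
import sys


def wheel_tags(filename: str) -> tuple:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    return parts[-3], parts[-2], parts[-1]


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """True if the wheel's CPython tag (e.g. cp37) matches the interpreter version."""
    python_tag, _abi, _platform = wheel_tags(filename)
    return python_tag == f"cp{version_info[0]}{version_info[1]}"
```

For a URL, pass only the last path segment, e.g. `matches_interpreter(url.rsplit("/", 1)[-1])`.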

janvdp · 2021-11-19T19:10:09Z

Thanks, that did the trick! Unfortunately, Docker + M1 Mac seems to be pretty slow... :( (not talking about training...)

bhack · 2021-11-19T22:34:44Z

For performance you need to use tensorflow-macos

sanschaise · 2022-09-11T19:00:34Z

No update on this?

josemiguelalves · 2022-09-14T12:57:52Z

any update?

harraz · 2022-10-07T14:51:34Z

I've tried TensorFlow 2.3.1 and I still get:

F tensorflow/core/lib/monitoring/sampler.cc:42] Check failed: bucket_limits_[i] > bucket_limits_[i - 1] (0 vs. 10)
qemu: uncaught target signal 6 (Aborted) - core dumped

Any suggestions would be great - thanks.

Any luck with this issue? I get this when I try to import TensorFlow in Python.

dwyatte · 2022-10-08T15:01:04Z

While this issue was originally opened about emulating TensorFlow on x86_64 in Docker, it looks like there are now TensorFlow aarch64 binaries that can be used in linux/arm64/v8 Docker containers. More info here: https://blog.tensorflow.org/2022/09/announcing-tensorflow-official-build-collaborators.html

Dockerfile

FROM python:3.7-slim

RUN pip install tensorflow==2.10.0 tensorflow-io==0.27.0
CMD python -c "import tensorflow as tf; print(tf.constant(42) / 2 + 2)"

docker build --platform=linux/arm64/v8 . -t tensorflow
docker run --platform=linux/arm64/v8 tensorflow

tf.Tensor(23.0, shape=(), dtype=float64)
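If the same Dockerfile has to build natively on both M1 laptops and x86_64 servers, one option is to pick the --platform value from the host architecture instead of hard-coding it. A minimal sketch, assuming platform.machine() on the host returns one of the keys in the hypothetical DOCKER_PLATFORMS table (a native arm64 Python on Apple silicon reports 'arm64'; a Python running under Rosetta would report 'x86_64' instead).

```python
import platform

# Hypothetical mapping from platform.machine() values to Docker --platform strings.
DOCKER_PLATFORMS = {
    "arm64": "linux/arm64/v8",    # macOS on Apple silicon
    "aarch64": "linux/arm64/v8",  # Linux on arm64
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",       # Windows reports 'AMD64'
}


def docker_platform(machine: str = "") -> str:
    """Map a machine name (default: this host's) to a Docker platform string."""
    key = (machine or platform.machine()).lower()
    try:
        return DOCKER_PLATFORMS[key]
    except KeyError:
        raise ValueError(f"no Docker platform known for machine {key!r}") from None


def build_command(tag: str, machine: str = "") -> list:
    """argv for a native-architecture `docker build`, like the command above."""
    return ["docker", "build", f"--platform={docker_platform(machine)}", ".", "-t", tag]
```

The resulting list can be passed to subprocess.run; the matching `docker run` needs the same --platform value.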

sachinprasadhs · 2022-11-30T19:04:49Z

@dwyatte, thanks for confirming. If your issue is resolved, could you please close this issue?
Also, refer to https://www.tensorflow.org/install for the latest install instructions. Thanks!

dwyatte · 2022-12-04T21:16:03Z

(Quoting @sachinprasadhs: "Thanks for confirming, if your issue is resolved, could you please close this issue.")

Sure, I think we can close this now. QEMU also appears to have merged support for AVX instructions, so once that is pulled into Docker, it might also be possible to run via emulation.

https://gitlab.com/qemu-project/qemu/-/issues/164#note_1140802183

google-ml-butler · 2022-12-04T21:16:06Z

Are you satisfied with the resolution of your issue?

fumoboy007 · 2022-12-05T05:03:00Z

@sachinprasadhs Will Google release prebuilt ARM64 Docker images to Docker Hub? I’m especially interested in an ARM64 tensorflow/serving image.

sachinprasadhs · 2022-12-07T22:06:54Z

CC:@angerson , @learning-to-play

learning-to-play · 2022-12-07T23:19:33Z

Thanks for reaching out! I'm not aware of any plans to release prebuilt ARM64 Docker images.

fumoboy007 · 2022-12-08T03:51:51Z

@learning-to-play It would be great for the community if we had prebuilt images for all architectures that we support. 🙏


dwyatte added the type:bug Bug label Oct 28, 2021

google-ml-butler bot assigned mohantym Oct 28, 2021

mohantym added type:build/install Build and install issues subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues 2.6.0 labels Oct 29, 2021

mohantym added the stat:awaiting response Status - Awaiting response from author label Oct 29, 2021

tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 31, 2021

mohantym assigned sanatmpa1 and unassigned mohantym Nov 2, 2021

hrhee mentioned this issue Nov 4, 2021: docker Ivo-B/CC-DL-template-example#2 (Open)

bhack mentioned this issue Nov 6, 2021: Cannot run TensorFlow 2.7 in Docker on M1 (Apple Silicon) #52972 (Closed)

sanatmpa1 assigned sachinprasadhs and unassigned sanatmpa1 Nov 11, 2021

sachinprasadhs assigned aaudiber Nov 11, 2021

sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 11, 2021

LukasWallrich mentioned this issue Oct 3, 2022: Support for Mac ARM-64 kermitt2/grobid#937 (Closed)

This was referenced Oct 26, 2022:
"Segmentation fault (core dumped)" Error for step "sc.pp.neighbors" scverse/scanpy#2361 (Closed)
"sc.pp.neighbors" kills kernel scverse/scanpy#2359 (Closed)

sachinprasadhs added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Nov 30, 2022

pwighton mentioned this issue Dec 2, 2022: Error running mri_synthseg from a docker freesurfer/freesurfer#1036 (Closed)

dwyatte closed this as completed Dec 4, 2022

fumoboy007 mentioned this issue Dec 6, 2022: tensorflow-serving docker container doesn't work on Macs with Apple M1 chips. tensorflow/serving#1948 (Open)

SpecLad mentioned this issue Mar 22, 2023: Running serverless function yolov3 fails, installing automatic annotation cvat-ai/cvat#5819 (Closed)

lucasgelfond mentioned this issue May 1, 2023: Running tasks that require TensorFlow on Apple Silicon forensic-architecture/mtriage#185 (Open)

HuifengShrimp mentioned this issue May 29, 2023: Logistic Regression mpc-msri/EzPC#179 (Closed)

This was referenced Jul 27, 2023:
software.json files created but no mentions (and no metadata)? howisonlab/screenit-softcite#6 (Open)
server crashes with "qemu: uncaught target signal 6" on Mac M1 silicon softcite/software-mentions#29 (Open)

jb08 mentioned this issue Oct 4, 2023: Support for M1 mac acil-bwh/ChestImagingPlatform#51 (Open)

jiyoung-an mentioned this issue Oct 12, 2023: [Bugfix] Update Docker image based on CPU or GPU. ainize-team/ainize-run-wonny-example#11 (Merged)

obriensystems mentioned this issue Nov 27, 2023: TensorFlow on Intel, NVidia and OSX platforms ObrienlabsDev/machine-learning#2 (Open)

jqmcginnis mentioned this issue Dec 6, 2023: LST_AI installation on MAC OS CompImg/LST-AI#3 (Closed)

vigneshsankariyer1234567890 mentioned this issue Mar 11, 2024: [Feature] Build M1 Mac ARM64 Images deezer/spleeter#717 (Open)
