Better Drive files download failure #1482

Open
Conchylicultor opened this issue Feb 19, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@Conchylicultor (Member) commented Feb 19, 2020

Downloads of Drive URLs sometimes fail with NonMatchingChecksumError: Artifact https://drive.google.com/... has wrong checksum.

Explanation: Drive sometimes rejects the download attempt, and the rejection page is downloaded instead of the data:

  • If the user is based in China (a VPN should be used).
  • If there are too many downloads of the same file.

The best solution currently is to manually download the data (https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails) rather than relying on the automated download, which Drive rejected (see the sketch after the list below).

Otherwise:

  • Try the download again later.
  • Try on a different computer.
  • Rather than downloading the file in each Colab session, load the dataset from a GCS bucket. See instructions.
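
For reference, a minimal sketch of the manual-download workflow, assuming the archive was fetched by hand from the Drive URL and that the dataset picks up files from the default manual_dir; celeb_a is used purely as an example:

import tensorflow_datasets as tfds

# Sketch: load after a manual download. manual_dir below is the TFDS
# default; point it elsewhere if you placed the file somewhere else.
config = tfds.download.DownloadConfig(
    manual_dir='~/tensorflow_datasets/downloads/manual',
)
ds = tfds.load(
    'celeb_a',
    split='train',
    download_and_prepare_kwargs={'download_config': config},
)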

It is unclear whether there can be a solution on the Google Drive side that still prevents abuse.
On the TFDS side, we could make the error message more explicit when we detect a Drive URL.
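
A hypothetical sketch of what that TFDS-side check could look like (the helper name is invented; this is not actual TFDS code):

def checksum_error_message(url: str, base_msg: str) -> str:
    # Hypothetical helper: append a Drive-specific hint to the checksum
    # error message when the failing artifact is a Drive URL.
    if url.startswith('https://drive.google.com'):
        base_msg += (
            '\nNote: Google Drive sometimes rejects automated downloads '
            '(download quota, region restrictions) and serves a rejection '
            'page instead of the data. Consider downloading the file '
            'manually.'
        )
    return base_msg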

@dhirensr (Contributor)

@Conchylicultor: do we need to just change the error message for this ticket? Could you guide me a little bit so that I can work on it?

@jpgard commented Mar 4, 2020

Is there a way for users to, e.g., make a copy of the files into their own Google Drive, manually download them to the correct location, and proceed from there? Or is there any other manual workaround using the publicly available celeba data?

@ChanchalKumarMaji (Contributor)

For Drive links, the download can be done by extracting the file id and constructing the download link as:
https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM
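
A small sketch of that id-extraction step (assuming the common /file/d/<id>/ and ?id=<id> share-URL shapes):

import re

def drive_direct_link(share_url):
    # Extract the file id from a Drive share link and build the uc?id= form.
    match = re.search(r'(?:/d/|[?&]id=)([\w-]+)', share_url)
    if not match:
        raise ValueError(f'No file id found in: {share_url}')
    return 'https://drive.google.com/uc?id=' + match.group(1)

print(drive_direct_link(
    'https://drive.google.com/file/d/0B7EVK8r0v71pZjFTYXZWM3FlRnM/view'))
# -> https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM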

For now, the celeb_a download link shows:

Sorry, you can't view or download this file at this time.

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

@liqinglin54951

celeb_a tfrecord files:
https://drive.google.com/drive/folders/1MKQ9sRwr5OOFk3OBzLz91SsgF3MBqvtP?usp=sharing
OR
you can follow "Create tfrecord files for 'test', 'train', 'validation'" in cp13_Parallelizing NN Training w TF_printoptions(precision)_squeeze_shuffle_batch_repeat_image process_map_tfrecords (https://blog.csdn.net/Linli522362242/article/details/112386820)
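
For anyone who cannot access the linked tutorial, a minimal sketch of writing image files into a TFRecord in the same spirit (the paths are placeholders; real celeb_a records would also carry the attribute labels):

import tensorflow as tf

def write_tfrecord(image_paths, out_path):
    # Serialize each image file as a tf.train.Example with a single
    # bytes feature and write them all to one TFRecord file.
    with tf.io.TFRecordWriter(out_path) as writer:
        for path in image_paths:
            raw = tf.io.read_file(path).numpy()
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[raw])),
            }))
            writer.write(example.SerializeToString())

write_tfrecord(['img1.jpg', 'img2.jpg'], 'train.tfrecord')  # placeholder paths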

@ghost commented Apr 24, 2021

Is there an easier way to install the celeb_a dataset? I am trying the manual download method, but it is not helping at all.

@johnny-brav0

Has anyone tried executing the code cell twice?
I'm getting the same error for the "paws_wiki" dataset, using tfds.load('paws_wiki'). But despite the error, the data does get downloaded, and as soon as I execute the cell again it works and imports the data into my environment.
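
That behavior can be automated as a small retry loop; a sketch below (the exception's import path is taken from the library source and may vary across TFDS versions):

import tensorflow_datasets as tfds
from tensorflow_datasets.core.download.download_manager import (
    NonMatchingChecksumError)  # path may differ in other TFDS versions

# Retry once: the first attempt may fail on the checksum even though the
# data landed on disk, as described above.
for attempt in range(2):
    try:
        ds = tfds.load('paws_wiki')
        break
    except NonMatchingChecksumError:
        if attempt == 1:
            raise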

@aryan-f commented Jan 1, 2022

Looking for a workaround for this issue, I ended up finding a routine in the library that checks for the files on your own machine before attempting to download them. It checks ~/tensorflow_datasets/downloads/manual (the manual_dir used by download_and_prepare). You can manually download the file from Drive and place it in one of the directories it checks. Once you log in, high traffic is no longer an issue.

The dataset I intended to use was CaltechBirds2010, and I found the Drive link here.
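
A sketch of that placement step (the archive filename below is a placeholder; use whatever the Drive link actually serves):

import os
import shutil
import tensorflow_datasets as tfds

# Move the hand-downloaded archive into the directory TFDS checks before
# attempting any download, then load as usual.
manual_dir = os.path.expanduser('~/tensorflow_datasets/downloads/manual')
os.makedirs(manual_dir, exist_ok=True)
shutil.move('birds_archive.tgz', manual_dir)  # hypothetical filename

ds = tfds.load('caltech_birds2010', split='train')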

@Abdulrasheedar

I get the error below when trying to use the deep_weeds dataset with this code: data_train, info = tfds.load("deep_weeds", with_info=True, split='train[:60%]', as_supervised=True)

NonMatchingChecksumError Traceback (most recent call last)
in <cell line: 1>()
----> 1 data_train, info = tfds.load("deep_weeds", with_info=True, split='train[:60%]',as_supervised=True)
2 data_valid = tfds.load("deep_weeds",split='train[60%:80%]',as_supervised=True)
3 data_test = tfds.load("deep_weeds", split='train[80%:]',as_supervised=True)
4 # file_path ="/content/images/"
5 # dataset = tfds.load(name='deep_weeds', data_dir=file_path)

20 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/download/download_manager.py in _validate_checksums(url, path, computed_url_info, expected_url_info, force_checksums_validation)
769 'https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror'
770 )
--> 771 raise NonMatchingChecksumError(msg)
772
773

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1xnK3B6K6KekDI55vwJ0vnc2IGoDga9cj, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_1xnK3B6K6KekDI55vwJ0vnc2ITDlCjLc2rcwnx4HX2m4DkEyLfA722UJqaLRkfNhB6ec.tmp.68dd982dd0fd4809b12f3ef885ebe32f/download, has wrong checksum:
