sc.read_h5ad randomly produces AnnDataReadError/OSError #1351

Hrovatin · 2020-08-01T11:38:59Z

I am trying to load some datasets with sc.read_h5ad(file_name) and frequently get the error below. Re-running the same code on the same data sometimes works, but often fails again. The problem is not specific to one dataset: it occurs with several different h5ad files. Free RAM appears to be sufficient, and roughly the same, each time. It happens both in Jupyter Notebook and in plain Python.

Error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    155         try:
--> 156             return func(elem, *args, **kwargs)
    157         except Exception as e:

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_group(group)
    505     if "h5sparse_format" in group.attrs:  # Backwards compat
--> 506         return SparseDataset(group).to_memory()
    507 

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_core/sparse_dataset.py in to_memory(self)
    370         mtx = format_class(self.shape, dtype=self.dtype)
--> 371         mtx.data = self.group["data"][...]
    372         mtx.indices = self.group["indices"][...]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    572         fspace = selection.id
--> 573         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    574 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

OSError: Can't read data (file read failed: time = Sat Aug  1 13:27:54 2020
, filename = '/path.../filtered_gene_bc_matrices.h5ad', file descriptor = 47, errno = 5, error message = 'Input/output error', buf = 0x55ec782e9031, total read size = 7011, bytes this sub-read = 7011, bytes actually read = 18446744073709551615, offset = 0)

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-14-faac769583f8> in <module>
     17     #while True:
     18         #try:
---> 19             adatas.append(sc.read_h5ad(file))
     20             file_diffs.append('_'.join([file.split('/')[i] for i in diff_path_idx]))
     21             #break

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    411                 d[k] = read_dataframe(f[k])
    412             else:  # Base case
--> 413                 d[k] = read_attribute(f[k])
    414 
    415         d["raw"] = _read_raw(f, as_sparse, rdasp)

~/miniconda3/envs/rpy2_3/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    160             else:
    161                 parent = _get_parent(elem)
--> 162                 raise AnnDataReadError(
    163                     f"Above error raised while reading key {elem.name!r} of "
    164                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/X' of type <class 'h5py._hl.group.Group'> from /.

Versions:

scanpy==1.5.1 anndata==0.7.4 umap==0.4.6 numpy==1.18.5 scipy==1.4.1 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.8.2 louvain==0.6.1 leidenalg==0.8.1


flying-sheep · 2020-08-01T13:20:46Z

That problem occurs within h5py (we just wrap the underlying OSError) and isn’t a consequence of how scanpy uses h5py. The relevant part of the traceback is:

OSError: Can't read data (file read failed:
    time = Sat Aug  1 13:27:54 2020,
    filename = '/path.../filtered_gene_bc_matrices.h5ad',
    file descriptor = 47,
    errno = 5,
    error message = 'Input/output error',
    buf = 0x55ec782e9031,
    total read size = 7011,
    bytes this sub-read = 7011,
    bytes actually read = 18446744073709551615,
    offset = 0)

The reported filename looks weird: '/path.../filtered_gene_bc_matrices.h5ad'. Is that file on some network share or colab or so? Because that’d explain wonky I/O. 18 exabytes (18 quintillion bytes!) read seems really off too!

I assume self.group["data"][...] tries to read all the data for .X and some bug or connection problem tells h5py that there’s 18 exabytes. h5py then asks the OS to give them those 18 exabytes which the OS politely denies.
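A side note on that figure: 18446744073709551615 is exactly 2^64 - 1, which is what -1 looks like when stored in an unsigned 64-bit integer. A failed POSIX read() returns -1, so one plausible reading (an assumption, not something confirmed in this thread) is that the value is an error return printed as unsigned rather than a real byte count. A quick check of the arithmetic:

```python
# "bytes actually read" value copied from the traceback above
reported = 18446744073709551615

# 2**64 - 1 is the largest value an unsigned 64-bit integer can hold
is_max_u64 = reported == 2**64 - 1

# Reinterpret the same 8 bytes as a signed 64-bit integer
as_signed = int.from_bytes(reported.to_bytes(8, "little"), "little", signed=True)

print(is_max_u64)
print(as_signed)
```

If that reading is right, it fits errno = 5 ('Input/output error'): the read failed outright, and no 18-exabyte request was made.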


LuckyMD · 2020-08-03T08:54:09Z

This is quite a common error on our internal servers @Hrovatin. I have been getting around it by reading from a different server, and then it often just works. It would be great if you could figure out what the issue might be.

Hrovatin · 2020-08-03T09:40:00Z

Which server do you suggest? I have tried a couple with no success. I am having a lot of trouble with it: I get errors when reading different parts of the file, even when using just h5py.

LuckyMD · 2020-08-03T10:26:44Z

I have been moving between the interactive servers that are not on the queue: icb-lisa, icb-sarah, and icb-mona. If none of those work, I try the older servers hias and sepp.

flying-sheep · 2020-08-03T13:01:44Z

From my time in @theislab I infer this means it’s a network mount problem.

You can probably fix it by putting the file somewhere on the local file system. Since /home/* is network-mounted, that means /localscratch/ or /tmp/, I assume.
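A minimal sketch of that workaround, assuming the system temporary directory (or whatever is passed as local_dir) really is on local disk. The helper name read_via_local_copy is hypothetical and not part of scanpy or anndata; it takes the reader as a parameter so the same pattern works with any loading function.

```python
import shutil
import tempfile
from pathlib import Path


def read_via_local_copy(src, reader, local_dir=None):
    """Copy `src` to local storage, call `reader` on the copy, then clean up.

    Hypothetical helper, not a scanpy/anndata API. `local_dir` should be a
    non-network location (e.g. /tmp or /localscratch); None uses the system
    default temporary directory.
    """
    src = Path(src)
    # TemporaryDirectory deletes the copy when the `with` block exits
    with tempfile.TemporaryDirectory(dir=local_dir) as tmp:
        local_path = Path(tmp) / src.name
        shutil.copy2(src, local_path)
        # The reader must load everything into memory before returning,
        # because the local copy is removed right afterwards
        return reader(local_path)


# Intended use (requires scanpy, so left as a comment here):
# adata = read_via_local_copy(file, sc.read_h5ad)
```

Because the copy is deleted as soon as the reader returns, this pattern would not suit backed mode, where the file has to stay open.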

Hrovatin · 2020-08-03T13:54:26Z

Thank you very much @flying-sheep - copying to tmp for the time of reading seems to currently work.

flying-sheep · 2020-08-03T18:51:32Z

Great to hear! Usually when there are weird, site-specific errors, I say I can’t help because I don’t have SSH access and “my crystal ball is currently out of order”.

Seems like my crystal ball worked just fine these days!

LuckyMD · 2020-08-03T20:27:48Z

@flying-sheep just wait until tomorrow... when the next random error occurs ;).

abuchin · 2022-06-03T20:42:26Z

I found the same error in our internal workflows. We saved the data to h5ad files, but for some reason could not open them anymore.

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    155         try:
--> 156             return func(elem, *args, **kwargs)
    157         except Exception as e:

/opt/conda/lib/python3.7/site-packages/anndata/_io/h5ad.py in read_group(group)
    531     if encoding_type:
--> 532         EncodingVersions[encoding_type].check(
    533             group.name, group.attrs["encoding-version"]

/opt/conda/lib/python3.7/enum.py in __getitem__(cls, name)
    356     def __getitem__(cls, name):
--> 357         return cls._member_map_[name]
    358 

KeyError: 'dict'

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-20-38a594ec7d06> in <module>
----> 1 adata_ast=sc.read_h5ad('../../data_processed/Leng_2020/adata_ast.h5ad')

/opt/conda/lib/python3.7/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    424                 d[k] = read_dataframe(f[k])
    425             else:  # Base case
--> 426                 d[k] = read_attribute(f[k])
    427 
    428         d["raw"] = _read_raw(f, as_sparse, rdasp)

/opt/conda/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

/opt/conda/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    161                 parent = _get_parent(elem)
    162                 raise AnnDataReadError(
--> 163                     f"Above error raised while reading key {elem.name!r} of "
    164                     f"type {type(elem)} from {parent}."
    165                 )

AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.
adata_ast=sc.read_h5ad('../../data_processed/Leng_2020/adata_ast.h5ad')

Versions

Package Version

absl-py 1.1.0
aiohttp 3.8.1
aiosignal 1.2.0
anndata 0.7.5
anndata2ri 1.0.6
annoy 1.17.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asn1crypto 1.4.0
async-timeout 4.0.2
asynctest 0.13.0
attrs 20.3.0
backcall 0.2.0
beautifulsoup4 4.11.1
bleach 5.0.0
boto3 1.17.66
botocore 1.20.66
brotlipy 0.7.0
cached-property 1.5.2
cachetools 5.2.0
certifi 2020.12.5
cffi 1.14.5
chardet 4.0.0
charset-normalizer 2.0.12
chex 0.1.3
click 8.1.3
colormath 3.0.0
commonmark 0.9.1
conda 4.6.14
conda-package-handling 1.7.3
cryptography 3.4.7
cycler 0.10.0
Cython 0.29.30
decorator 5.0.7
defusedxml 0.7.1
dill 0.3.3
dm-tree 0.1.7
docrep 0.3.2
entrypoints 0.4
et-xmlfile 1.1.0
fa2 0.3.5
fastjsonschema 2.15.3
flatbuffers 2.0
flax 0.5.0
frozenlist 1.3.0
fsspec 2022.5.0
future 0.18.2
get-version 2.2
google-auth 2.6.6
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.46.3
h5py 3.2.1
idna 2.10
imageio 2.19.3
importlib-metadata 4.11.4
importlib-resources 5.7.1
ipykernel 5.5.4
ipython 7.23.1
ipython-genutils 0.2.0
ipywidgets 7.7.0
jax 0.3.13
jaxlib 0.3.10
jedi 0.18.0
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.0.1
jsonschema 4.6.0
jupyter-client 6.1.12
jupyter-core 4.7.1
jupyterlab-pygments 0.2.2
jupyterlab-widgets 1.1.0
kiwisolver 1.3.1
legacy-api-wrap 1.2
leidenalg 0.8.4
llvmlite 0.35.0
loompy 3.0.7
louvain 0.7.0
Markdown 3.3.7
MarkupSafe 2.1.1
matplotlib 3.4.1
matplotlib-inline 0.1.2
mistune 0.8.4
msgpack 1.0.4
multidict 6.0.2
multipledispatch 0.6.0
multiprocess 0.70.11.1
natsort 7.1.1
nbclient 0.6.4
nbconvert 6.5.0
nbformat 5.4.0
nest-asyncio 1.5.5
networkx 2.5
notebook 6.4.11
numba 0.52.0
numexpr 2.7.3
numpy 1.19.5
numpy-groupies 0.9.17
numpyro 0.9.2
oauthlib 3.2.0
openpyxl 3.0.10
opt-einsum 3.3.0
optax 0.1.2
packaging 20.9
pandas 1.2.0
pandocfilters 1.5.0
parso 0.8.2
pathos 0.2.7
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.1.1
pip 21.1.1
pox 0.2.9
ppft 1.6.6.3
prometheus-client 0.14.1
prompt-toolkit 3.0.18
protobuf 3.19.0
protobuf3-to-dict 0.1.5
ptyprocess 0.7.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycosat 0.6.3
pycparser 2.20
pyDeprecate 0.3.1
Pygments 2.9.0
pyOpenSSL 20.0.1
pyparsing 2.4.7
pyro-api 0.1.2
pyro-ppl 1.8.1
pyrsistent 0.18.1
PySocks 1.7.1
python-dateutil 2.8.1
python-igraph 0.9.1
pytorch-lightning 1.5.10
pytz 2021.1
PyWavelets 1.3.0
PyYAML 6.0
pyzmq 22.0.3
requests 2.25.1
requests-oauthlib 1.3.1
rich 12.4.4
rpy2 3.4.2
rsa 4.8
ruamel-yaml-conda 0.15.80
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
s3transfer 0.4.2
sagemaker 2.39.0.post0
scanpy 1.6.1
scikit-image 0.19.2
scikit-learn 0.24.2
scikit-misc 0.1.4
scipy 1.6.0
scrublet 0.2.3
scvi-tools 0.16.2
seaborn 0.11.1
Send2Trash 1.8.0
setuptools 59.5.0
setuptools-scm 6.0.1
sinfo 0.3.1
six 1.15.0
smdebug-rulesconfig 1.0.1
soupsieve 2.3.2.post1
spectra 0.0.11
statsmodels 0.12.2
stdlib-list 0.8.0
tables 3.6.1
tensorboard 2.9.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
terminado 0.15.0
texttable 1.6.3
threadpoolctl 2.1.0
tifffile 2021.11.2
tinycss2 1.1.1
toolz 0.11.2
torch 1.11.0
torchmetrics 0.9.0
tornado 6.1
tqdm 4.60.0
traitlets 5.2.2.post1
typing-extensions 4.2.0
tzlocal 2.1
umap-learn 0.4.6
urllib3 1.26.4
wcwidth 0.2.5
webencodings 0.5.1
Werkzeug 2.1.2
wheel 0.36.2
widgetsnbextension 3.6.0
yarl 1.7.2
zipp 3.4.1
Note: you may need to restart the kernel to use updated packages.

Has anyone found any solution to work around this issue?

dsm-72 · 2022-06-30T15:42:52Z

I want to follow up and see if this has a solution

YY-SONG0718 · 2022-07-14T13:45:59Z

Having same issue

ling-lyang · 2022-07-28T10:01:24Z

same

ktpolanski · 2022-07-28T11:27:17Z

I'm pretty sure none of you are having the same issue as the original one reported here. Compare @abuchin's error message of KeyError: 'dict' to the original poster's error of OSError: Can't read data.

What you're seeing is a different problem that stems from an update to anndata. You're trying to read an h5ad file created with a newer version of the package using an older one. I think the cutoff point is 0.8.0 but I could be mistaken.

Upgrade your anndata and you should be ok.
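To see whether this applies to a given environment, the installed anndata version can be compared against the suspected cutoff. The sketch below treats 0.8.0 as the threshold only because that is the guess made above, so it is an assumption; the helper names are hypothetical.

```python
import re
from importlib import metadata

ASSUMED_CUTOFF = "0.8.0"  # assumption taken from this thread, not verified


def release_tuple(version):
    """Leading numeric components of a version string, padded to three.

    For example '0.7.8' gives (0, 7, 8); suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def too_old(installed, minimum=ASSUMED_CUTOFF):
    """True if `installed` is older than `minimum`."""
    return release_tuple(installed) < release_tuple(minimum)


def installed_version(package="anndata"):
    """Installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("anndata is not installed in this environment")
elif too_old(version):
    print(f"anndata {version} is older than {ASSUMED_CUTOFF}; consider upgrading")
else:
    print(f"anndata {version} is at or above {ASSUMED_CUTOFF}")
```

Run this in the same kernel or interpreter that raises the error, since a notebook kernel may use a different environment than the shell.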

cartal · 2022-08-09T15:51:15Z

The solution from @ktpolanski fixed this for me.

CHAOYiming · 2022-08-27T16:47:20Z

Thanks @ktpolanski. Problem solved.

Benfeitas · 2022-09-01T14:24:49Z

I was facing this issue in 0.7.8. Upgrading to 0.8.0 solved the problem

cc36 · 2022-10-18T09:12:03Z

Same here, thanks @ktpolanski !!!

tanvirraihan142 · 2023-01-11T03:39:56Z

> I was facing this issue in 0.7.8. Upgrading to 0.8.0 solved the problem

How did you update? pip says that 0.7.8 is the latest version.

cartal · 2023-01-11T08:57:08Z

For me pip install anndata --upgrade did the trick. You could also do pip install anndata==0.8.0 if that's the specific version you want to have.

ktpolanski · 2023-01-11T09:19:50Z

Then it's probably a case of having an old python3. I loaded up an environment where I have python 3.6.9 and the newest version it saw was 0.7.8.
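To check whether this is the cause, it helps to confirm which interpreter is actually in use. The sketch below assumes that anndata 0.8.0 needs Python 3.7 or newer, which is inferred only from the Python 3.6.9 observation above and should be verified against the package's metadata; the helper name is hypothetical.

```python
import sys

ASSUMED_MIN_PYTHON = (3, 7)  # assumption inferred from this thread, not verified


def python_too_old(version_info=None, minimum=ASSUMED_MIN_PYTHON):
    """True if the given (major, minor, ...) version is below `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < tuple(minimum)


# Show which interpreter is running and whether it is below the assumed minimum
print(sys.executable)
print(sys.version.split()[0])
print("too old for newer anndata (assumed):", python_too_old())
```

Invoking pip as python -m pip install --upgrade anndata with that same interpreter avoids upgrading a different environment by accident.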

YubinXie · 2023-01-23T06:02:20Z

pip install anndata --upgrade works.
The issue occurs when you save an AnnData file with a newer version and then try to load it with an older version.

Sunyiqing2003 · 2023-07-08T17:17:22Z

I am having the same problem; however, pip install anndata --upgrade didn't work for me. pip said it is already the latest version: Requirement already satisfied: anndata in d:\python3.10.9\lib\site-packages (0.9.1). I really don't know what to do. Could you help me with that? [crying]

flying-sheep · 2023-07-10T06:17:12Z

@Sunyiqing2003 we certainly don’t want you crying!

People here had problems reading with older anndata versions, but you seem to have the newest one, so it’s not the same issue. Could you file a new issue?

Sunyiqing2003 · 2023-07-10T08:02:38Z

@flying-sheep Sure, I can file a new issue. Thank you very much!

Sunyiqing2003 · 2023-07-10T08:45:56Z

> @Sunyiqing2003 we certainly don’t want you crying!
>
> People here had problems reading with older anndata versions, but you seem to have the newest one, so it’s not the same issue. Could you file a new issue?

Thank you for your time and attention, I really appreciate it. I have filed a new issue: https://github.com/scverse/scanpy/issues/2551. I might know why updating anndata didn't work: in my case the main cause seems to be a large array and a memory error.

Hrovatin added the Bug 🐛 label Aug 1, 2020

flying-sheep closed this as completed Aug 1, 2020

flying-sheep removed the Bug 🐛 label Aug 1, 2020

Hrovatin mentioned this issue Aug 1, 2020

OSError: Can't read data - unexpectedly large 'bytes actually read' h5py/h5py#1610

Closed

Hrovatin mentioned this issue Aug 3, 2020

Anndata write error in saving uns scverse/anndata#409

Closed

huidongchen mentioned this issue Nov 10, 2021

PBG training error and unable to perform conda install pinellolab/simba#5

Closed

mumichae mentioned this issue Apr 5, 2022

Generated test data not readable theislab/scib-pipeline#26

Closed

eboileau mentioned this issue May 27, 2022

File upload dieterich-lab/gEAR#3

Closed

3 tasks

brianpenghe mentioned this issue Feb 18, 2023

Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> #2297

Closed

swbioinf mentioned this issue Jan 11, 2024

Interoperability issues with anndata/scanpy-scripts bioconda/bioconda-recipes#45164

Open
