
OSError: Can't read data - unexpectedly large 'bytes actually read' #1610

Closed
Hrovatin opened this issue Aug 1, 2020 · 12 comments


@Hrovatin commented Aug 1, 2020

I was trying to load some datasets with scanpy.read_h5ad(file_name), which is based on h5py. The Scanpy developers told me this is in fact an h5py problem and that they cannot help me with it (see scverse/scanpy#1351). When trying to load an h5ad dataset (based on HDF5) I frequently get the error below. When I re-run the code multiple times or at different times it sometimes works, but often I get the error (with the same code and data). This happens when reading different h5ad datasets, i.e. it is not specific to one dataset. There always seems to be enough free RAM: one of the h5ad files is 63M, and the server has more than 300 GB of free RAM and 40 GB of free swap. I also tried using different servers, but that did not help. This happens both in jupyter-notebook and in Python without jupyter-notebook.

  • Operating System: Fedora 28
  • Python version: Python 3.8.5
  • Where Python was acquired: Anaconda
  • h5py version: 2.10.0
  • HDF5 version: 1.10.6
  • The full traceback/stack trace shown (if it appears):
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    155         try:
--> 156             return func(elem, *args, **kwargs)
    157         except Exception as e:

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_group(group)
    505     if "h5sparse_format" in group.attrs:  # Backwards compat
--> 506         return SparseDataset(group).to_memory()
    507 

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_core/sparse_dataset.py in to_memory(self)
    370         mtx = format_class(self.shape, dtype=self.dtype)
--> 371         mtx.data = self.group["data"][...]
    372         mtx.indices = self.group["indices"][...]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    572         fspace = selection.id
--> 573         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    574 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

OSError: Can't read data (file read failed: time = Sat Aug  1 13:27:54 2020
, filename = '/path.../filtered_gene_bc_matrices.h5ad', file descriptor = 47, errno = 5, error message = 'Input/output error', buf = 0x55ec782e9031, total read size = 7011, bytes this sub-read = 7011, bytes actually read = 18446744073709551615, offset = 0)

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-14-faac769583f8> in <module>
     17     #while True:
     18         #try:
---> 19             adatas.append(sc.read_h5ad(file))
     20             file_diffs.append('_'.join([file.split('/')[i] for i in diff_path_idx]))
     21             #break

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    411                 d[k] = read_dataframe(f[k])
    412             else:  # Base case
--> 413                 d[k] = read_attribute(f[k])
    414 
    415         d["raw"] = _read_raw(f, as_sparse, rdasp)

~/miniconda3/envs/rpy2_3/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    160             else:
    161                 parent = _get_parent(elem)
--> 162                 raise AnnDataReadError(
    163                     f"Above error raised while reading key {elem.name!r} of "
    164                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/X' of type <class 'h5py._hl.group.Group'> from /.

Please note that I deleted most of the path of my file from the stacktrace - it is stored in my workspace on the server.

@takluyver (Member)

There was a similar error reported recently in #1592. 18446744073709551615 is 2 ** 64 - 1, but that doesn't really help me diagnose it.

Are you parallelising anything with multiple processes - or could scanpy be doing some internal parallelism? Opening an HDF5 file before a fork can cause weird problems with reading it. I thought of this because you said it sometimes works and sometimes doesn't, which suggests a race condition.

Other than that, we may just have to pass you on again, to HDF5 itself. You can email help@hdfgroup.org or use the HDF forum for that.
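The fork race described above can be sketched with the stdlib alone. This is an illustration of the underlying mechanism (a file descriptor opened before fork() shares its read offset between parent and child), not h5py-specific code; HDF5 handles are affected by the same sharing when a file is opened before worker processes are forked:

```python
import os
import tempfile

# Write a 256-byte test file.
path = os.path.join(tempfile.mkdtemp(), "fork_demo.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)))

fd = os.open(path, os.O_RDONLY)  # opened BEFORE fork -> offset is shared
pid = os.fork()
if pid == 0:                     # child: consume half the file
    os.read(fd, 128)
    os._exit(0)
os.waitpid(pid, 0)               # parent: resumes at the SHARED offset
remaining = os.read(fd, 256)     # only 128 bytes left, not 256
os.close(fd)
print(len(remaining))
```

With unsynchronised processes the interleaving is nondeterministic, which matches the "sometimes works, sometimes doesn't" behaviour. The safe pattern is to open the file after the fork, inside each worker.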

@Hrovatin (Author) commented Aug 1, 2020

Thank you. I do not think that any of it is parallelized. My file is saved locally, on the server where I am trying to read it.

@takluyver (Member)

In that case, the only thing I can think to suggest is asking HDF group about it. The error is coming from HDF5, and h5py is just another layer of code wrapping around that.

If it's not something they already recognise, someone will need to try to reduce it to a minimal example which reproduces the problem.

@Hrovatin (Author) commented Aug 3, 2020

It seems that the problem was due to scverse/scanpy#1351 (comment). That comment seems to resolve my problems for now.

@Hrovatin Hrovatin closed this as completed Aug 3, 2020
@takluyver (Member)

OK, thanks. I guess we should remember to check for network filesystems when these errors come up. Maybe we should have a troubleshooting bit in the docs somewhere.

@JSegueni

Hey everyone,

I'm having the same issue as @Hrovatin. I'm not using scanpy but Hi-C tools to convert a huge matrix from one format to another. The tools are hicexplorer (the hicConvertFormat command) and, after multiple failures, the hicpro2higlass utility from HiC-Pro. Both programs use the cooler package, which is based on h5py, to convert the matrix.

Here is the error message I get for both runs:

WARNING:py.warnings:/home/me/miniconda3/lib/python3.8/site-packages/cooler/util.py:733: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
  is_cat = pd.api.types.is_categorical(bins["chrom"])

INFO:cooler.cli.load:fields: {'bin1_id': 0, 'bin2_id': 1, 'count': 2}
INFO:cooler.cli.load:dtypes: {'bin1_id': <class 'int'>, 'bin2_id': <class 'int'>, 'count': <class 'numpy.int32'>}
INFO:cooler.cli.load:symmetric-upper: True
INFO:cooler.create:Writing chunk 0: /store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool::0
INFO:cooler.create:Creating cooler at "/store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool::/0"
Traceback (most recent call last):
  File "/home/me/miniconda3/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/cli/load.py", line 320, in load
    create_from_unordered(
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/create/_create.py", line 724, in create_from_unordered
    create(uri, bins, chunk, columns=columns, dtypes=dtypes, mode="a", **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/create/_create.py", line 616, in create
    with h5py.File(file_path, "r+") as f:
  File "/home/me/miniconda3/lib/python3.8/site-packages/h5py/_hl/files.py", line 444, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/me/miniconda3/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 100, in h5py.h5f.open
OSError: [Errno 5] Unable to open file (file read failed: time = Fri Aug 27 16:41:26 2021
, filename = '/store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffdfd77ec80, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Error: ./hicpro2higlass.sh - line 222 - exit status of last command: 1. Exit
Error: exit status detected. Exit.
Cleaning temporary folders ...

@takluyver (Member)

Is it possible that /store is some sort of network filesystem?

@JSegueni

Hey,
Thanks for your message.
Yeah, it is.
I'm running everything on a HPC cluster.

@JSegueni commented Sep 1, 2021

Should I create a new issue for this, or would it be possible to reopen this one?

@takluyver (Member)

The error is coming from HDF5, so you might want to ask HDF group about it - help@hdfgroup.org, or on the HDF forum. But if you're accessing data on a network filesystem, there's a good chance that's the problem. You might be able to avoid the error by using SWMR mode, but that has its own limitations.
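For reference, the workaround most people reach for in the network-filesystem case is to copy the file onto node-local scratch before opening it. A minimal sketch, assuming the cluster exposes local scratch through the usual temp-directory mechanism; the helper name `to_local_scratch` is made up for illustration:

```python
import os
import shutil
import tempfile

def to_local_scratch(remote_path):
    """Copy a file from a network filesystem to node-local temp storage
    and return the local path. tempfile honours $TMPDIR, which clusters
    typically point at local scratch."""
    scratch = tempfile.mkdtemp()
    local_path = os.path.join(scratch, os.path.basename(remote_path))
    shutil.copy2(remote_path, local_path)
    return local_path

# Usage sketch (hypothetical path); open the local copy with h5py/cooler:
#   local = to_local_scratch("/store/me/HiC/.../matrix.multi.cool")
#   with h5py.File(local, "r") as f:
#       ...
```

This sidesteps the I/O-error path entirely, at the cost of the copy and of losing any writes unless you copy the file back afterwards.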


@diesseh commented Feb 12, 2022

Hello everybody,

I am new to Python and I am trying to use the h5py package, following the tutorial available here: https://lpdaac.usgs.gov/resources/e-learning/getting-started-gedi-l1b-data-python/

When I try to list the keys with list(gediL1B.keys()), here is the error message I receive:


TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13932/2859272625.py in <module>
----> 1 list(gediL1B.keys())

~\anaconda3\envs\geditutorial\lib\_collections_abc.py in __iter__(self)
    718
    719     def __iter__(self):
--> 720         yield from self._mapping
    721
    722 KeysView.register(dict_keys)

~\anaconda3\envs\geditutorial\lib\site-packages\h5py\_hl\group.py in __iter__(self)
    431     def __iter__(self):
    432         """ Iterate over member names """
--> 433         for x in self.id.__iter__():
    434             yield self._d(x)
    435

h5py\h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py\h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py\h5g.pyx in h5py.h5g.GroupIter.__init__()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5g.pyx in h5py.h5g.GroupID.get_num_objs()

TypeError: Not a group (not a group)

Could you help me please?
