
OSError: Can't read data - unexpectedly large 'bytes actually read' #1610

Closed
Hrovatin opened this issue Aug 1, 2020 · 12 comments


@Hrovatin commented Aug 1, 2020

I was trying to load some datasets with scanpy.read_h5ad(file_name), which is based on h5py. The Scanpy developers told me this is in fact an h5py problem and that they cannot help me with it (see scverse/scanpy#1351). When trying to load an h5ad dataset (based on HDF5) I frequently get the error below. When I re-run the code multiple times or at different times it sometimes works, but often I get the error (with the same code and data). This happens when reading different h5ad datasets, i.e. it is not specific to one dataset. There always seems to be enough free RAM: one of the h5ad files is 63M, and the server has more than 300 GB of free RAM and 40 GB of free swap. I also tried using different servers, but that did not help. This happens both in jupyter-notebook and in Python without jupyter-notebook.

  • Operating System: Fedora 28
  • Python version: Python 3.8.5
  • Where Python was acquired: Anaconda
  • h5py version: 2.10.0
  • HDF5 version: 1.10.6
  • The full traceback/stack trace shown (if it appears):
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    155         try:
--> 156             return func(elem, *args, **kwargs)
    157         except Exception as e:

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_group(group)
    505     if "h5sparse_format" in group.attrs:  # Backwards compat
--> 506         return SparseDataset(group).to_memory()
    507 

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_core/sparse_dataset.py in to_memory(self)
    370         mtx = format_class(self.shape, dtype=self.dtype)
--> 371         mtx.data = self.group["data"][...]
    372         mtx.indices = self.group["indices"][...]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/h5py/_hl/dataset.py in __getitem__(self, args)
    572         fspace = selection.id
--> 573         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    574 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dread()

OSError: Can't read data (file read failed: time = Sat Aug  1 13:27:54 2020
, filename = '/path.../filtered_gene_bc_matrices.h5ad', file descriptor = 47, errno = 5, error message = 'Input/output error', buf = 0x55ec782e9031, total read size = 7011, bytes this sub-read = 7011, bytes actually read = 18446744073709551615, offset = 0)

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-14-faac769583f8> in <module>
     17     #while True:
     18         #try:
---> 19             adatas.append(sc.read_h5ad(file))
     20             file_diffs.append('_'.join([file.split('/')[i] for i in diff_path_idx]))
     21             #break

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    411                 d[k] = read_dataframe(f[k])
    412             else:  # Base case
--> 413                 d[k] = read_attribute(f[k])
    414 
    415         d["raw"] = _read_raw(f, as_sparse, rdasp)

~/miniconda3/envs/rpy2_3/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    160             else:
    161                 parent = _get_parent(elem)
--> 162                 raise AnnDataReadError(
    163                     f"Above error raised while reading key {elem.name!r} of "
    164                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/X' of type <class 'h5py._hl.group.Group'> from /.

Please note that I deleted most of the path of my file from the stacktrace - it is stored in my workspace on the server.

@takluyver (Member)

There was a similar error reported recently in #1592. 18446744073709551615 is 2 ** 64 - 1, but that doesn't really help me diagnose it.

Are you parallelising anything with multiple processes - or could scanpy be doing some internal parallelism? Opening an HDF5 file before a fork can cause weird problems with reading it. I thought of this because you said it sometimes works and sometimes doesn't, which suggests a race condition.

Other than that, we may just have to pass you on again, to HDF5 itself. You can email help@hdfgroup.org or use the HDF forum for that.
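The fork race described above can be sketched with the stdlib alone. This is an illustration of the underlying mechanism (a file descriptor opened before fork() shares its read offset between parent and child), not h5py-specific code; HDF5 handles are affected by the same sharing when a file is opened before worker processes are forked:

```python
import os
import tempfile

# Write a 256-byte test file.
path = os.path.join(tempfile.mkdtemp(), "fork_demo.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)))

fd = os.open(path, os.O_RDONLY)  # opened BEFORE fork -> offset is shared
pid = os.fork()
if pid == 0:                     # child: consume half the file
    os.read(fd, 128)
    os._exit(0)
os.waitpid(pid, 0)               # parent: resumes at the SHARED offset
remaining = os.read(fd, 256)     # only 128 bytes left, not 256
os.close(fd)
print(len(remaining))
```

With unsynchronised processes the interleaving is nondeterministic, which matches the "sometimes works, sometimes doesn't" behaviour. The safe pattern is to open the file after the fork, inside each worker.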

@Hrovatin (Author) commented Aug 1, 2020

Thank you. I do not think that any of it is parallelized. My file is saved locally, on the server where I am trying to read it.

@takluyver (Member)

In that case, the only thing I can think to suggest is asking HDF group about it. The error is coming from HDF5, and h5py is just another layer of code wrapping around that.

If it's not something they already recognise, someone will need to try to reduce it to a minimal example which reproduces the problem.

@Hrovatin (Author) commented Aug 3, 2020

It seems that the problem was due to scverse/scanpy#1351 (comment). That comment seems to resolve my problems for now.

@Hrovatin Hrovatin closed this as completed Aug 3, 2020
@takluyver (Member)

OK, thanks. I guess we should remember to check for network filesystems when these errors come up. Maybe we should have a troubleshooting bit in the docs somewhere.

@JSegueni

Hey everyone,

I'm having the same issue as @Hrovatin. I'm not using scanpy but Hi-C tools to convert a huge matrix from one format to another. The tools are hicexplorer (the hicConvertFormat command) and, after multiple failures, the hicpro2higlass utility from HiC-Pro. Both programs use the cooler package, which is based on h5py, to convert the matrix.

Here is the error message I get for both runs:

WARNING:py.warnings:/home/me/miniconda3/lib/python3.8/site-packages/cooler/util.py:733: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
  is_cat = pd.api.types.is_categorical(bins["chrom"])

INFO:cooler.cli.load:fields: {'bin1_id': 0, 'bin2_id': 1, 'count': 2}
INFO:cooler.cli.load:dtypes: {'bin1_id': <class 'int'>, 'bin2_id': <class 'int'>, 'count': <class 'numpy.int32'>}
INFO:cooler.cli.load:symmetric-upper: True
INFO:cooler.create:Writing chunk 0: /store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool::0
INFO:cooler.create:Creating cooler at "/store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool::/0"
Traceback (most recent call last):
  File "/home/me/miniconda3/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/me/miniconda3/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/cli/load.py", line 320, in load
    create_from_unordered(
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/create/_create.py", line 724, in create_from_unordered
    create(uri, bins, chunk, columns=columns, dtypes=dtypes, mode="a", **kwargs)
  File "/home/me/miniconda3/lib/python3.8/site-packages/cooler/create/_create.py", line 616, in create
    with h5py.File(file_path, "r+") as f:
  File "/home/me/miniconda3/lib/python3.8/site-packages/h5py/_hl/files.py", line 444, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/me/miniconda3/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 100, in h5py.h5f.open
OSError: [Errno 5] Unable to open file (file read failed: time = Fri Aug 27 16:41:26 2021
, filename = '/store/me/HiC/Downstream_analysis/cool/_tmp35548/tmpnax3zp6z.multi.cool', file descriptor = 6, errno = 5, error message = 'Input/output error', buf = 0x7ffdfd77ec80, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Error: ./hicpro2higlass.sh - line 222 - exit status of last command: 1. Exit
Error: exit status detected. Exit.
Cleaning temporary folders ...

@takluyver (Member)

Is it possible that /store is some sort of network filesystem?

@JSegueni

Hey,
Thanks for your message.
Yeah, it is.
I'm running everything on a HPC cluster.

@JSegueni commented Sep 1, 2021

Should I create a new issue for this, or would it be possible to reopen this one?

@takluyver (Member)

The error is coming from HDF5, so you might want to ask HDF group about it - help@hdfgroup.org, or on the HDF forum. But if you're accessing data on a network filesystem, there's a good chance that's the problem. You might be able to avoid the error by using SWMR mode, but that has its own limitations.
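For reference, the workaround most people reach for in the network-filesystem case is to copy the file onto node-local scratch before opening it. A minimal sketch, assuming the cluster exposes local scratch through the usual temp-directory mechanism; the helper name `to_local_scratch` is made up for illustration:

```python
import os
import shutil
import tempfile

def to_local_scratch(remote_path):
    """Copy a file from a network filesystem to node-local temp storage
    and return the local path. tempfile honours $TMPDIR, which clusters
    typically point at local scratch."""
    scratch = tempfile.mkdtemp()
    local_path = os.path.join(scratch, os.path.basename(remote_path))
    shutil.copy2(remote_path, local_path)
    return local_path

# Usage sketch (hypothetical path); open the local copy with h5py/cooler:
#   local = to_local_scratch("/store/me/HiC/.../matrix.multi.cool")
#   with h5py.File(local, "r") as f:
#       ...
```

This sidesteps the I/O-error path entirely, at the cost of the copy and of losing any writes unless you copy the file back afterwards.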


@diesseh commented Feb 12, 2022

Hello everybody,

I am new to Python and I am trying to use the h5py package, following the tutorial available here: https://lpdaac.usgs.gov/resources/e-learning/getting-started-gedi-l1b-data-python/

When I try to list the keys with list(gediL1B.keys()), here is the error message I receive:


TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13932/2859272625.py in <module>
----> 1 list(gediL1B.keys())

~\anaconda3\envs\geditutorial\lib\_collections_abc.py in __iter__(self)
    718
    719     def __iter__(self):
--> 720         yield from self._mapping
    721
    722 KeysView.register(dict_keys)

~\anaconda3\envs\geditutorial\lib\site-packages\h5py\_hl\group.py in __iter__(self)
    431     def __iter__(self):
    432         """ Iterate over member names """
--> 433         for x in self.id.__iter__():
    434             yield self._d(x)
    435

h5py\h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py\h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py\h5g.pyx in h5py.h5g.GroupIter.__init__()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5g.pyx in h5py.h5g.GroupID.get_num_objs()

TypeError: Not a group (not a group)

Could you help me please?
