TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind'' #435
Comments
This is really strange, have you had problems with other datasets? Is your dataset corrupted in some way? |
Hi Alex, However, coloring the plots still does not work. I get the following error. You can reproduce the error with the following:
Load data
x = pd.read_csv('Trial_data.csv', delimiter=',', index_col=0)
Drop DAPI
x = x.drop(list(x.filter(regex='DAPI.', axis=1)), axis=1)
Convert to AnnData
adata = sc.AnnData(x)
Filter cells
sc.pp.filter_cells(adata, min_genes=1)
Normalize data
sc.pp.log1p(adata)
PCA
sc.tl.pca(adata, svd_solver='arpack')
I also tried it on a different dataset. |
Could you post the full error traceback so that I see where the error is raised? |
Hi Alex, Below is the error I get. Thank you for looking at this. ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-4fad8adf5d00> in <module>
----> 1 sc.pl.pca(adata, color='CD3D')
~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in pca(adata, **kwargs)
148 If `show==False` a `matplotlib.Axis` or a list of it.
149 """
--> 150 return plot_scatter(adata, basis='pca', **kwargs)
151
152
~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in plot_scatter(adata, color, use_raw, sort_order, edges, edges_width, edges_color, arrows, arrows_kwds, basis, groups, components, projection, color_map, palette, size, frameon, legend_fontsize, legend_fontweight, legend_loc, ncols, hspace, wspace, title, show, save, ax, return_fig, **kwargs)
275 color_vector, categorical = _get_color_values(adata, value_to_plot,
276 groups=groups, palette=palette,
--> 277 use_raw=use_raw)
278
279 # check if higher value points should be plot on top
~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in _get_color_values(adata, value_to_plot, groups, palette, use_raw)
658 # check if value to plot is in var
659 elif use_raw is False and value_to_plot in adata.var_names:
--> 660 color_vector = adata[:, value_to_plot].X
661
662 elif use_raw is True and value_to_plot in adata.raw.var_names:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in __getitem__(self, index)
1307 def __getitem__(self, index):
1308 """Returns a sliced view of the object."""
-> 1309 return self._getitem_view(index)
1310
1311 def _getitem_view(self, index):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _getitem_view(self, index)
1311 def _getitem_view(self, index):
1312 oidx, vidx = self._normalize_indices(index)
-> 1313 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
1314
1315 def _remove_unused_categories(self, df_full, df_sub, uns):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
662 if not isinstance(X, AnnData):
663 raise ValueError('`X` has to be an AnnData object.')
--> 664 self._init_as_view(X, oidx, vidx)
665 else:
666 self._init_as_actual(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _init_as_view(self, adata_ref, oidx, vidx)
723 self._X = None
724 else:
--> 725 self._init_X_as_view()
726
727 self._layers = AnnDataLayers(self, adata_ref=adata_ref, oidx=oidx, vidx=vidx)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _init_X_as_view(self)
750 shape = (
751 get_n_items_idx(self._oidx, self._adata_ref.n_obs),
--> 752 get_n_items_idx(self._vidx, self._adata_ref.n_vars)
753 )
754 if np.isscalar(X):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\utils.py in get_n_items_idx(idx, l)
148 return 1
149 else:
--> 150 return len(idx)
TypeError: object of type 'numpy.int64' has no len() |
Sorry for all the trouble. I just wanted to download from your dropbox link but the file wasn't there anymore... |
Are you running on Windows? Then the solution could be this fix in anndata: scverse/anndata#102 |
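The traceback above ends with `len(idx)` being called on a `numpy.int64`, which is a scalar and has no length. As a rough illustration only, here is a minimal sketch of an index-counting helper that treats NumPy integer scalars like plain Python integers. The name `n_items` is hypothetical; this is not the anndata implementation and not necessarily what the linked fix does.

```python
import numbers

import numpy as np


def n_items(idx, length):
    """Hypothetical helper: how many items `idx` selects along an axis of size `length`."""
    # numbers.Integral covers both Python int and NumPy integer scalars such as np.int64.
    if isinstance(idx, numbers.Integral):
        return 1
    if isinstance(idx, slice):
        return len(range(*idx.indices(length)))
    idx = np.asarray(idx)
    if idx.dtype == bool:
        # A boolean mask selects as many items as it has True entries.
        return int(idx.sum())
    return len(idx)


single = n_items(np.int64(3), 10)
sliced = n_items(slice(2, 8, 2), 10)
```

Checking against `numbers.Integral` rather than `int` is the design choice here: `np.int64` is not a subclass of `int`, so an `isinstance(idx, int)` test would fall through to `len(idx)`.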
Ah sorry, I happened to have deleted it today. Here it is. https://www.dropbox.com/s/26a5rhrjj99czeq/Trial_data.csv?dl=0 Meanwhile, I will also take a look at the solution you sent. I usually run it in a Jupyter notebook and I work between Windows and Mac. |
So I just reproduced this error for I noticed that if I add the line:
after the downsampling call, it works again. Maybe add that to |
It was related to the adata conversion that @falexwolf alluded to and specifically affects Windows machines (because of changes in numpy). I got the latest version of AnnData and it works now. |
Exactly the same error message pops up when inputting |
Aha okay. My problem was resolved when I updated the AnnData package for converting a pandas dataframe into an AnnData object using `adata = sc.AnnData(x)` |
We just merged an update on the |
No matter what it returns, it definitely shouldn't make stuff fail. I think that iirc, I made I figure that 🍝
def log1p(data, copy=False, chunked=False, chunk_size=None):
    """Logarithmize the data matrix.

    Computes `X = log(X + 1)`, where `log` denotes the natural logarithm.

    Parameters
    ----------
    data : :class:`~anndata.AnnData`, `np.ndarray`, `sp.sparse`
        The (annotated) data matrix of shape `n_obs` × `n_vars`. Rows correspond
        to cells and columns to genes.
    copy : `bool`, optional (default: `False`)
        If an :class:`~anndata.AnnData` is passed, determines whether a copy
        is returned.

    Returns
    -------
    Returns or updates `data`, depending on `copy`.
    """
    if copy:
        if not isinstance(data, AnnData):
            data = data.astype(np.floating)
        data = data.copy()
    elif not isinstance(data, AnnData) and np.issubdtype(data.dtype, np.integer):
        raise TypeError("Cannot perform inplace log1p on integer array")

    def _log1p(X):
        if issparse(X):
            np.log1p(X.data, out=X.data)
        else:
            np.log1p(X, out=X)
        return X

    if isinstance(data, AnnData):
        if not np.issubdtype(data.X.dtype, np.floating):
            data.X = data.X.astype(np.floating, copy=False)
        if chunked:
            for chunk, start, end in data.chunked_X(chunk_size):
                data.X[start:end] = _log1p(chunk)
        else:
            _log1p(data.X)
    else:
        _log1p(data)

    return data if copy else None
I'll give that another shot, and open a PR. On the return type of |
Originally everything was Which functions hard code |
Nothing should be hardcoded Nothing should change the dtype that the user wants, except, for instance, when we logarithmize an integer matrix etc. Here, there should be a default [PS: In algorithms that inherently are unstable and would profit more from higher precision, one could think about increasing precision.] |
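The dtype policy described above (leave the user's floating dtype alone, convert only when an integer matrix is logarithmized) could look roughly like the sketch below for a plain NumPy array. The function name `log1p_keep_dtype` and the `int_fallback` parameter are hypothetical, and the `float32` default is an assumption made for illustration, not something the thread confirms.

```python
import numpy as np


def log1p_keep_dtype(X, int_fallback=np.float32):
    """Hypothetical sketch: log1p that keeps float dtypes and upcasts everything else."""
    X = np.asarray(X)
    if np.issubdtype(X.dtype, np.floating):
        # float32 in gives float32 out, float64 in gives float64 out.
        return np.log1p(X)
    # Integer (or boolean) input cannot hold the result, so convert first.
    return np.log1p(X.astype(int_fallback))


from_float32 = log1p_keep_dtype(np.array([0.0, 1.0], dtype=np.float32))
from_int64 = log1p_keep_dtype(np.array([0, 1], dtype=np.int64))
```

Returning a new array instead of writing in place sidesteps the output-casting error entirely, at the cost of one extra allocation.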
Should |
Hi, I've been getting the same error when trying to use
pbmc = sc.datasets.pbmc68k_reduced()
pbmc.X = pbmc.raw.X
sc.pp.downsample_counts(pbmc, counts_per_cell=500)
sc.pp.normalize_total(pbmc, target_sum=1e4)
Here's the traceback:
Normalizing counts per cell.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-136-3305b6c650f4> in <module>
2 pbmc.X = pbmc.raw.X
3 sc.pp.downsample_counts(pbmc, counts_per_cell=500)
----> 4 sc.pp.normalize_total(pbmc, target_sum=1e4)
~/anaconda2/envs/scanpy/lib/python3.6/site-packages/scanpy/preprocessing/_normalization.py in normalize_total(adata, target_sum, exclude_highly_expressed, max_fraction, key_added, layers, layer_norm, inplace)
166 adata.obs[key_added] = counts_per_cell
167 if hasattr(adata.X, '__itruediv__'):
--> 168 _normalize_data(adata.X, counts_per_cell, target_sum)
169 else:
170 adata.X = _normalize_data(adata.X, counts_per_cell, target_sum, copy=True)
~/anaconda2/envs/scanpy/lib/python3.6/site-packages/scanpy/preprocessing/_normalization.py in _normalize_data(X, counts, after, copy)
14 after = np.median(counts[counts>0]) if after is None else after
15 counts += (counts == 0)
---> 16 counts /= after
17 if issparse(X):
18 sparsefuncs.inplace_row_scale(X, 1/counts)
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
|
@dhb2128, as a workaround, this should work:
pbmc = sc.datasets.pbmc68k_reduced()
pbmc.X = pbmc.raw.X
sc.pp.downsample_counts(pbmc, counts_per_cell=500)
pbmc.X = pbmc.X.astype(float)
sc.pp.normalize_total(pbmc, target_sum=1e4)
I've just opened a PR to fix |
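The `true_divide` error in the traceback and the effect of the `astype(float)` workaround can be reproduced with NumPy alone. This is a minimal sketch with made-up numbers; the variable names are illustrative and do not mirror scanpy's internals exactly.

```python
import numpy as np

counts = np.array([500, 500, 250])  # integer dtype, like per-cell totals of raw counts
target_sum = 1e4

in_place_failed = False
try:
    # In-place true division would have to store float results in an integer array.
    counts /= target_sum
except TypeError:
    in_place_failed = True

# Converting to float first gives the in-place division somewhere valid to write.
counts_float = counts.astype(float)
counts_float /= target_sum
```

The same reasoning applies to the data matrix: once `X` is a float array, in-place scaling no longer needs a float-to-integer cast.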
Hi there, As @LuckyMD already pointed out here, the root of the problem is feeding
import numpy as np
a = np.zeros(shape=(1, 1), dtype='int64')
np.log1p(a, out=a)
The error can be prevented like this:
import numpy as np
a = np.zeros(shape=(1, 1), dtype='int64')
a = np.log1p(a)
In the case of |
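The `same_kind` casting rule named in the error message can be inspected directly with `np.can_cast`. The sketch below is only an illustration of why a float64 result cannot be written into an int64 output, and of one alternative (a separate float output buffer) that avoids the cast; the variable names are made up for the example.

```python
import numpy as np

# 'same_kind' permits casts within a kind (float64 -> float32) and safe casts
# (int64 -> float64), but not float -> int.
float_to_int = np.can_cast(np.float64, np.int64, casting="same_kind")
float_to_float32 = np.can_cast(np.float64, np.float32, casting="same_kind")
int_to_float = np.can_cast(np.int64, np.float64, casting="same_kind")

a = np.zeros(shape=(1, 1), dtype="int64")
out = np.empty(a.shape, dtype=np.float64)
np.log1p(a, out=out)  # writing into a float buffer is allowed
```

Here `float_to_int` is the check that fails when `out` is the integer input array itself.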
@WeilerP I'm pretty sure that should work right now. I actually think this issue has been solved, but just wasn't closed. Here's an example for this specific case:
import scanpy as sc, numpy as np
from scipy import sparse
adata = sc.AnnData(
    np.abs(sparse.random(100, 100, density=0.1, dtype=int, format="csr")),
    dtype=int,
)
display(adata.X)
# <100x100 sparse matrix of type '<class 'numpy.int64'>'
# with 1000 stored elements in Compressed Sparse Row format>
sc.pp.log1p(adata)
display(adata.X)
# <100x100 sparse matrix of type '<class 'numpy.float64'>'
# with 1000 stored elements in Compressed Sparse Row format>
Basically, we just try to make |
@ivirshup, yes, your example works. However, I would not consider the issue as resolved as it still exists IMO.
>>> adata = sc.AnnData(
...     np.ceil(np.abs(np.random.randn(10, 10))).astype('int64'),
...     dtype=int,
... )
>>> sc.pp.log1p(adata)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/lib/python3.7/functools.py", line 840, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/opt/anaconda3/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py", line 350, in log1p_anndata
X = log1p(X, copy=False, base=base)
File "/opt/anaconda3/lib/python3.7/functools.py", line 840, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/opt/anaconda3/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py", line 318, in log1p_array
np.log1p(X, out=X)
TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind'' |
Ah, I'd missed that. Should be fixed with #1400. |
Thanks for this work-around. This solved my problem! |
Hi there,
I seem to have trouble analyzing a dataset.
https://www.dropbox.com/s/og5lw42chh2qujm/Trial_data1.csv?dl=0
If I run sc.pp.log1p(adata), I get the following error.
TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
I can normalize it on my own, but if I then run a PCA and try to plot the results, I get the following error.
sc.pl.pca(adata, color='DAPI')
TypeError: object of type 'numpy.int64' has no len()
If I plot it without the color, it does work though.
Below is how I go from CSV to AnnData
Read File
x = pd.read_csv('Trial_data1.csv', delimiter=',', index_col=0)
Convert to AnnData
file_url = 'https://raw.githubusercontent.com/ajitjohnson/Jupyter-Notebooks/master/py_scripts/mi_pp_anndata.py'
exec(open(wget.download(file_url)).read())
adata = mi_pp_anndata(x)
Any help is appreciated. Thank you.