
TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind'' #435

Closed
ajitjohnson opened this issue Jan 18, 2019 · 24 comments · Fixed by #1400

Comments

@ajitjohnson

Hi there,

I seem to have trouble analyzing a dataset.

https://www.dropbox.com/s/og5lw42chh2qujm/Trial_data1.csv?dl=0

If I run sc.pp.log1p(adata), I get the following error.

TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

If I normalize it on my own and then, say, run a PCA and try to plot the results, I get the following error.

sc.pl.pca(adata, color = 'DAPI')
TypeError: object of type 'numpy.int64' has no len()

If I plot it without the color, it does work though.

Below is how I go from CSV to AnnData

Read File

x = pd.read_csv('Trial_data1.csv', delimiter=',', index_col=0)

Convert to AnnData

file_url = 'https://raw.githubusercontent.com/ajitjohnson/Jupyter-Notebooks/master/py_scripts/mi_pp_anndata.py'
exec(open(wget.download(file_url)).read())
adata = mi_pp_anndata(x)

Any help is appreciated. Thank you.

@falexwolf
Member

This is really strange, have you had problems with other datasets? Is your dataset corrupted in some way?

@ajitjohnson
Author
ajitjohnson commented Jan 22, 2019

Hi Alex,
I managed to get the log working by using your function to convert to AnnData rather than mine. (adata = sc.AnnData(x))

However, coloring the plots still does not work. I get the following error.
TypeError: object of type 'numpy.int64' has no len()

You can reproduce the error by the following

Load Data

x = pd.read_csv('Trial_data.csv', delimiter=',', index_col=0)

Drop DAPI

x = x.drop(list(x.filter(regex='DAPI.', axis=1)), axis=1)

Convert to AnnData

adata = sc.AnnData(x)

Filter cells

sc.pp.filter_cells(adata, min_genes=1)
sc.pp.filter_genes(adata, min_cells=1)
adata.obs['n_counts'] = adata.X.sum(axis=1)

Normalize data

sc.pp.log1p(adata)

PCA

sc.tl.pca(adata, svd_solver='arpack')
sc.pl.pca(adata)
sc.pl.pca(adata, color='CD3D')

I also tried it on a different dataset.

@falexwolf
Member

Could you post the full error traceback so that I see where the error is raised?

@ajitjohnson
Author
ajitjohnson commented Jan 23, 2019

Hi Alex,

Below is the error I get. Thank you for looking at this.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-4fad8adf5d00> in <module>
----> 1 sc.pl.pca(adata, color='CD3D')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in pca(adata, **kwargs)
    148     If `show==False` a `matplotlib.Axis` or a list of it.
    149     """
--> 150     return plot_scatter(adata, basis='pca', **kwargs)
    151 
    152 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in plot_scatter(adata, color, use_raw, sort_order, edges, edges_width, edges_color, arrows, arrows_kwds, basis, groups, components, projection, color_map, palette, size, frameon, legend_fontsize, legend_fontweight, legend_loc, ncols, hspace, wspace, title, show, save, ax, return_fig, **kwargs)
    275         color_vector, categorical = _get_color_values(adata, value_to_plot,
    276                                                       groups=groups, palette=palette,
--> 277                                                       use_raw=use_raw)
    278 
    279         # check if higher value points should be plot on top

~\AppData\Local\Continuum\anaconda3\lib\site-packages\scanpy\plotting\tools\scatterplots.py in _get_color_values(adata, value_to_plot, groups, palette, use_raw)
    658     # check if value to plot is in var
    659     elif use_raw is False and value_to_plot in adata.var_names:
--> 660         color_vector = adata[:, value_to_plot].X
    661 
    662     elif use_raw is True and value_to_plot in adata.raw.var_names:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1307     def __getitem__(self, index):
   1308         """Returns a sliced view of the object."""
-> 1309         return self._getitem_view(index)
   1310 
   1311     def _getitem_view(self, index):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1311     def _getitem_view(self, index):
   1312         oidx, vidx = self._normalize_indices(index)
-> 1313         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1314 
   1315     def _remove_unused_categories(self, df_full, df_sub, uns):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    662             if not isinstance(X, AnnData):
    663                 raise ValueError('`X` has to be an AnnData object.')
--> 664             self._init_as_view(X, oidx, vidx)
    665         else:
    666             self._init_as_actual(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _init_as_view(self, adata_ref, oidx, vidx)
    723             self._X = None
    724         else:
--> 725             self._init_X_as_view()
    726 
    727         self._layers = AnnDataLayers(self, adata_ref=adata_ref, oidx=oidx, vidx=vidx)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\base.py in _init_X_as_view(self)
    750             shape = (
    751                 get_n_items_idx(self._oidx, self._adata_ref.n_obs),
--> 752                 get_n_items_idx(self._vidx, self._adata_ref.n_vars)
    753             )
    754             if np.isscalar(X):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\anndata\utils.py in get_n_items_idx(idx, l)
    148         return 1
    149     else:
--> 150         return len(idx)

TypeError: object of type 'numpy.int64' has no len()
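The last frame of the traceback is the whole story: `len()` is being called on a bare NumPy integer scalar (the index produced by selecting a single var name), and a scalar has no length. A minimal numpy-only illustration, no scanpy or anndata required:

```python
import numpy as np

# Selecting a single column can resolve to a scalar index rather than an
# array, and len() on a NumPy integer scalar fails exactly like the
# traceback above.
idx = np.int64(5)
try:
    len(idx)
except TypeError as e:
    print(e)  # object of type 'numpy.int64' has no len()
```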

@falexwolf
Member

Sorry for all the trouble. I just wanted to download from your dropbox link but the file wasn't there anymore...

@falexwolf
Member

Are you running on Windows? Then the solution could be this fix in anndata: scverse/anndata#102

@ajitjohnson
Author

Ah sorry, I happened to have deleted it today. Here it is.

https://www.dropbox.com/s/26a5rhrjj99czeq/Trial_data.csv?dl=0

Meanwhile, I will also take a look at the solution you sent. I usually run it in a Jupyter notebook, and I work between Windows and Mac.

@LuckyMD
Contributor
LuckyMD commented Mar 22, 2019

So I just reproduced this error for sc.pp.log1p() using my own data after using the sc.pp.downsample_counts() function. It might have to do with that?

I noticed that sc.pp.downsample_counts() returns np.int64 rather than np.float64. I reckon that's what the log transformation is complaining about.

If I add the line:

adata.X = adata.X.astype(np.float64)

after the downsampling call, it works again. Maybe add that to sc.pp.log1p()? Or change sc.pp.downsample_counts() to return np.float64?
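Both the failure and the astype workaround described above can be reproduced with plain NumPy; the counts array below is synthetic, standing in for downsample_counts output:

```python
import numpy as np

# Integer counts, as downsampling would produce.
counts = np.array([[0, 1, 4, 9]], dtype=np.int64)

# In-place log1p fails: the ufunc produces float64 ('d'), which cannot be
# written back into the int64 ('l') buffer under the 'same_kind' rule.
try:
    np.log1p(counts, out=counts)
except TypeError as e:
    print("in-place on int64 fails:", e)

# Casting to float first makes the in-place operation legal.
x = counts.astype(np.float64)
np.log1p(x, out=x)
print(x)  # log1p(0)=0, log1p(1)=ln 2, ...
```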

@ajitjohnson
Author

It was related to the AnnData conversion that @falexwolf alluded to, which specifically affects Windows machines (because of changes in numpy). I got the latest version of AnnData and it works now.

@LuckyMD LuckyMD reopened this Mar 22, 2019
@LuckyMD
Contributor
LuckyMD commented Mar 22, 2019

Exactly the same error message pops up when inputting np.int64 data into sc.pp.log1p(). This is with the latest scanpy, and using data that has otherwise worked well when not using sc.pp.downsample_counts(). I thus wouldn't consider this resolved, although I can open another issue as well.

@ajitjohnson
Author

Aha okay. My problem was resolved when I updated the AnnData package and converted the pandas DataFrame into an AnnData object using

adata = sc.AnnData(x)

@falexwolf
Member

We just merged an update on the downsample_counts function by @ivirshup; evidently, the data type shouldn't be changed by downsampling, should it?

@LuckyMD
Contributor
LuckyMD commented Mar 22, 2019

I can confirm that it currently is:

[Screenshot, Mar 22 2019: the downsampled matrix has an integer dtype]

@ivirshup
Member

No matter what it returns, it definitely shouldn't make stuff fail. I think that downsample_counts was returning integers before the most recent PR as well.

iirc, I made downsample_counts use integers because a) numba was failing inference unless I was explicit about integers and b) downsampling counts only makes sense for integer valued numbers. At the time I couldn't see a reason to convert the output to a different type.

I figure that log1p should be able to take an integer valued expression matrix. However, I tried to implement that and ended up adding a lot of flow control to an already flow control heavy function, which got ugly:

🍝
def log1p(data, copy=False, chunked=False, chunk_size=None):
    """Logarithmize the data matrix.

    Computes `X = log(X + 1)`, where `log` denotes the natural logarithm.

    Parameters
    ----------
    data : :class:`~anndata.AnnData`, `np.ndarray`, `sp.sparse`
        The (annotated) data matrix of shape `n_obs` × `n_vars`. Rows correspond
        to cells and columns to genes.
    copy : `bool`, optional (default: `False`)
        If an :class:`~anndata.AnnData` is passed, determines whether a copy
        is returned.

    Returns
    -------
    Returns or updates `data`, depending on `copy`.
    """
    if copy:
        if not isinstance(data, AnnData):
            data = data.astype(np.floating)
        data = data.copy()
    elif not isinstance(data, AnnData) and np.issubdtype(data.dtype, np.integer):
        raise TypeError("Cannot perform inplace log1p on integer array")

    def _log1p(X):
        if issparse(X):
            np.log1p(X.data, out=X.data)
        else:
            np.log1p(X, out=X)

        return X

    if isinstance(data, AnnData):
        if not np.issubdtype(data.X.dtype, np.floating):
            data.X = data.X.astype(np.floating, copy=False)
        if chunked:
            for chunk, start, end in data.chunked_X(chunk_size):
                 data.X[start:end] = _log1p(chunk)
        else:
            _log1p(data.X)
    else:
        _log1p(data)

    return data if copy else None

I'll give that another shot, and open a PR. On the return type of downsample_counts, I've noticed many functions in scanpy return float32 matrices regardless of what was given to them. Is this a design that's meant to be propagated? Even if not, what should the return type of downsample_counts be? At the time I figured it didn't matter, since anything downstream should be able to deal with it.

@LuckyMD
Contributor
LuckyMD commented Mar 24, 2019

Originally everything was np.float32 in scanpy, but as of a recent anndata commit (somewhere between 0.6.11 and 0.6.12), that was changed and now the input data precision is left up to the user.

Which functions hard code np.float32?

@falexwolf
Member

Nothing should be hardcoded np.float32, but it might be that some functions still do that from an early time, where, for instance, scikit-learn's PCA was silently transforming to float64 (and Scanpy silently transformed back etc.).

Nothing should change the dtype that the user wants, except, for instance, when we logarithmize an integer matrix etc. Here, there should be a default dtype='float32' parameter.

[PS: In algorithms that inherently are unstable and would profit more from higher precision, one could think about increasing precision.]

@ivirshup
Member

Should downsample_counts also get a dtype argument? Internally I've used np.integer, which I think just uses the system word size.

@davidhbrann
Contributor
davidhbrann commented Oct 6, 2019

Hi,

I've been getting the same error when trying to use sc.pp.normalize_total after sc.pp.downsample_counts. Downsampling returns a CSR sparse matrix of type <class 'numpy.int64'>, which then causes sc.pp.normalize_total to error. Not sure where the dtype conversion should take place.

pbmc = sc.datasets.pbmc68k_reduced()
pbmc.X = pbmc.raw.X
sc.pp.downsample_counts(pbmc, counts_per_cell=500)
sc.pp.normalize_total(pbmc, target_sum=1e4)

Here's the traceback:

Normalizing counts per cell.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-136-3305b6c650f4> in <module>
      2 pbmc.X = pbmc.raw.X
      3 sc.pp.downsample_counts(pbmc, counts_per_cell=500)
----> 4 sc.pp.normalize_total(pbmc, target_sum=1e4)

~/anaconda2/envs/scanpy/lib/python3.6/site-packages/scanpy/preprocessing/_normalization.py in normalize_total(adata, target_sum, exclude_highly_expressed, max_fraction, key_added, layers, layer_norm, inplace)
    166             adata.obs[key_added] = counts_per_cell
    167         if hasattr(adata.X, '__itruediv__'):
--> 168             _normalize_data(adata.X, counts_per_cell, target_sum)
    169         else:
    170             adata.X = _normalize_data(adata.X, counts_per_cell, target_sum, copy=True)

~/anaconda2/envs/scanpy/lib/python3.6/site-packages/scanpy/preprocessing/_normalization.py in _normalize_data(X, counts, after, copy)
     14     after = np.median(counts[counts>0]) if after is None else after
     15     counts += (counts == 0)
---> 16     counts /= after
     17     if issparse(X):
     18         sparsefuncs.inplace_row_scale(X, 1/counts)

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
>>> pbmc.X
<700x765 sparse matrix of type '<class 'numpy.int64'>'
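The failing line at the bottom of that traceback (`counts /= after`) can be reproduced with plain NumPy; the per-cell totals below are made-up values for illustration:

```python
import numpy as np

# Per-cell totals come out as int64 when X is an integer matrix.
counts_per_cell = np.array([500, 500, 250], dtype=np.int64)
target_sum = 1e4

# _normalize_data divides in place; true division yields float64 ('d'),
# which cannot be stored back into the int64 ('l') array.
try:
    counts_per_cell /= target_sum
except TypeError as e:
    print(e)

# A float copy of the totals divides in place without complaint.
scaled = counts_per_cell.astype(np.float64)
scaled /= target_sum
print(scaled)  # [0.05 0.05 0.025]
```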

@ivirshup
Member
ivirshup commented Oct 8, 2019

@dhb2128, as a workaround, this should work:

pbmc = sc.datasets.pbmc68k_reduced()
pbmc.X = pbmc.raw.X
sc.pp.downsample_counts(pbmc, counts_per_cell=500)
pbmc.X = pbmc.X.astype(float)
sc.pp.normalize_total(pbmc, target_sum=1e4)

I've just opened a PR to fix normalize_total not working with integer input values.

@WeilerP
Contributor
WeilerP commented Aug 26, 2020

Hi there,
stumbled on this by chance when debugging a similar problem; thought I'd share the insight I gained:

As @LuckyMD already pointed out here, the root of the problem is feeding np.int64 into sc.preprocessing.log1p. More specifically, the problem occurs in log1p_array here. When specifying out in np.log1p, the result type must be castable to the dtype of the output array. However, np.log1p returns double-precision floats (type code 'd'), which cannot be cast to np.int64 (type code long integer 'l'). The error is reproducible with this small snippet of code:

import numpy as np

a = np.zeros(shape=(1, 1), dtype='int64')
np.log1p(a, out=a)

The error can be prevented like this:

import numpy as np

a = np.zeros(shape=(1, 1), dtype='int64')
a = np.log1p(a)

In the case of scanpy, this would mean replacing this line of code with X = np.log1p(X), the drawback being that the operation is no longer in place.

@ivirshup
Member

@WeilerP I'm pretty sure that should work right now. I actually think this issue has been solved, but just wasn't closed.

Here's an example for this specific case:

import scanpy as sc, numpy as np
from scipy import sparse

adata = sc.AnnData(
    np.abs(sparse.random(100, 100, density=0.1, dtype=int, format="csr")),
    dtype=int,
)
display(adata.X)
# <100x100 sparse matrix of type '<class 'numpy.int64'>'
# 	with 1000 stored elements in Compressed Sparse Row format>
sc.pp.log1p(adata)
display(adata.X)
# <100x100 sparse matrix of type '<class 'numpy.float64'>'
# 	with 1000 stored elements in Compressed Sparse Row format>

Basically, we just try to make inplace refer to the anndata object, and be truly inplace on the array when possible.
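That dtype-aware strategy can be sketched in plain NumPy; `log1p_maybe_inplace` below is a hypothetical helper for illustration, not scanpy's actual implementation:

```python
import numpy as np

def log1p_maybe_inplace(X):
    """Sketch: log-transform in place when the dtype allows,
    otherwise allocate a float result instead of raising."""
    if np.issubdtype(X.dtype, np.floating):
        np.log1p(X, out=X)  # truly in place, no copy
        return X
    # Integer (or other) input: let the ufunc allocate a float64 output.
    return np.log1p(X)

ints = np.arange(4, dtype=np.int64)
floats = np.arange(4, dtype=np.float64)

out_i = log1p_maybe_inplace(ints)    # new float array allocated
out_f = log1p_maybe_inplace(floats)  # same array, transformed in place
assert out_i.dtype == np.float64
assert out_f is floats
```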

@WeilerP
Contributor
WeilerP commented Aug 31, 2020

@ivirshup, yes, your example works. However, I would not consider the issue resolved, as it still exists IMO.
Your example only works because you are using a sparse matrix. If X is a np.ndarray, the method still fails:

>>> adata = sc.AnnData(
    np.ceil(np.abs(np.random.randn(10, 10))).astype('int64'),
    dtype=int,
)
>>> sc.pp.log1p(adata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/opt/anaconda3/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py", line 350, in log1p_anndata
    X = log1p(X, copy=False, base=base)
  File "/opt/anaconda3/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/opt/anaconda3/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py", line 318, in log1p_array
    np.log1p(X, out=X)
TypeError: ufunc 'log1p' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

@ivirshup
Member

Ah, I'd missed that. Should be fixed with #1400.

@brianpenghe
pbmc.X = pbmc.X.astype(float)

Thanks for this work-around. This solved my problem!
