Adata.raw gets modified upon log normalization of adata #3073
Comments
@AlejandraRodelaRo I am not so familiar with |
Hi all, @flying-sheep @falexwolf Wanted to echo Alejandro and highlight that this is a critical bug, since nearly every function carries a use_raw flag, and the assumption that .raw contains counts is used explicitly or implicitly in numerous scanpy functions. We just realized that a massive dataset we've been processing for ~6 weeks also has no reads in .raw, despite our saving it prior to the log1p/normalize functions. I am not sure it's helpful, but we see this bug in scanpy 1.9.8, whereas in an old dataset/environment with scanpy 1.6.0, .raw correctly preserved counts. |
My opinion would be that you need to write If we don't change it, we could maybe warn if we're mutating Overall, I would recommend that you use |
That makes python-sense. This is absolutely a change in convention though, see:
In addition, both sc.pl.umap and sc.pl.paga_path() come to mind as functions that default to using the .raw layer
I think that's a good idea. In general, it would be very helpful to preserve in the anndata structure some record of the major transformations to .X (or any layer)
This seems like good practice and the workaround we'll apply for now. I do wonder if some change was made after this conversation which you were a part of. Thank you by the way, this package is an amazing tool. |
It’s needed when you modify the
The above slices it twice, and only then copies it, because slicing isn’t a modification. So what’s happening is:
adata_orig = AnnData(...)
adata_sliced_view = adata_orig[..., :]
assert adata_sliced_view.is_view
adata_sliced_copy = adata_sliced_view[..., :].copy()
assert not adata_sliced_copy.is_view
do_modify(adata_sliced_copy)
The slicing could also have been done in one operation:
adata = adata_orig[(adata_orig.obs["n_genes_by_counts"] < 2500) & (adata_orig.obs["pct_counts_mt"] < 5), :].copy() |
What happened?
I am working with a set of 2 10x scRNA samples. I read them, concatenated them, and then did basic filtering. I then used "adata.raw = adata" to freeze the counts in adata.raw before proceeding. Then I ran:
To my surprise, when I check adata.raw I see that its values have also been log-normalized (and not only those of adata).
Is that how it is supposed to be? Is there any way to avoid this behavior? I know I can store the raw counts in layers, I just want to understand how it works.
To check the data I used:
print(adata.raw.X[1:10,1:10])
Minimal code sample
Error output
No response
Versions