Doublet filtering function #173

Closed
yueqiw opened this issue Jun 13, 2018 · 18 comments · Fixed by #2703
Comments

@yueqiw
yueqiw commented Jun 13, 2018

Hi,

I tried the DoubletDetection Python library on my data and got some interesting results. As it can be run directly on a numpy array holding the count matrix (adata.X), I thought it would be an interesting feature for scanpy.

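import doubletdetection
import pandas as pd
# adata is an existing AnnData object whose adata.X holds the count matrix
clf = doubletdetection.BoostClassifier()
doublets = clf.fit(adata.X).predict()
adata.obs['doublet'] = pd.Categorical(doublets.astype(bool))
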
@falexwolf
Member

Sorry for the late response, I was on holidays. I'm happy to merge a pull request for this, if the package appears solid. Would you want to add a file scanpy/preprocessing/doubletdetection? We should probably just ask @JonathanShor whether he's interested in an interface for easily accessing his package. If he is, he should also make the pull request, I'd say. 😄

@JonathanShor

Thanks to @yueqiw for the confidence. :)

@falexwolf We have no issue with our package being included here, but we wouldn't be able to create a custom API for your package right now, if that's what you were suggesting?

Given my quick overview of your package, two things you should note:

  1. Our method expects the input count matrix to be from a single run. Performance takes a non-trivial hit on aggregate datasets.
  2. Currently, our runtime will not satisfy those impressive metrics cited in your 1.0 announcement. This may possibly change in the future, as we haven't focused on hard optimization yet.
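One way to respect the single-run expectation on an aggregated dataset is to split the cells by run and classify each run on its own. The following is only a minimal sketch of that idea: `predict_per_run` and `toy_predict` are hypothetical names, and `toy_predict` is a crude library-size rule used as a stand-in, not the DoubletDetection algorithm.

```python
import numpy as np


def predict_per_run(counts, run_labels, predict_fn):
    """Apply predict_fn separately to the cells (rows) of each run.

    counts: dense (cells x genes) count matrix.
    run_labels: one label per cell naming the run it came from.
    predict_fn: callable returning one doublet call per row it is given.
    """
    counts = np.asarray(counts)
    run_labels = np.asarray(run_labels)
    if counts.shape[0] != run_labels.shape[0]:
        raise ValueError("need exactly one run label per cell (row)")

    calls = np.zeros(counts.shape[0], dtype=bool)
    for run in np.unique(run_labels):
        mask = run_labels == run
        # Only the cells of this run are shown to the classifier.
        calls[mask] = np.asarray(predict_fn(counts[mask]), dtype=bool)
    return calls


def toy_predict(run_counts):
    """Stand-in classifier (assumption): flag cells with unusually many counts."""
    totals = run_counts.sum(axis=1)
    return totals > 1.5 * np.median(totals)
```

With a real classifier, `predict_fn` could be something like `lambda x: clf.fit(x).predict()`, assuming the classifier can be refit once per run.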

@falexwolf
Member

Hi @JonathanShor,

you don't need to create a custom API. One point of Scanpy is to provide convenient access via anndata to many single-cell packages around. The only thing needed for that is to provide a very simple interface like this or this or several of the other tools... Simply click on the GitHub links in the Scanpy docs...
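As a rough illustration of how thin such an interface can be, here is a minimal sketch built on the snippet from the first comment. The function name `detect_doublets` and the `key_added` parameter are hypothetical and not part of the Scanpy API; the classifier is assumed to support the `fit(X).predict()` chain shown above, and `adata` only needs `.X` and a pandas `.obs` table.

```python
import numpy as np
import pandas as pd


def detect_doublets(adata, classifier, key_added="doublet"):
    """Run classifier on adata.X and store the calls in adata.obs[key_added].

    Hypothetical wrapper: modifies adata in place and returns nothing.
    """
    calls = np.asarray(classifier.fit(adata.X).predict())
    if calls.shape[0] != adata.obs.shape[0]:
        raise ValueError("classifier returned a different number of cells")
    # Same conversion as in the original snippet: boolean calls, stored as categorical.
    adata.obs[key_added] = pd.Categorical(calls.astype(bool))
```

A call would then look like `detect_doublets(adata, doubletdetection.BoostClassifier())`, after which the calls are available for plotting or filtering through `adata.obs['doublet']`.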

If your package works reliably, neither of the restrictions you mention should, in principle, prevent adding it. Of course, in the future, we want all elements of Scanpy to scale to millions of cells, not just the core tools. But for a lot of people, it is helpful right now to have a large number of tools available, even if only for relatively small datasets.

The only problem is to avoid cluttering the Scanpy API with virtually any tool there is. Tools in the API should have passed a certain quality check.

Doublet detection is a difficult problem. Already last autumn, we played around with @swolock's tool but didn't end up using it - it was good, but it didn't seem to apply to our situation (are you eventually going to distribute a package for it, @swolock?). I quickly wrote a tool myself, too, but it didn't work well. Just yesterday, this appeared. Then there is also this on "empty cell detection". There are more tools out there, I think...

What I mean is: computationally detecting doublets is still a problem on which the field has not reached a consensus. Just like batch correction. Therefore, I would not add a tool tl.doublet_detection or tl.detect_doublets to the API at this stage.

There are two options. Either we create a .beta module of the API for tools that don't even have a preprint and add your tool, and similar cases in the future, there. We could make a separate page for that entitled Cutting Edge Beta Tools, which advertises these tools for people to try out and play around with. Or, when you have a solid preprint and/or publication, or if you think that your tool should go in the main API anyway 😄, we should add your package as tl.detect_doublets_ONEWORDDESCRIBINGYOURALGORITHM...

@flying-sheep @gokceneraslan @fidelram @dawe any opinions on such cases?

@swolock
Contributor
swolock commented Jun 25, 2018

Hi @falexwolf, yes I will be making my method available. A rough version is already on github, and I also played around with adding it to my scanpy fork (though not the right way -- I added it to tl rather than pp). I'll hopefully clean it up and release something more official when I have the chance.

@falexwolf
Member

Good to hear! Looking forward to learning more about it.
PS: Having a doublet detection tool in tl would be fine, I'd say... pp and tl are just meant to give a rough orientation for users... in some cases, it's not completely clear what preprocessing and what downstream analysis is...

@ivirshup
Member
ivirshup commented Apr 9, 2019

It looks like this may have stalled a bit. Is anyone currently working on making some form of doublet detection available from scanpy?

@swolock
Contributor
swolock commented Apr 10, 2019

Hi @ivirshup, I've been meaning to get back to this. I've just started on an AnnData-compatible version of Scrublet which should be easy to hook up to Scanpy. Will keep you posted.

@cartal
cartal commented May 14, 2019

> Hi @ivirshup, I've been meaning to get back to this. I've just started on an AnnData-compatible version of Scrublet which should be easy to hook up to Scanpy. Will keep you posted.

Any updates on this?

@falexwolf
Member

I guess we should ask @swolock. 🙂

@SamueleSoraggi
SamueleSoraggi commented May 14, 2019 via email

@swolock
Contributor
swolock commented May 16, 2019

@cartal @SamueleSoraggi
For some reason I decided to integrate Scrublet using Scanpy's functions where possible, rather than making a simple wrapper. The core functionality is up and running in this fork, and now I just need to add documentation, make some of the code more Scanpythonic(?), and add an example.

@SamueleSoraggi
SamueleSoraggi commented May 16, 2019 via email

@fidelram
Collaborator
fidelram commented Jul 5, 2019

@swolock why don't you submit a PR? I just tested your code and it seems to work.

@pinin4fjords
Contributor

How is this work going? We'd love to integrate Scrublet into our workflows, which are currently quite Scanpy-centric.

@fidelram
Collaborator

@pinin4fjords Given that there is no response from @swolock I tried the following, which works well:

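import scanpy as sc
import scrublet as scr  # the module name is lowercase

# Scrublet expects raw counts; this assumes adata.raw holds them for the same cells as adata
scrub = scr.Scrublet(adata.raw.X)
adata.obs['doublet_scores'], adata.obs['predicted_doublets'] = scrub.scrub_doublets()
scrub.plot_histogram()
sc.pl.umap(adata, color='doublet_scores')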
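
The direct tuple assignment above relies on Scrublet returning one score and one call per cell. A slightly more defensive variant is sketched below. It assumes (worth verifying against the Scrublet version in use) that the predicted calls can come back as `None` when no threshold is found automatically; the helper name `store_scrublet_results` and the default `threshold=0.25` are purely illustrative, and a real threshold should be read off the score histogram.

```python
import numpy as np
import pandas as pd


def store_scrublet_results(obs, scores, predicted=None, threshold=0.25):
    """Write doublet scores and calls into obs; return the fraction of called doublets.

    obs: pandas DataFrame with one row per cell (e.g. adata.obs).
    scores: one doublet score per cell.
    predicted: boolean calls, or None to fall back to scores > threshold.
    threshold: illustrative fallback cut-off (assumption, not a recommended value).
    """
    scores = np.asarray(scores, dtype=float)
    if scores.shape[0] != obs.shape[0]:
        raise ValueError("need exactly one score per cell")
    if predicted is None:
        # Fallback: call doublets manually from the scores.
        predicted = scores > threshold
    obs["doublet_scores"] = scores
    obs["predicted_doublets"] = np.asarray(predicted, dtype=bool)
    return float(obs["predicted_doublets"].mean())
```

Usage would be along the lines of `store_scrublet_results(adata.obs, *scrub.scrub_doublets())`, with the returned fraction serving as a quick sanity check against the expected doublet rate of the experiment.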

@pinin4fjords
Contributor

Thanks @fidelram, that will run the whole Scrublet workflow so will certainly do the trick. But I'd prefer a more Scanpy-integrated approach, which I think I can see how to do from @swolock's fork.

@Tolupeter

I have a problem installing and importing scrublet on Windows. Can you please help me?
Here is my code: !pip install scrublet
PackagesNotFoundError: The following packages are not available from current channels:

  • annoy

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org/

and use the search bar at the top of the page.

@flying-sheep
Member
flying-sheep commented Oct 26, 2023

I started work to move scrublet into scanpy (since the last commit was 3 years ago and it’s safe to assume it’s not maintained anymore)

#2703
