Doublet filtering function #173

yueqiw · 2018-06-13T23:25:18Z

Hi,

I tried the DoubletDetection Python library on my data and got some interesting results. Since it can be run directly on a NumPy array holding the count matrix (adata.X), I thought it would be an interesting feature for scanpy.

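import doubletdetection
import pandas as pd
# Fit the classifier on the count matrix and store the per-cell calls
clf = doubletdetection.BoostClassifier()
doublets = clf.fit(adata.X).predict()
adata.obs['doublet'] = pd.Categorical(doublets.astype(bool))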

falexwolf · 2018-06-20T09:05:54Z

Sorry for the late response, I was on holidays. I'm happy to merge a pull request for this, if the package appears solid. Would you want to add a file scanpy/preprocessing/doubletdetection? We should probably just ask @JonathanShor whether he's interested in an interface for easily accessing his package. If he is, he should also make the pull request, I'd say. 😄

JonathanShor · 2018-06-20T18:51:43Z

Thanks to @yueqiw for the confidence. :)

@falexwolf We have no issue with our package being included here, but we wouldn't be able to create a custom API for your package right now, if that's what you were suggesting?

Given my quick overview of your package, two things you should note:

Our method expects the input count matrix to be from a single run. Performance takes a non-trivial hit on aggregate datasets.
Currently, our runtime will not satisfy those impressive metrics cited in your 1.0 announcement. This may possibly change in the future, as we haven't focused on hard optimization yet.
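
The single-run expectation above suggests calling a detector once per run and then reassembling the results, rather than passing an aggregate matrix. A minimal sketch of that pattern follows, under stated assumptions: `predict_per_run` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in that flags cells with unusually high total counts. It is not the DoubletDetection algorithm.

```python
from collections import defaultdict
from statistics import median


def predict_per_run(rows, run_labels, predict_fn):
    """Call `predict_fn` on the cells of each run separately.

    Returns one boolean per cell, in the original cell order.
    """
    if len(rows) != len(run_labels):
        raise ValueError("need exactly one run label per cell")

    # Collect the cell indices that belong to each run.
    cells_by_run = defaultdict(list)
    for i, run in enumerate(run_labels):
        cells_by_run[run].append(i)

    calls = [False] * len(rows)
    for indices in cells_by_run.values():
        run_calls = predict_fn([rows[i] for i in indices])
        # Write each run's calls back to the original positions.
        for i, call in zip(indices, run_calls):
            calls[i] = bool(call)
    return calls


def toy_predict(rows):
    """Stand-in detector (an assumption): flag cells whose total count
    exceeds 1.5 times the median total of the cells it was given."""
    totals = [sum(row) for row in rows]
    cutoff = 1.5 * median(totals)
    return [total > cutoff for total in totals]


# Two runs with very different sequencing depth.
rows = [[1, 1], [1, 2], [5, 5], [10, 10], [11, 9], [30, 30]]
runs = ["a", "a", "a", "b", "b", "b"]

per_run = predict_per_run(rows, runs, toy_predict)
pooled = toy_predict(rows)
```

In this toy data the high-count cell of the shallow run "a" is flagged only when the runs are processed separately. With AnnData, the same grouping would typically come from a column in adata.obs that records the run.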

falexwolf · 2018-06-22T08:37:45Z

Hi @JonathanShor,

you don't need to create a custom API. One point of Scanpy is to provide convenient access via anndata to many single-cell packages around. The only thing needed for that is to provide a very simple interface like this or this or several of the other tools... Simply click on the GitHub links in the Scanpy docs...
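
As a rough illustration of how thin such an interface can be, here is a sketch under stated assumptions. The function name `detect_doublets` and the parameter `key_added` are hypothetical and not part of the Scanpy API, and `ToyClassifier` is a stand-in that only mimics the fit/predict shape used earlier in this thread. The only things the wrapper relies on are an object with `.X` and `.obs`.

```python
from types import SimpleNamespace


def detect_doublets(adata, classifier, key_added="doublet"):
    """Run `classifier` on `adata.X` and store boolean calls in `adata.obs`.

    `classifier` is any object whose `fit(X)` returns an object with
    `predict()`, yielding 1 for doublets and 0 for singlets.
    """
    labels = classifier.fit(adata.X).predict()
    # Comparing with 1 keeps anything that is not a clear doublet call False.
    adata.obs[key_added] = [label == 1 for label in labels]
    return adata


class ToyClassifier:
    """Stand-in classifier (an assumption), NOT a real doublet detector."""

    def fit(self, counts):
        self._totals = [sum(row) for row in counts]
        return self

    def predict(self):
        # Flag cells with more than twice the smallest total count.
        cutoff = 2 * min(self._totals)
        return [1.0 if total > cutoff else 0.0 for total in self._totals]


# A minimal object with the two attributes the wrapper touches.
toy = SimpleNamespace(X=[[1, 2], [2, 1], [9, 8]], obs={})
detect_doublets(toy, ToyClassifier())
```

With a real AnnData object, `adata.obs` is a pandas DataFrame, so the same assignment adds a column with one value per cell.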

If your package works reliably, both the restrictions you mention should in principle not prevent adding your package. Of course, in the future, we want all elements of Scanpy to scale to millions of cells, not just the core tools. But for a lot of people, it's right now helpful to have a large number of tools available also for relatively small datasets.

The only problem is to avoid cluttering the Scanpy API with virtually any tool there is. Tools in the API should have passed a certain quality check.

Doublet detection is a difficult problem. Already last autumn, we played around with @swolock 's tool but didn't end up using it - it was good, but in our situation, it didn't seem to apply (are you eventually going to distribute a package for it @swolock ?). I myself quickly wrote a tool, too, but it didn't work well. Just yesterday, this appeared. Then there is also this on "empty cell detection". There are more tools out there, I think...

What I mean is: computationally detecting doublets is still something where the field has not agreed on a consensus. Just like batch correction. Therefore, I would not add a tool tl.doublet_detection or tl.detect_doublets to the API at this stage.

There are two options. Either we create a .beta module of the API for tools that don't even have a preprint yet, and add your tool and similar future cases there. We could make a separate page for that, entitled Cutting Edge Beta Tools, which advertises these tools for people to try out and play around with. Or, once you have a solid preprint and/or publication, or if you think that your tool should go in the main API anyway 😄, we add your package as tl.detect_doublets_ONEWORDDESCRIBINGYOURALGORITHM...

@flying-sheep @gokceneraslan @fidelram @dawe any opinions on such cases?

swolock · 2018-06-25T20:48:25Z

Hi @falexwolf, yes I will be making my method available. A rough version is already on github, and I also played around with adding it to my scanpy fork (though not the right way -- I added it to tl rather than pp). I'll hopefully clean it up and release something more official when I have the chance.

falexwolf · 2018-06-26T11:45:03Z

Good to hear! Looking forward to learning more about it.
PS: Having a doublet detection tool in tl would be fine, I'd say... pp and tl are just meant to give a rough orientation for users... in some cases, it's not completely clear what preprocessing and what downstream analysis is...

ivirshup · 2019-04-09T00:35:49Z

It looks like this may have stalled a bit. Is anyone currently working on making some form of doublet detection available from scanpy?

swolock · 2019-04-10T13:25:50Z

Hi @ivirshup, I've been meaning to get back to this. I've just started on an AnnData-compatible version of Scrublet which should be easy to hook up to Scanpy. Will keep you posted.

cartal · 2019-05-14T16:32:33Z

Any updates on this?

falexwolf · 2019-05-14T18:18:51Z

I guess, we should ask @swolock. 🙂

SamueleSoraggi · 2019-05-14T20:14:28Z

I can apply scrublet pretty easily from the original Python package, so I guess a wrapper should be quick to implement :) I was thinking about doing it last week, but I am no expert in this kind of stuff :/

swolock · 2019-05-16T03:15:49Z

@cartal @SamueleSoraggi
For some reason I decided to integrate Scrublet using Scanpy's functions where possible, rather than making a simple wrapper. The core functionality is up and running in this fork, and now I just need to add documentation, make some of the code more Scanpythonic(?), and add an example.

SamueleSoraggi · 2019-05-16T06:33:56Z

Your way certainly sounds better; many parts of the scrublet algorithm are redundant with components of scanpy. It will surely look great :) Just one thing: the scrublet paper suggests always running only the doublet simulation first and comparing the expected with the estimated fraction of doublets before removing any doublets. If those two values do not match, they say one should rerun scrublet and tune the expected fraction. Does your script only run the doublet simulation and output the doublet scores, or does it also remove the doublets right away? In the latter case, one cannot simulate doublets more than once to adjust the expected doublet fraction. Cheers.
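
To make the concern concrete, here is a minimal sketch of keeping scoring and removal as separate steps. The helper names `detected_fraction` and `drop_doublets` are hypothetical, the scores are made up, and the single fixed threshold is a simplification of what Scrublet does. The point is only that nothing is removed until the detected fraction has been compared with the expected one.

```python
def detected_fraction(scores, threshold):
    """Fraction of cells whose doublet score exceeds `threshold`."""
    return sum(score > threshold for score in scores) / len(scores)


def drop_doublets(cells, scores, threshold):
    """Return the cells at or below `threshold`; inputs stay unmodified."""
    return [cell for cell, score in zip(cells, scores) if score <= threshold]


cells = ["c0", "c1", "c2", "c3", "c4"]
scores = [0.05, 0.72, 0.10, 0.08, 0.12]  # made-up doublet scores
expected_rate = 0.2

# Step 1: annotate and inspect only. No cell is removed here.
rate = detected_fraction(scores, threshold=0.5)

# Step 2: filter once the detected and expected fractions agree.
if abs(rate - expected_rate) < 0.05:
    singlets = drop_doublets(cells, scores, threshold=0.5)
else:
    singlets = list(cells)  # keep everything and rerun scoring instead
```

Because `cells` and `scores` are never modified in place, the scoring step can be repeated with a different expected fraction as often as needed.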

fidelram · 2019-07-05T11:03:18Z

@swolock why don't you submit a PR? I just tested your code and it seems to work.

pinin4fjords · 2019-10-22T15:12:27Z

How is this work going? We'd love to integrate Scrublet into our workflows, which are currently quite Scanpy-centric.

fidelram · 2019-10-23T06:47:15Z

@pinin4fjords Given that there is no response from @swolock I tried the following, which works well:

import scanpy as sc
import scrublet as scr

# Score the raw counts, then store the scores and calls per cell
scrub = scr.Scrublet(adata.raw.X)
adata.obs['doublet_scores'], adata.obs['predicted_doublets'] = scrub.scrub_doublets()
scrub.plot_histogram()
sc.pl.umap(adata, color='doublet_scores')

pinin4fjords · 2019-10-23T08:11:42Z

Thanks @fidelram, that will run the whole Scrublet workflow so will certainly do the trick. But I'd prefer a more Scanpy-integrated approach, which I think I can see how to do from @swolock's fork.

Tolupeter · 2023-10-10T18:03:19Z

I have a problem installing and importing scrublet on Windows. Can you please help me?
Here is my code: !pip install scrublet
PackagesNotFoundError: The following packages are not available from current channels:

annoy

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org/

and use the search bar at the top of the page.

flying-sheep · 2023-10-26T11:59:16Z

I started work to move scrublet into scanpy (since the last commit was 3 years ago and it’s safe to assume it’s not maintained anymore)

#2703

flying-sheep added the Enhancement ✨ label Oct 23, 2019

gokceneraslan added the Help Wanted label Mar 16, 2020

pinin4fjords mentioned this issue Oct 30, 2020: Add Scrublet as an external tool #1476 (Closed)

This was referenced Oct 26, 2023: Move scrublet out of external #2703 (Merged); Remove sc.external #2717 (Open)

flying-sheep closed this as completed in #2703 Feb 12, 2024
