gene co-expression networks #72

Open
seth-ament opened this issue Jan 23, 2018 · 18 comments
@seth-ament

We are very impressed with the scalability of scanpy. We are interested in performing gene co-expression clustering on large single-cell RNAseq datasets. This typically involves calculating pairwise correlations between genes, then using these correlations as distance metrics for hierarchical and k-means clustering. Does scanpy already support these kinds of analyses?

@jorvis
Contributor
jorvis commented Jan 24, 2018

I would be very interested in helping to add any of these if they do not currently exist or aren't already in development.

@falexwolf
Member

Dear both, sorry about the late response... I've become the father of twins in the past weeks... Will respond much more quickly again soon.

Yes, we're working on this and will provide one solution within the next few days. @tcallies could you push what you wrote?

You can then tell me if this does the job for you.

@tcallies
Contributor

Dear both,

Correlation matrices are available now. Following our usual split into tools and plotting, you can call

sc.tl.correlation_matrix(adata, name_list, n_genes=20, annotation_key=None, method='pearson')

for correlation matrix calculation.
I have left out a few parameters, because I actually wrote the function to conveniently plot results from DE testing, but the basic functionality is as follows:

adata is the usual AnnData object you are working with.
name_list is a list of gene names (strings) and must be specified.
n_genes truncates name_list if the number specified is smaller than the length of the list, so set it high enough if you want to work with many genes.
annotation_key lets you specify the key in the AnnData object under which the results are stored. By default, the key is "Correlation_matrix".

The method basically wraps the pd.DataFrame.corr method, which allows you to specify the correlation method ('pearson', 'spearman', 'kendall').
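For readers who want to see what such a wrapper boils down to, here is a minimal sketch using only pandas. It is not the scanpy implementation: the helper name gene_gene_correlation and the toy data are made up for illustration, and it assumes a dense cells-by-genes matrix. With a real AnnData object you would probably start from adata.to_df(), but treat that as an assumption to check against your version.

```python
import numpy as np
import pandas as pd


def gene_gene_correlation(expr, gene_names, genes, method="pearson"):
    """Hypothetical helper: correlate the selected genes across cells.

    expr is a dense (n_cells, n_genes) array and gene_names labels its columns.
    """
    df = pd.DataFrame(np.asarray(expr), columns=list(gene_names))
    # DataFrame.corr computes pairwise correlations between columns (genes).
    return df[list(genes)].corr(method=method)


# Toy data: geneA and geneB share a signal, geneC is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
expr = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
names = ["geneA", "geneB", "geneC"]
corr = gene_gene_correlation(expr, names, names)
```

The result is a genes-by-genes DataFrame, so it can be passed directly to any heatmap function; method can also be 'spearman' or 'kendall', as in pd.DataFrame.corr.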

I use it for smaller data, so it has not been optimized for performance (yet), but I tested the method on 3k cells and 600 genes and got a runtime of ~8 seconds. I hope that is fast enough for you (if not, let us know).

After calling the tool, you can plot correlation matrices (using a wrapper for seaborn heatmap) by calling

sc.pl.correlation_matrix(adata, annotation_key=None)

This function only reads the stored AnnData annotation (again, if no key is specified, "Correlation_matrix" is the default).

Hope this does the job!

@falexwolf
Member

Cool, sounds great! Thank you! I will also play around with this. Why don't you add it to the documentation? Maybe here https://github.com/theislab/scanpy/blob/980aa00adca49f6aa994a6f870ad98c3ad9218af/scanpy/api/__init__.py#L60?

@falexwolf
Member

Ah! And we should also think about the naming convention here. Maybe gene_gene_correlation? We will have all kinds of correlation matrices floating around scanpy and we should have very specific naming conventions...

@falexwolf
Member
falexwolf commented Jan 31, 2018

It will be hard to maintain an overview of what's going on with all the names that were not specific enough and had to be removed but still kept at some place to maintain backward compatibility.

@falexwolf
Member
falexwolf commented Jan 31, 2018

@seth-ament @jorvis Having the correlation matrix, you then want to cluster it using hierarchical clustering, right? So, in order to achieve this, shall we add this functionality to clustermap, which currently clusters the expression matrix itself?

@tcallies
Contributor
tcallies commented Jan 31, 2018

I will certainly update my new stuff at least once today (probably more often), change the name, and add the documentation. I will let you know as soon as the name has changed.

@jorvis
Contributor
jorvis commented Feb 1, 2018

That sounds right, yes. Looking forward to this being available.

@seth-ament
Author

Yes, thanks so much. This looks great. Typically, we cut the hierarchical tree to produce gene clusters, summarize these clusters as the mean expression of the genes within the cluster, then pass the mean expression profile to plotting functions like coloring tSNE plots and violin plots.
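The workflow described above can be sketched roughly as follows, outside of scanpy. This is only an illustration under stated assumptions: gene_modules is a hypothetical helper, the distance is taken as 1 minus the Pearson correlation, average linkage is an arbitrary choice, and the input is a dense cells-by-genes DataFrame with no constant genes (those would give NaN correlations).

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gene_modules(expr_df, n_clusters):
    """Hypothetical helper: cluster genes by correlation, then average per cluster.

    expr_df is a (cells x genes) DataFrame. Returns per-gene cluster labels
    and a (cells x clusters) DataFrame of mean expression.
    """
    corr = expr_df.corr(method="pearson")
    # Turn correlation into a distance; this choice is an assumption.
    dist = 1.0 - corr.to_numpy()
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # remove floating-point asymmetry
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cut the tree so that at most n_clusters gene clusters remain.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    labels = pd.Series(labels, index=expr_df.columns, name="module")
    # Mean expression of the genes in each cluster, per cell.
    module_means = expr_df.T.groupby(labels).mean().T
    return labels, module_means


# Toy data: g1/g2 follow one latent signal, g3/g4 another.
rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=300), rng.normal(size=300)
expr_df = pd.DataFrame({
    "g1": s1 + 0.1 * rng.normal(size=300),
    "g2": s1 + 0.1 * rng.normal(size=300),
    "g3": s2 + 0.1 * rng.normal(size=300),
    "g4": s2 + 0.1 * rng.normal(size=300),
})
labels, module_means = gene_modules(expr_df, n_clusters=2)
```

Each column of module_means is a per-cell score, so it could presumably be copied into adata.obs and used as a color or key in the usual tSNE and violin plotting functions; that last step is not shown here and depends on your scanpy version.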

@jorvis
Contributor
jorvis commented Feb 5, 2018

Any updates here? I'd love to add this to an analysis tool UI I'm working on (and presenting at a conference this weekend). Very happy to promote scanpy there.

@wyattmcdonnell

Hi all—does anybody have a skeleton snippet they're willing to share here on how to run this in the current version of Scanpy? Thanks!

@falexwolf
Member
falexwolf commented Jun 25, 2018

Unfortunately, all of this discussion here was not really further pursued, I have to admit.

In principle, these are very simple things. However, I'm a bit afraid of offering a canonical function as I fear that there are also a lot of bad ways of visualizing gene correlation plots and I don't feel capable of judging this. If no one else wants to make a pull request for that (maybe using what @tcallies already did, but I fear it's not really serving the purpose of the discussion here: here, here) it would be cool if someone sent me an example case, which clearly shows what you want.

Maybe @jorvis, you can send images for the examples you have in mind?

@flying-sheep
Member

It’s still not in the docs, and by now also broken… #392

@hyjforesight

Hello @tcallies @falexwolf @flying-sheep
Somehow, it looks like sc.tl.correlation_matrix was removed from scanpy?

sc.tl.correlation_matrix(adata_sub2, name_list=['SMARCA4', 'TP53'], n_genes=20, annotation_key=None, method='pearson')
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3196/1400689712.py in <module>
----> 1 sc.tl.correlation_matrix(adata_sub2, name_list=['SMARCA4', 'TP53'], n_genes=20, annotation_key=None, method='pearson')

AttributeError: module 'scanpy.tools' has no attribute 'correlation_matrix'

@mssher07

same error, seconded -- is there an alternative approach built in?

@mys721tx
mys721tx commented Aug 5, 2023

It looks like correlation_matrix stopped being exported when _top_genes.py was renamed.

@mys721tx
mys721tx commented Aug 5, 2023

A very dodgy workaround would be

from scanpy.tools import _top_genes
from scanpy.plotting import _anndata

_top_genes.correlation_matrix(adata, names, annotation_key=None, method='pearson')

_anndata.correlation_matrix(adata, groupby='leiden')


9 participants