gene co-expression networks #72

seth-ament · 2018-01-23T19:13:17Z

We are very impressed with the scalability of scanpy. We are interested in performing gene co-expression clustering on large single-cell RNAseq datasets. This typically involves calculating pairwise correlations between genes, then using these correlations as distance metrics for hierarchical and k-means clustering. Does scanpy already support these kinds of analyses?

jorvis · 2018-01-24T05:16:07Z

I would be very interested in helping to add any of these if they do not currently exist or aren't already in development.

falexwolf · 2018-01-30T14:53:13Z

Dear both, sorry about the late response... I've become the father of twins in the past weeks... Will respond much more quickly again soon.

Yes, we're working on this and will provide one solution within the next few days. @tcallies could you push what you wrote?

You can then tell me if this does the job for you.

tcallies · 2018-01-31T10:34:50Z

Dear both,

correlation matrices are available now. Following our usual split into tools and plotting, you can call

sc.tl.correlation_matrix(adata, name_list, n_genes=20, annotation_key=None, method='pearson')

for correlation matrix calculation.
I have left out a few parameters here because I originally wrote the function to conveniently plot results from DE testing, but the basic functionality is the following:

adata is the usual AnnData object you are working with.
name_list is a list of gene names (strings) and should be specified.
n_genes truncates name_list if the number specified is smaller than the length of the list, so set this high enough if you want to work with large data.
annotation_key allows you to specify a string that works as the key in the AnnData object where results are stored. By default, the key is "Correlation_matrix"

The method basically wraps the pd.DataFrame.corr method, which allows you to specify the correlation method ('pearson', 'spearman', 'kendall').
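As a rough illustration of what such a wrapper amounts to, here is a minimal sketch that calls pd.DataFrame.corr directly on a cells x genes matrix. It is not the scanpy implementation: the helper name gene_gene_correlation and the toy data are made up for this example, and it assumes a dense array (with AnnData you would pass something like the dense form of adata[:, genes].X together with the gene names).

```python
import numpy as np
import pandas as pd


def gene_gene_correlation(X, gene_names, method="pearson"):
    """Return a genes x genes correlation DataFrame.

    X is assumed to be a dense cells x genes array (hypothetical helper,
    not part of scanpy). method is passed straight to DataFrame.corr and
    may be 'pearson', 'spearman' or 'kendall'.
    """
    df = pd.DataFrame(np.asarray(X), columns=list(gene_names))
    return df.corr(method=method)


# Toy data: 300 cells, 5 genes; g1 is built to track g0 closely.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(300, 5)).astype(float)
X[:, 1] = X[:, 0] + rng.normal(0, 0.1, size=300)
genes = ["g0", "g1", "g2", "g3", "g4"]

corr = gene_gene_correlation(X, genes)
```

The result is an ordinary DataFrame indexed by gene name, so corr.loc["g0", "g1"] gives a single pairwise value.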

I use it for smaller data so it has not been optimized for performance (yet), but I tested the method for 3k cells and 600 genes and ended up with a runtime of ~8 seconds. I hope that is conveniently fast enough for you (if not let us know).
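If the pandas route becomes a bottleneck on larger gene sets, one common alternative for Pearson correlation is to center and scale each gene once and then use a single matrix product. The sketch below assumes a dense cells x genes array with no missing values and no constant genes; pearson_matrix is a hypothetical name, not a scanpy function.

```python
import numpy as np


def pearson_matrix(X):
    """Pearson correlation between the columns (genes) of a dense array.

    Each column is centered and scaled to unit length, so the matrix
    product of the scaled data with itself is the correlation matrix.
    A constant gene has zero norm and would produce NaN entries.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    Z = Z / np.sqrt((Z ** 2).sum(axis=0))
    return Z.T @ Z


# Same order of magnitude as the case mentioned above: 3k cells, 600 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 600))
corr = pearson_matrix(X)
```

This only covers Pearson; rank-based methods such as Spearman would need the columns converted to ranks first.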

After calling the tool, you can plot correlation matrices (using a wrapper for seaborn heatmap) by calling

sc.pl.correlation_matrix(adata, annotation_key=None)

This function only looks up the stored result in the AnnData annotation (again, if no key is specified, "Correlation_matrix" is the default).

Hope this does the job!

falexwolf · 2018-01-31T10:42:43Z

Cool, sounds great! Thank you! I will also play around with this. Why don't you add it to the documentation? Maybe here https://github.com/theislab/scanpy/blob/980aa00adca49f6aa994a6f870ad98c3ad9218af/scanpy/api/__init__.py#L60?

falexwolf · 2018-01-31T10:44:59Z

Ah! And we should also think about the naming convention here. Maybe gene_gene_correlation? We will have all kinds of correlation matrices floating around scanpy and we should have very specific naming conventions...

falexwolf · 2018-01-31T10:46:40Z

It will be hard to maintain an overview of what's going on with all the names that were not specific enough and had to be removed but still kept at some place to maintain backward compatibility.

falexwolf · 2018-01-31T10:48:45Z

@seth-ament @jorvis Having the correlation matrix, you then want to cluster it using hierarchical clustering, right? So, in order to achieve this, shall we add this functionality to clustermap, which currently clusters the expression matrix itself?

tcallies · 2018-01-31T11:07:16Z

I will certainly push updates to my new code at least once today (probably more often), change the name, and add the documentation. I will let you know as soon as the name has changed.

jorvis · 2018-02-01T19:25:20Z

That sounds right, yes. Looking forward to this being available.

seth-ament · 2018-02-02T09:47:16Z

Yes, thanks so much. This looks great. Typically, we cut the hierarchical tree to produce gene clusters, summarize these clusters as the mean expression of the genes within the cluster, then pass the mean expression profile to plotting functions like coloring tSNE plots and violin plots.
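The workflow described above could be sketched roughly as follows, using SciPy rather than any scanpy function. The choices here are assumptions for illustration only: 1 - Pearson correlation as the distance, average linkage, a fixed number of clusters, and the helper name gene_modules. The input is a dense cells x genes DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gene_modules(expr, n_clusters):
    """Cluster genes by co-expression and summarize each cluster per cell.

    expr: cells x genes DataFrame (hypothetical helper, not part of scanpy).
    Returns (labels, module_means): a Series mapping gene -> cluster id and
    a cells x clusters DataFrame of mean expression within each cluster.
    """
    corr = expr.corr(method="pearson")
    # Turn correlation into a distance: 0 for identical profiles, 2 for opposite.
    dist = 1.0 - corr.to_numpy()
    dist = (dist + dist.T) / 2.0      # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cut the tree so that at most n_clusters gene clusters remain.
    labels = pd.Series(
        fcluster(tree, t=n_clusters, criterion="maxclust"),
        index=expr.columns,
        name="module",
    )
    # Mean expression of the genes in each cluster, one column per cluster.
    module_means = expr.T.groupby(labels).mean().T
    module_means.columns = [f"module_{k}" for k in module_means.columns]
    return labels, module_means


# Toy data: two independent latent signals, two genes following each.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200))
expr = pd.DataFrame({
    "a1": a + 0.1 * rng.normal(size=200),
    "a2": a + 0.1 * rng.normal(size=200),
    "b1": b + 0.1 * rng.normal(size=200),
    "b2": b + 0.1 * rng.normal(size=200),
})
labels, module_means = gene_modules(expr, n_clusters=2)
```

Each column of module_means is a per-cell score, so it could be copied into adata.obs and passed as a color or key to the existing tSNE and violin plotting functions.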

jorvis · 2018-02-05T20:27:24Z

Any updates here? I'd love to add this to an analysis tool UI I'm working on (and presenting at a conference this weekend). Very happy to promote scanpy there.

wyattmcdonnell · 2018-06-22T21:40:35Z

Hi all—does anybody have a skeleton snippet they're willing to share here on how to run this in the current version of Scanpy? Thanks!

falexwolf · 2018-06-25T09:54:18Z

Unfortunately, all of this discussion here was not really further pursued, I have to admit.

In principle, these are very simple things. However, I'm a bit afraid of offering a canonical function as I fear that there are also a lot of bad ways of visualizing gene correlation plots and I don't feel capable of judging this. If no one else wants to make a pull request for that (maybe using what @tcallies already did, but I fear it's not really serving the purpose of the discussion here: here, here) it would be cool if someone sent me an example case, which clearly shows what you want.

Maybe @jorvis, you can send images for the examples you have in mind?

flying-sheep · 2018-12-07T16:11:45Z

It’s still not in the docs, and by now also broken… #392

hyjforesight · 2022-11-21T17:13:37Z

Hello @tcallies @falexwolf @flying-sheep
Somehow, it looks like sc.tl.correlation_matrix was removed from scanpy?

sc.tl.correlation_matrix(adata_sub2, name_list=['SMARCA4', 'TP53'], n_genes=20, annotation_key=None, method='pearson')
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3196/1400689712.py in <module>
----> 1 sc.tl.correlation_matrix(adata_sub2, name_list=['SMARCA4', 'TP53'], n_genes=20, annotation_key=None, method='pearson')

AttributeError: module 'scanpy.tools' has no attribute 'correlation_matrix'

mssher07 · 2022-11-21T17:45:28Z

same error, seconded -- is there an alternative approach built in?

mys721tx · 2023-08-05T06:51:33Z

Looks like when _top_genes.py was renamed, correlation_matrix stopped being exported.

mys721tx · 2023-08-05T06:59:15Z

A very dodgy workaround would be

from scanpy.tools import _top_genes
from scanpy.plotting import _anndata

_top_genes.correlation_matrix(adata, names, annotation_key=None, method='pearson')

_anndata.correlation_matrix(adata, groupby='leiden')
