Export expression matrix from h5ad #262
Comments
If you want to extract it in Python, you can load the h5ad file using […]. To extract the matrix into R, you can use the […]. An alternative to the […]. I hope this helps. |
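The Python route mentioned above might look roughly like the following. This is a minimal sketch rather than the commands from the original comment: `expression_frame` is a hypothetical helper, and it assumes only that the object exposes `.X`, `.obs_names` and `.var_names` the way an AnnData object does. The `anndata.read_h5ad` call appears only in a comment, and a small stand-in object is used so the snippet runs without an h5ad file.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd
from scipy import sparse


def expression_frame(adata):
    """Return the expression matrix as a cells x genes DataFrame.

    Hypothetical helper: works on any object with AnnData-like
    `.X`, `.obs_names` and `.var_names` attributes.
    """
    X = adata.X
    # .X is often a scipy sparse matrix; densify it before building the frame.
    if sparse.issparse(X):
        X = X.toarray()
    return pd.DataFrame(np.asarray(X), index=adata.obs_names, columns=adata.var_names)


# With a real file one would first load it, e.g.:
#   import anndata
#   adata = anndata.read_h5ad("my_data.h5ad")   # assumed file name
# Stand-in object with the same attributes, so the sketch runs on its own:
adata = SimpleNamespace(
    X=sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])),
    obs_names=pd.Index(["cell_1", "cell_2"]),
    var_names=pd.Index(["GeneA", "GeneB", "GeneC"]),
)

cells_by_genes = expression_frame(adata)
genes_by_cells = cells_by_genes.T  # genes as rows, cells as columns
```

Densifying is convenient for small objects; for large datasets the sparse matrix is usually better written out directly.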
Hi, this is very useful. But I think I formulated my question the wrong way. Can I export the h5ad file to a standard 10X h5 file? |
That, I can't help you with, I'm afraid. I'm not as familiar with the h5ad format. |
This is an interesting question... I just googled it and found only
examples in R, but that may be good enough for you:
https://rdrr.io/github/MarioniLab/DropletUtils/man/write10xCounts.html
|
Hi @cartal, it wouldn't be very hard to export to a 10x h5 file, but I'd need to write a custom function for it. Why is it needed? Does 10x offer any downstream analysis that you'd want to use on the data? I thought there are none, hence there is only |
Hi, I'm sorry I forgot about this thread; I just stumbled into the same issue. Let's say I have done my analysis in scanpy and everything is good and nice, but now I want to run, say, cluster 10 from the louvain subset with Palantir. Palantir can read 10X and 10X_H5 files. Is there a way to plug-and-play this with scanpy? In another case, I want to extract the subset expression matrix, where rows are genes (with gene symbols as row names) and columns are cells (with cell IDs as column names), so I can use it with SCENIC. How could I get this from the […]? I apologise in advance if I'm asking something very basic, but it would be really nice to have some sort of interconnectivity between tools, since scanpy is so nice to have as a major analysis suite. |
You can always export as a […]. I can imagine that Palantir would also accept […] |
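For the subset case asked about above (one louvain cluster, genes as rows, cells as columns), a sketch along these lines may help. `export_cluster` is a hypothetical helper, not part of scanpy, and the file name is made up; it assumes AnnData-like `.X`, `.obs`, `.obs_names` and `.var_names` attributes, and the stand-in object only mimics them so the snippet runs by itself.

```python
import os
import tempfile
from types import SimpleNamespace

import numpy as np
import pandas as pd
from scipy import sparse


def export_cluster(adata, key, label, path):
    """Write a genes x cells CSV for the cells whose obs[key] equals label."""
    # Compare as strings, since cluster labels are usually stored as categories.
    mask = (adata.obs[key].astype(str) == str(label)).to_numpy()
    X = adata.X[mask]  # boolean row selection works for dense and CSR matrices
    if sparse.issparse(X):
        X = X.toarray()
    frame = pd.DataFrame(
        np.asarray(X).T,  # transpose so that genes become rows
        index=adata.var_names,
        columns=adata.obs_names[mask],
    )
    frame.to_csv(path)
    return frame


# Stand-in for an AnnData object with a 'louvain' column in .obs.
adata = SimpleNamespace(
    X=np.array([[1, 0], [0, 5], [2, 2]]),
    obs=pd.DataFrame({"louvain": ["10", "3", "10"]}, index=["c1", "c2", "c3"]),
    obs_names=pd.Index(["c1", "c2", "c3"]),
    var_names=pd.Index(["GeneA", "GeneB"]),
)

out_path = os.path.join(tempfile.mkdtemp(), "cluster10_genes_by_cells.csv")
cluster10 = export_cluster(adata, "louvain", "10", out_path)
```

Whether the downstream tool expects genes or cells as rows should be checked against its own documentation before transposing.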
Hi, I am trying PAGA. Thanks a lot for the help. |
@falexwolf @cartal @LuckyMD I am also trying to export a gene-by-cell expression file. I tried using adata.write_csvs(filename, skip_data=False), but that wrote the output to multiple files. Is there a way to generate a single file with genes as rows (with gene names as row IDs) and cells as columns (barcodes as column IDs)? Thank you in advance for your help. |
No, there is no way to produce a single file with data and metadata. Having genes as rows can simply be achieved by transposing the matrix ( |
In exporting.cellbrowser the cellbrowser export function creates two files,
one with the expression matrix (genes on rows, cells as columns) and one
for the cell metadata. That may be what you're looking for?
|
@falexwolf thanks for the feedback. As @maximilianh suggested, I was able to export the expression matrix from the cellbrowser export function. Thank you for your help. |
Nice to hear that it worked. As a side product, you can now create a cell
browser html directory from the generated directory.
|
@maximilianh I was able to use the cell browser export function in the past, but this time I am getting an error message: INFO:root:Writing scanpy matrix to adata_cellbrowser_04_01_19_CD8_subclustered/exprMatrix.tsv.gz I am using an h5ad file to import my AnnData object. Is that why there is some issue with finding cluster markers? I am able to plot the clusters in a UMAP plot, so I know that the 'louvain' observation exists. Any thoughts on why this is happening? Thanks. |
Just a thought... have you run sc.tl.rank_genes_groups()? |
@LuckyMD I did not run sc.tl.rank_genes_groups(), which was the problem. @maximilianh I think including the cluster-specific markers should be optional, so maybe keeping it as a warning would be best? That way the user has control over what they want to include. |
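The "warn instead of fail" behaviour suggested above could be sketched like this. It is not the cell browser exporter's actual code: `markers_available` is a hypothetical helper, and the only assumption about scanpy is that `sc.tl.rank_genes_groups()` stores its results under `adata.uns['rank_genes_groups']` by default.

```python
import warnings
from types import SimpleNamespace


def markers_available(adata, key="rank_genes_groups"):
    """Return True if marker results are stored in adata.uns, else warn."""
    if key in adata.uns:
        return True
    warnings.warn(
        f"No '{key}' entry in adata.uns; run sc.tl.rank_genes_groups() first "
        "if cluster markers should be exported. Continuing without markers.",
        stacklevel=2,
    )
    return False


# Stand-ins for AnnData objects with and without marker results.
with_markers = SimpleNamespace(uns={"rank_genes_groups": {}})
without_markers = SimpleNamespace(uns={})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = markers_available(with_markers)
    missing = markers_available(without_markers)
```

An exporter could call such a check and simply skip the marker files when it returns False, leaving the choice to the user.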
@maximilianh I think those messages are from your code? maybe you should improve the error message to include something like
|
Hi, the expression matrix I exported from adata.write only has the top variable genes. Is there a way to output the raw matrix including all genes? |
The scanpyToCellbrowser function has an option useRaw that will use the
.raw matrix, if present, for the .tsv export.
Otherwise, the raw matrix of all genes is stored as ad.raw.X and the
variable names are in ad.raw.var. You can use scanpyToCellbrowser to write
the matrix and all annotations, or anndataToTsv to write just the matrix.
Or use code from there to write your own.
|
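Writing the all-genes matrix from `.raw` by hand, as described above, might look like the sketch below. `full_gene_matrix` is a hypothetical helper and is not the scanpyToCellbrowser or anndataToTsv implementation; it assumes AnnData-like attributes (`.raw.X`, `.raw.var_names`, `.obs_names`) and falls back to `.X` when `.raw` is not set. The output file name is made up.

```python
import os
import tempfile
from types import SimpleNamespace

import numpy as np
import pandas as pd
from scipy import sparse


def full_gene_matrix(adata):
    """Return a genes x cells DataFrame, preferring adata.raw when present."""
    source = adata.raw if adata.raw is not None else adata
    X = source.X
    if sparse.issparse(X):
        X = X.toarray()
    # Gene names come from the same source as the matrix; cells from adata.
    return pd.DataFrame(np.asarray(X).T, index=source.var_names, columns=adata.obs_names)


# Stand-in: .X holds two selected genes, .raw still holds all three.
raw = SimpleNamespace(
    X=sparse.csr_matrix(np.array([[1, 0, 4], [0, 2, 0]])),
    var_names=pd.Index(["GeneA", "GeneB", "GeneC"]),
)
adata = SimpleNamespace(
    X=np.array([[1.0, 4.0], [0.0, 0.0]]),
    var_names=pd.Index(["GeneA", "GeneC"]),
    obs_names=pd.Index(["c1", "c2"]),
    raw=raw,
)

all_genes = full_gene_matrix(adata)
tsv_path = os.path.join(tempfile.mkdtemp(), "exprMatrix_all_genes.tsv.gz")
all_genes.to_csv(tsv_path, sep="\t")  # pandas infers gzip from the .gz suffix
```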
Thanks for the suggestion. Actually, I am using cellxgene, which takes the
h5ad file as input. When using the anndata.write() function, it only outputs
anndata.X as the expression matrix, and there is no useRaw option
here.
Also, I tried to re-assign anndata.X = anndata.raw.X, but it returns an
error saying it is the wrong shape.
Do you have any suggestions?
Thanks a lot!
--
Cheers!
Jing
|
Hi @hejing3283,
The wrong shape is probably because you have subsetted adata.X to highly variable genes, or did some additional filtering after storing data in adata.raw. For a while now scanpy has avoided filtering highly variable genes and instead annotates them in adata.var['highly_variable'], which is then used in sc.pp.pca(). I would suggest you use subset=False next time you use sc.pp.highly_variable_genes() to avoid different dimensions in adata.X and adata.raw.X.
You can easily proceed by just making a new anndata object from adata.raw.X, adata.raw.var and adata.obs and storing this to be loaded into cellxgene. Just do the following:
[…]
adata_raw.write(my_file)
|
Hi,
Thanks so much for the explanations! Doing it now and it works.
Best,
Jing
|
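The suggestion above to rebuild an object from the raw data can be sketched as follows. The exact code from the original comment did not survive, so this is an assumption about what it looked like: `raw_components` is a hypothetical helper that gathers the matrix and gene annotations from `adata.raw` and the cell annotations from `adata.obs`, and checks that their dimensions agree. The stand-in object only mimics the AnnData attributes used here.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def raw_components(adata):
    """Collect X, obs and var for a full-gene AnnData object."""
    if adata.raw is None:
        raise ValueError("adata.raw is not set; nothing to rebuild from")
    parts = {"X": adata.raw.X, "obs": adata.obs, "var": adata.raw.var}
    # The matrix must be (number of cells) x (number of genes in raw).
    if parts["X"].shape != (len(parts["obs"]), len(parts["var"])):
        raise ValueError("raw matrix does not match obs/var annotations")
    return parts


# Stand-in: two cells, three genes in .raw (only some of them kept in .X).
adata = SimpleNamespace(
    obs=pd.DataFrame({"louvain": ["0", "1"]}, index=["c1", "c2"]),
    raw=SimpleNamespace(
        X=np.array([[1, 0, 4], [0, 2, 0]]),
        var=pd.DataFrame(index=["GeneA", "GeneB", "GeneC"]),
    ),
)

parts = raw_components(adata)

# With anndata installed and a real object, the pieces would be combined as:
#   import anndata
#   adata_raw = anndata.AnnData(**parts)
#   adata_raw.write(my_file)   # my_file: output path, as in the comment above
```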
Here's my solution to a similar interoperability hiccup. This produces files similar to the 10X v2 triplet format, plus an extra cell metadata file.
import os
import pandas as pd
import scipy.io
# to_csv() takes index=, not index_col=. ad.X is cells x genes; write ad.X.T if the reader expects genes x cells.
pd.DataFrame(ad.var.index).to_csv(os.path.join(destination, "genes.tsv"), sep="\t", index=False, header=False)
pd.DataFrame(ad.obs.index).to_csv(os.path.join(destination, "barcodes.tsv"), sep="\t", index=False, header=False)
ad.obs.to_csv(os.path.join(destination, "metadata.tsv"), sep="\t", index=True)
scipy.io.mmwrite(os.path.join(destination, "matrix.mtx"), ad.X) |
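A variant of the triplet export above that writes the matrix in genes x cells orientation could look like this. It is a sketch under assumptions, not a tested 10X writer: `write_10x_like` is a hypothetical helper, the gene name is repeated in both columns of genes.tsv because no separate gene IDs are assumed to be available, and the object is assumed to expose AnnData-like `.X`, `.obs`, `.obs_names` and `.var_names`.

```python
import os
import tempfile
from types import SimpleNamespace

import numpy as np
import pandas as pd
import scipy.io
from scipy import sparse


def write_10x_like(adata, destination):
    """Write genes.tsv, barcodes.tsv, metadata.tsv and matrix.mtx."""
    os.makedirs(destination, exist_ok=True)
    # Two columns (id, name) without header; the name doubles as the id here.
    genes = pd.DataFrame({"id": adata.var_names, "name": adata.var_names})
    genes.to_csv(os.path.join(destination, "genes.tsv"), sep="\t", index=False, header=False)
    pd.Series(adata.obs_names).to_csv(
        os.path.join(destination, "barcodes.tsv"), sep="\t", index=False, header=False
    )
    adata.obs.to_csv(os.path.join(destination, "metadata.tsv"), sep="\t", index=True)
    # adata.X is cells x genes; transpose so rows are genes and columns are cells.
    matrix = sparse.csr_matrix(adata.X).T
    scipy.io.mmwrite(os.path.join(destination, "matrix.mtx"), matrix)


# Stand-in object: three cells, two genes.
adata = SimpleNamespace(
    X=np.array([[1, 0], [0, 5], [2, 2]]),
    obs=pd.DataFrame({"louvain": ["0", "1", "0"]}, index=["c1", "c2", "c3"]),
    obs_names=pd.Index(["c1", "c2", "c3"]),
    var_names=pd.Index(["GeneA", "GeneB"]),
)

out_dir = os.path.join(tempfile.mkdtemp(), "triplet_export")
write_10x_like(adata, out_dir)
written = scipy.io.mmread(os.path.join(out_dir, "matrix.mtx"))
```

Reading the result back with the target tool is the real test; readers differ in which file names and column layouts they accept.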
You can convert h5ad format to Seurat object using sceasy. |
To write a genes (rows) vs. cells (columns) matrix, I tried this: adata.T.to_df().to_csv('matrix.csv') |
Use this code: |
This code doesn't actually work - rows and columns are switched in the matrix, and it produces an error when you try to read in the output using either |
I was having the same issue as well. I ended up doing what was suggested above:
|
We have this method in scanpy-scripts https://github.com/ebi-gene-expression-group/scanpy-scripts/blob/6297be21119d6964e074fa0b40a3b6fcaec53bbc/scanpy_scripts/cmd_utils.py#L137 - you could as well use it just from there with one of the containers https://quay.io/repository/biocontainers/scanpy-scripts?tab=tags&tag=latest I think it can be used through the filtering CLI call, given numbers that won't filter anything out. |
Hi,
I would like to extract the expression matrix (genes and counts) from an h5ad file.
How can I do this? I have searched the documentation but I couldn't find anything about this (maybe I missed it).