Full length cDNA distribution as a guide to abundances? #56

Open
vals opened this issue May 9, 2016 · 1 comment

@vals
Contributor
vals commented May 9, 2016

Often, when we assess our samples before fragmenting the cDNA, we look at the distribution of full-length cDNA using a Bioanalyzer.

See, for example, panel a of this figure.

From the reference transcriptome, we know the distribution of transcript lengths.

We can view the reference transcript length distribution as the unweighted distribution of lengths, and the electropherogram as the distribution obtained when transcript lengths are weighted by their relative abundances.
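To make the two distributions concrete, here is a minimal Python/NumPy sketch. It assumes transcript lengths and TPMs are already available as arrays and that both distributions are binned on one fixed set of length-bin edges. The function name `length_distributions` and the toy numbers are hypothetical, for illustration only.

```python
import numpy as np

def length_distributions(lengths, tpm, bin_edges):
    """Hypothetical helper: binned transcript-length distributions.

    Returns (unweighted, weighted), each normalized to sum to 1.
    - unweighted: every reference transcript counts once
    - weighted: each transcript counts in proportion to its TPM
    """
    lengths = np.asarray(lengths, dtype=float)
    tpm = np.asarray(tpm, dtype=float)
    unweighted, _ = np.histogram(lengths, bins=bin_edges)
    weighted, _ = np.histogram(lengths, bins=bin_edges, weights=tpm)
    return unweighted / unweighted.sum(), weighted / weighted.sum()

# Toy example: three transcripts, two length bins (made-up numbers, not real data).
lengths = [500, 800, 3000]
tpm = [100_000, 100_000, 800_000]
edges = [0, 1000, 5000]
unw, w = length_distributions(lengths, tpm, edges)
print(unw)  # approx. [0.667, 0.333]: two of three reference transcripts are short
print(w)    # approx. [0.2, 0.8]: most abundance sits in the long transcript
```

Both histograms must share the same bin edges, and those edges would presumably need to match how the Bioanalyzer trace is binned before the two could be compared.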

Thus, the distribution of full-length cDNA could be informative when inferring the TPMs (relative abundances) in a sample.

Do you think it would be possible to integrate this into the quantification model?

@rob-p
Collaborator
rob-p commented May 10, 2016

Hi Valentine,

This is very interesting. I think the answer to your question is "yes", but I'm not yet sure how one would use this information. Ideally, the inferred NPM (nucleotides per million) distribution should match the electropherogram (e.g., by minimizing a metric such as the KL divergence or JS divergence). The challenge is that this is still rather coarse-grained information: there are likely many different solutions that would match this distribution well. That said, one could certainly use this distribution to flag when divergent solutions are being inferred.

At the very least, one could imagine placing a prior (at the start of inference) on transcripts according to the mass of their corresponding length bin in the electropherogram; this might initialize the inference in a manner more likely to agree with the observed distribution. There may, of course, be other, better ways to make use of this information as well!
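Both ideas above (comparing the inferred NPM distribution to the electropherogram with a JS divergence, and seeding inference with a prior from the electropherogram's bin masses) can be prototyped in a few lines of Python/NumPy. This is only a sketch under stated assumptions: NPM is taken as NPM_i = 10^6 · TPM_i·ℓ_i / Σ_j TPM_j·ℓ_j using plain transcript length ℓ_i (not effective length); the electropherogram is assumed to be already reduced to per-bin masses on the same length bins; and the prior splits each bin's mass evenly among the transcripts in that bin, which is one arbitrary choice among many. All function names are hypothetical and not part of any existing quantification tool.

```python
import numpy as np


def tpm_to_npm(tpm, lengths):
    """Nucleotides per million: each transcript's share of total nucleotide mass.

    Assumption: plain transcript length is the weight (not effective length).
    """
    mass = np.asarray(tpm, dtype=float) * np.asarray(lengths, dtype=float)
    return 1e6 * mass / mass.sum()


def binned_distribution(values, lengths, bin_edges):
    """Sum per-transcript values into length bins and normalize to sum to 1."""
    hist, _ = np.histogram(lengths, bins=bin_edges, weights=values)
    return hist / hist.sum()


def _kl_bits(a, b):
    """KL divergence in bits; assumes a and b are strictly positive and normalized."""
    return float(np.sum(a * np.log2(a / b)))


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits: 0 if identical, ~1 if supports are disjoint."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) for empty bins
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


def electropherogram_prior(lengths, bin_mass, bin_edges):
    """Hypothetical initial TPMs derived from binned electropherogram mass.

    Each bin's mass is split evenly among the transcripts whose length falls in
    that bin, then divided by length to convert nucleotide mass to molecule counts.
    Assumes every length lies within [bin_edges[0], bin_edges[-1]].
    """
    lengths = np.asarray(lengths, dtype=float)
    bin_mass = np.asarray(bin_mass, dtype=float)
    n_bins = len(bin_mass)
    # Bin index per transcript; the clip puts length == last edge in the final bin,
    # matching np.histogram's closed right edge.
    idx = np.clip(np.searchsorted(bin_edges, lengths, side="right") - 1, 0, n_bins - 1)
    per_bin = np.bincount(idx, minlength=n_bins)
    nucleotide_share = bin_mass[idx] / per_bin[idx]
    molecules = nucleotide_share / lengths
    return 1e6 * molecules / molecules.sum()


# Toy example with made-up numbers (not real Bioanalyzer data).
lengths = np.array([500, 800, 3000])
edges = [0, 1000, 5000]
trace = np.array([0.6, 0.4])  # hypothetical binned electropherogram mass

tpm0 = electropherogram_prior(lengths, trace, edges)
npm = tpm_to_npm(tpm0, lengths)
print(js_divergence(binned_distribution(npm, lengths, edges), trace))  # essentially 0

uniform_npm = tpm_to_npm(np.full(3, 1e6 / 3), lengths)
print(js_divergence(binned_distribution(uniform_npm, lengths, edges), trace))  # > 0
```

Because the prior is built from the trace itself, its binned NPM distribution reproduces the trace and the divergence is essentially zero. As the comment notes, many other abundance vectors could match the trace equally well, which is why this information seems better suited to a prior or a diagnostic than to a hard constraint.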
