
Should assemblies be removed if core gene alignment shows redundancies? #580

Closed
chahatupreti opened this issue Sep 21, 2022 · 1 comment

Comments

@chahatupreti

I have a set of 100 bacterial genomes. I annotated them with Prokka and performed a pangenome analysis with Roary. Roary outputs a core gene alignment file, which I then used to build a phylogenetic tree with RAxML. While running RAxML, the console output said:

IMPORTANT WARNING - Found 13 sequences that are exactly identical to other sequences in the alignment. Normally they should be excluded from the analysis.

My question is: should I remove these 13 sequences from my subsequent analyses based on this warning? My first thought was that it would be obvious to remove these redundant/clonal sequences so that they don't skew the statistics for gene enrichment, etc. But a counterargument is that these 13 sequences are called exactly identical to other sequences in my dataset based only on the core gene alignment. What about any differences these 13 assemblies may have in the non-core genome, relative to the sequences they are supposedly identical to?

In other words, what if these assemblies are actually distinct, but their differences lie in accessory genes (genes present in only a subset of the assemblies) rather than in core genes?
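One way to check this empirically is to take each group of samples that are identical in the core alignment and ask whether their accessory gene content differs. The sketch below is a minimal illustration, assuming the core alignment is FASTA (as Roary's `core_gene_alignment.aln` is) and that the presence/absence table is tab-separated with a gene-name column followed by one 0/1 column per sample (the layout of Roary's `gene_presence_absence.Rtab`, but check your own file). All function names are hypothetical, and identity is tested by exact string comparison, which is a simplification of RAxML's own duplicate check.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def identical_groups(seqs):
    """Group sample names whose aligned core sequences are exactly identical."""
    by_seq = defaultdict(list)
    for name, seq in seqs.items():
        by_seq[seq].append(name)
    # Only groups with two or more members are redundant in the core alignment.
    return [sorted(group) for group in by_seq.values() if len(group) > 1]


def read_rtab(text):
    """Parse a genes-by-samples 0/1 table (assumed layout) into {sample: set of genes present}."""
    lines = [line for line in text.splitlines() if line.strip()]
    samples = lines[0].split("\t")[1:]  # first header column is the gene name
    present = {s: set() for s in samples}
    for line in lines[1:]:
        fields = line.split("\t")
        gene = fields[0]
        for sample, flag in zip(samples, fields[1:]):
            if flag.strip() == "1":
                present[sample].add(gene)
    return present


def accessory_differences(group, present):
    """Genes carried by some, but not all, members of a core-identical group."""
    sets = [present[s] for s in group]
    return sorted(set.union(*sets) - set.intersection(*sets))


if __name__ == "__main__":
    # Toy data: A and B share a core sequence, but only A carries plasmidY.
    aln = ">A\nACGT\n>B\nACGT\n>C\nACGA\n"
    rtab = "Gene\tA\tB\tC\ncoreX\t1\t1\t1\nplasmidY\t1\t0\t0\n"
    present = read_rtab(rtab)
    for group in identical_groups(read_fasta(aln)):
        print(group, accessory_differences(group, present))
    # prints: ['A', 'B'] ['plasmidY']
```

A gene reported here may reflect a genuine biological difference (for example a plasmid or prophage), but it could also come from an assembly or annotation gap in one sample, so it is worth inspecting before treating two core-identical samples as distinct.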

Any insights would be greatly appreciated. I apologize that this is not a question directly about running Roary, but about analysis based on its output.

@andrewjpage
Member

It depends. If you are analysing an outbreak and expect to have identical samples (or you've done replicates), then I'd probably leave them in. If these aren't from an outbreak, then something else might be going on. For example, maybe your core genome is tiny because there's a rogue sample of a different species in there.
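A rough way to test the "rogue sample" idea is a leave-one-out check: recompute the core genome size with each sample excluded in turn and see whether dropping one sample makes the core jump. The sketch below assumes you already have each sample's gene set (for instance, parsed from the presence/absence table), and uses a strict 100% core definition for simplicity, whereas Roary's own core threshold is a configurable percentage. The function names are hypothetical.

```python
def core_genes(present, samples):
    """Genes present in every listed sample (strict 100% core definition)."""
    return set.intersection(*(present[s] for s in samples))


def leave_one_out_core(present):
    """Return (full core size, {sample: core-size gain when that sample is excluded}).

    The gains are ordered from largest to smallest, so a likely rogue sample
    (one whose removal greatly enlarges the core) comes first.
    """
    samples = list(present)
    if len(samples) < 2:
        raise ValueError("need at least two samples for a leave-one-out comparison")
    full = len(core_genes(present, samples))
    gains = {}
    for s in samples:
        others = [o for o in samples if o != s]
        gains[s] = len(core_genes(present, others)) - full
    ordered = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return full, dict(ordered)


if __name__ == "__main__":
    # Toy data: X shares almost nothing with the others and shrinks the core.
    present = {
        "A": {"g1", "g2", "g3"},
        "B": {"g1", "g2", "g3"},
        "C": {"g1", "g2", "g3"},
        "X": {"g1"},
    }
    full, gains = leave_one_out_core(present)
    print(full, gains)
    # prints: 1 {'X': 2, 'A': 0, 'B': 0, 'C': 0}
```

A single sample with a gain far above the rest is a candidate for closer inspection (for example, checking its species assignment) rather than automatic removal.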

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants