The alignment file contains identical sequences of unknown origin #286

Closed
nikolay12 opened this issue Oct 19, 2016 · 1 comment

@nikolay12

I used 24 strains (the GFF files are attached as roary_input.zip) and ran Roary as follows:

roary -e -n -v -f roary_output10 roary_input/*.gff

The run produced a core gene alignment (attached as core_gene_alignment.aln.zip). I ran RAxML on it, which produced the following warnings:

IMPORTANT WARNING: Sequences prokka_JNJ_Fna1_03S_S2 and prokka_JNJ_Fna_combined7_1 are exactly identical
IMPORTANT WARNING: Sequences prokka_JNJ_Fna1_05B_S7 and prokka_JNJ_Fna1_26A_S7 are exactly identical
IMPORTANT WARNING: Sequences prokka_JNJ_Fnp3_06B_S11 and prokka_JNJ_Fnp4_06S_S8 are exactly identical

I checked the alignment file and, indeed, the sequences are identical. I then checked the corresponding GFF input files, and they are NOT identical. I'm stumped. It seems that Roary (or rather, one of the tools it uses) has invented these phantom duplicates. I googled the error message

IMPORTANT WARNING: Sequences are exactly identical

and there seem to be dozens of pages containing it, so I am not the only one facing this issue. Could you please have a look?
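
The RAxML warnings can be cross-checked directly against the alignment. The following is a minimal sketch, assuming the core gene alignment is plain FASTA text; the function names `parse_fasta` and `identical_groups` are illustrative and are not part of Roary or RAxML.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sample name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def identical_groups(text):
    """Return lists of sample names whose aligned sequences match exactly."""
    groups = defaultdict(list)
    for name, seq in parse_fasta(text):
        groups[seq.upper()].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Reading the unzipped alignment file and passing its contents to `identical_groups` should list the same pairs that RAxML reports. Note that identical rows in a core gene alignment only mean the samples agree on the genes that made it into the core, not that the genomes are identical.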

@andrewjpage (Member)

Hi Nikolay,
I had a look, and there appear to be multiple species in your input dataset, and there is only a tiny number of core genes in common at the default 95% BLAST identity. This explains the odd results. You could try splitting your dataset by species or lowering the percentage identity (-i). One thing I did note is that sample prokka_JNJ_Fna7_33S_S9 has double the number of proteins compared to the rest of your dataset, so you may wish to investigate this sample further.
Andrew
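
A sample with an unusually high protein count, like the one noted above, can be spotted by counting CDS features per GFF file. This is a rough sketch, assuming Prokka-style GFF3 input in which tab-separated annotation lines precede a `##FASTA` section; `count_cds`, `flag_outliers`, and the 1.5x-median threshold are assumptions chosen for illustration, not anything Roary does itself.

```python
from statistics import median


def count_cds(gff_text):
    """Count CDS features in GFF3 text, ignoring the trailing ##FASTA section."""
    count = 0
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more annotations
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == "CDS":
            count += 1
    return count


def flag_outliers(counts, ratio=1.5):
    """Return sample names whose CDS count exceeds ratio * median (assumed cutoff)."""
    mid = median(counts.values())
    return sorted(name for name, n in counts.items() if mid > 0 and n > ratio * mid)
```

Building a dictionary of `{sample_name: count_cds(file_contents)}` over the input directory and passing it to `flag_outliers` gives a quick list of samples worth inspecting for contamination or mixed assemblies.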
