The roary output included a core gene alignment (attached as core_gene_alignment.aln.zip). I ran RAxML on it, and it produced the following warnings:
IMPORTANT WARNING: Sequences prokka_JNJ_Fna1_03S_S2 and prokka_JNJ_Fna_combined7_1 are exactly identical
IMPORTANT WARNING: Sequences prokka_JNJ_Fna1_05B_S7 and prokka_JNJ_Fna1_26A_S7 are exactly identical
IMPORTANT WARNING: Sequences prokka_JNJ_Fnp3_06B_S11 and prokka_JNJ_Fnp4_06S_S8 are exactly identical
I checked the alignment file and, indeed, the sequences are identical. I then checked the corresponding GFF input files, and they are NOT identical. I'm stumped. It seems that roary (or, rather, one of the tools roary uses) has invented these phantom duplicates. I googled the error message
IMPORTANT WARNING: Sequences are exactly identical
and found dozens of pages containing it, so I am not the only one facing this issue. Could you please have a look?
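A quick way to confirm which sequences RAxML is flagging, and to see how little variation the alignment actually holds, is a short script over the alignment file. This is a minimal sketch, assuming core_gene_alignment.aln has been unzipped and is an aligned multi-FASTA file with equal-length sequences. It is a simple exact-match check, not RAxML's own duplicate-detection logic, and the script and function names are hypothetical.

```python
import sys
from collections import defaultdict


def parse_fasta(lines):
    """Parse FASTA lines into {name: sequence}; the name is the first word after '>'."""
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path):
    with open(path) as handle:
        return parse_fasta(handle)


def identical_groups(records):
    """Group sequence names whose (case-insensitive) aligned sequences match exactly."""
    by_seq = defaultdict(list)
    for name, seq in records.items():
        by_seq[seq.upper()].append(name)
    return [sorted(names) for names in by_seq.values() if len(names) > 1]


def variable_sites(records):
    """Count columns with more than one state, ignoring gaps ('-'), '?' and 'N'."""
    count = 0
    for column in zip(*records.values()):
        states = {c.upper() for c in column} - set("-?N")
        if len(states) > 1:
            count += 1
    return count


def main(path):
    aln = read_fasta(path)
    length = len(next(iter(aln.values())))
    print(f"{len(aln)} sequences, length {length}, {variable_sites(aln)} variable sites")
    for group in identical_groups(aln):
        print("identical:", ", ".join(group))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, `python find_identical.py core_gene_alignment.aln`. A very short alignment with few variable sites makes exact duplicates much more likely, even when the input genomes differ elsewhere.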
Hi Nikolay,
I had a look, and there appear to be multiple species in your input dataset, so only a tiny number of core genes are shared at the default 95% BLASTP identity threshold. This explains the odd results: with so few core genes, the core gene alignment covers only a small fraction of each genome, so strains whose GFF files differ elsewhere can still have identical core sequences. You could try splitting your dataset by species or lowering the percentage identity (-i). One thing I did notice is that sample prokka_JNJ_Fna7_33S_S9 has double the number of proteins of the rest of your dataset, so you may wish to investigate this sample further.
Andrew
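To follow up on the protein-count observation, one option is to count CDS features in each Prokka GFF file and flag samples well above the median. This is a minimal sketch, assuming Prokka-style GFF3 files that end with an embedded ##FASTA section; the 1.5x-median cutoff, the script name and the function names are arbitrary, hypothetical choices.

```python
import glob
import statistics
import sys


def count_cds(lines):
    """Count CDS features in GFF3 lines, stopping at the embedded ##FASTA section."""
    n = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends contig sequences after this directive
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "CDS":
            n += 1
    return n


def flag_outliers(counts, factor=1.5):
    """Return names whose CDS count exceeds `factor` times the median count."""
    median = statistics.median(counts.values())
    return sorted(name for name, n in counts.items() if n > factor * median)


def main(pattern):
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            counts[path] = count_cds(handle)
    for path, n in counts.items():
        print(f"{n}\t{path}")
    if counts:
        print("possible outliers:", ", ".join(flag_outliers(counts)) or "none")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, `python count_cds.py 'roary_input/*.gff'` (quoted so that Python, not the shell, expands the pattern).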
I used 24 strains (the GFF files are attached as roary_input.zip) and ran roary as follows:
roary -e -n -v -f roary_output10 roary_input/*.gff
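Andrew's suggestion to lower the identity threshold can be tried by rerunning the command above with -i set to a few values. This sketch reuses only the flags shown in the command plus -i; the particular thresholds (95, 90, 85), the roary_output_i<N> directory names and the script name are hypothetical, and it assumes roary is on the PATH.

```python
import subprocess
import sys


def roary_command(gff_files, identity, outdir):
    """Build the roary argument list from the command above, adding a blastp identity (-i)."""
    return ["roary", "-e", "-n", "-v", "-i", str(identity), "-f", outdir, *gff_files]


def run_sweep(gff_files, identities=(95, 90, 85)):
    """Run roary once per identity threshold, each into its own output directory."""
    for identity in identities:
        cmd = roary_command(gff_files, identity, f"roary_output_i{identity}")
        print(" ".join(cmd[:8]), "...")  # echo the flags without the long file list
        subprocess.run(cmd, check=True)  # raise if roary exits with an error


if __name__ == "__main__" and len(sys.argv) > 1:
    run_sweep(sys.argv[1:])
```

For example, `python roary_sweep.py roary_input/*.gff`. Comparing the core gene counts across the output directories (roary reports them in summary_statistics.txt) should show how sensitive the core genome is to the threshold.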