Difference in results between diamond1 and diamond2? #515

wanjinhu · 2021-09-22T03:08:23Z

Hi there:

To compare diamond1 and diamond2, I built a DIAMOND database from the nr database with each version separately. I then aligned the same set of nucleic acid sequences with each version, using the same parameters.

From the results, I found two main differences:

1. The subject accessions reported by diamond1 carry a version number, e.g. XP_028630034.1, while those reported by diamond2 do not, e.g. XP_028630034.
2. With diamond2, the top subject sequence is always from UniProtKB/Swiss-Prot, while with diamond1 it is always from the NCBI RefSeq database.

diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1
ENSMUST00000001513	XP_006970129.1
ENSMUST00000001513	XP_021482884.1
ENSMUST00000001513	XP_005074701.1
ENSMUST00000001513	OBS76548.1
ENSMUST00000001513	XP_004656888.1

diamond2 result:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977
ENSMUST00000001513	XP_006970129
ENSMUST00000001513	XP_021482884
ENSMUST00000001513	XP_005074701
ENSMUST00000001513	OBS76548
ENSMUST00000001513	XP_004656888
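
To separate the two differences, the accession lists above can be compared rank by rank after dropping the version suffix. This is only a rough sketch in Python: the lists are copied from the two tables, and the helper name `strip_version` is my own, not part of DIAMOND.

```python
import re

# Subject accessions copied from the two result tables above
# (same query, listed in the same order as reported).
diamond1_hits = [
    "NP_080749.2", "XP_028630034.1", "NP_001233720.1", "XP_034374267.1",
    "NP_001020846.1", "XP_006970129.1", "XP_021482884.1", "XP_005074701.1",
    "OBS76548.1", "XP_004656888.1",
]
diamond2_hits = [
    "Q922F4", "XP_028630034", "AAZ14959", "XP_034374267",
    "AAH97977", "XP_006970129", "XP_021482884", "XP_005074701",
    "OBS76548", "XP_004656888",
]


def strip_version(accession: str) -> str:
    """Drop a trailing ".N" version suffix, if present (hypothetical helper)."""
    return re.sub(r"\.\d+$", "", accession)


# Keep only the ranks where the accessions still differ once the
# diamond1 version suffix has been removed.
differences = [
    (rank, d1, d2)
    for rank, (d1, d2) in enumerate(zip(diamond1_hits, diamond2_hits), start=1)
    if strip_version(d1) != d2
]

for rank, d1, d2 in differences:
    print(f"rank {rank}: {d1} vs {d2}")
```

For these lists, only ranks 1, 3 and 5 still differ; the other seven hits are identical apart from the version number. The script does not tell whether the remaining pairs refer to the same protein under different accessions; that would have to be checked against the database records.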

I read the paper "Sensitive protein alignments at tree-of-life scale using DIAMOND", which mentions that the benchmark database uses UniRef50. I'm not sure whether the second difference above is related to this.

I would appreciate your thoughts on these two points. Thank you very much.

wanjin hu


bbuchfink · 2021-09-28T13:40:26Z

Did you use a BLAST database when running diamond v2? That would explain the different accessions. Note that, for example, NP_080749.2 and Q922F4 are the same protein.

wanjinhu · 2021-10-09T05:28:15Z

I found the cause, although I don't know why it happens.

When the --salltitles parameter is passed to diamond2, the result is as follows:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977

When the --salltitles parameter is not passed to diamond2, the result is as follows, and it matches the diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1

Also, the --salltitles parameter does not work with diamond1.
