The difference of result between diamond1 and diamond2 ? #515

Open
wanjinhu opened this issue Sep 22, 2021 · 2 comments

@wanjinhu

Hi there:

To compare diamond1 and diamond2, I built a DIAMOND database from the nr database with each version. I then aligned the same set of nucleotide sequences against each database, using the same parameters.

From the results, I found two main differences:

  1. The subject accessions reported by diamond1 include a version number, e.g. XP_028630034.1, while those reported by diamond2 do not, e.g. XP_028630034.

  2. In the diamond2 results, the top subject sequence is always from UniProtKB/Swiss-Prot, while in the diamond1 results it is always from the NCBI RefSeq database.

diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1
ENSMUST00000001513	XP_006970129.1
ENSMUST00000001513	XP_021482884.1
ENSMUST00000001513	XP_005074701.1
ENSMUST00000001513	OBS76548.1
ENSMUST00000001513	XP_004656888.1

diamond2 result:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977
ENSMUST00000001513	XP_006970129
ENSMUST00000001513	XP_021482884
ENSMUST00000001513	XP_005074701
ENSMUST00000001513	OBS76548
ENSMUST00000001513	XP_004656888
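To make the two differences easier to see, the listings above can be compared after dropping the version suffix. The following is only a minimal sketch: the helper name `strip_version` and the assumption that a version is always a trailing `.<digits>` are mine, not something DIAMOND defines.

```python
import re

def strip_version(accession: str) -> str:
    """Drop a trailing '.<digits>' version suffix, if present (assumed format)."""
    return re.sub(r"\.\d+$", "", accession)

# Subject columns copied from the two result listings above.
diamond1_hits = [
    "NP_080749.2", "XP_028630034.1", "NP_001233720.1", "XP_034374267.1",
    "NP_001020846.1", "XP_006970129.1", "XP_021482884.1", "XP_005074701.1",
    "OBS76548.1", "XP_004656888.1",
]
diamond2_hits = [
    "Q922F4", "XP_028630034", "AAZ14959", "XP_034374267",
    "AAH97977", "XP_006970129", "XP_021482884", "XP_005074701",
    "OBS76548", "XP_004656888",
]

d1 = {strip_version(a) for a in diamond1_hits}
d2 = {strip_version(a) for a in diamond2_hits}

shared = sorted(d1 & d2)    # accessions reported by both versions
only_d1 = sorted(d1 - d2)   # reported only by diamond1
only_d2 = sorted(d2 - d1)   # reported only by diamond2

print(len(shared), "shared accessions")
print("only in diamond1:", only_d1)
print("only in diamond2:", only_d2)
```

Once versions are ignored, seven of the ten accessions match, and the remaining three differ between the two listings.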

I read the paper "Sensitive protein alignments at tree-of-life scale using DIAMOND", which mentions that the benchmark database uses UniRef50. I'm not sure whether the second difference I described is related to this.

I hope you can help me with these two questions. Thank you very much.

wanjin hu

@bbuchfink
Owner

Did you use a BLAST database when running diamond v2? That would explain the different accessions. Note that, for example, NP_080749.2 and Q922F4 are the same protein.

@wanjinhu
Author
wanjinhu commented Oct 9, 2021

I found the cause, although I don't understand why it happens.

When the parameter --salltitles is added to diamond2, the result is as follows:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977

When the parameter --salltitles is not added to diamond2, the result is as follows, which is the same as the diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1

Also, the parameter --salltitles does not work for diamond1.
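The two diamond2 outputs in this comment can also be compared rank by rank to see exactly which rows change when --salltitles is used. This is a rough sketch over the five rows shown here; `strip_version` is a hypothetical helper that assumes a version is a trailing `.<digits>`.

```python
import re

def strip_version(accession: str) -> str:
    """Drop a trailing '.<digits>' version suffix, if present (assumed format)."""
    return re.sub(r"\.\d+$", "", accession)

# Subject columns copied from the two diamond2 runs in this comment.
with_salltitles = [
    "Q922F4", "XP_028630034", "AAZ14959", "XP_034374267", "AAH97977",
]
without_salltitles = [
    "NP_080749.2", "XP_028630034.1", "NP_001233720.1",
    "XP_034374267.1", "NP_001020846.1",
]

# Keep only the ranks where the accession differs beyond the version suffix.
differing_ranks = [
    (rank, a, b)
    for rank, (a, b) in enumerate(zip(with_salltitles, without_salltitles), start=1)
    if strip_version(a) != strip_version(b)
]

for rank, a, b in differing_ranks:
    print(f"rank {rank}: {a} (with --salltitles) vs {b} (without)")
```

In these five rows only ranks 1, 3 and 5 differ. Rank 1 is the NP_080749.2 / Q922F4 pair that was noted earlier in the thread as being the same protein.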
