
blastx error writing file #8

Open
agrier04 opened this issue Apr 30, 2015 · 5 comments

@agrier04 commented Apr 30, 2015

I find that with a query of more than ~30 reads I get the following error:

Error: 20File_write_exception: Error writing file out.daa (696/5)

With fewer than ~30 reads, no error occurs. I say ~30 because the threshold varies depending on what the reads actually are, even when they're all the same length (I tried adding one read at a time until I got the error): sometimes inclusion of the 25th read induces the error, sometimes the 35th. I've ruled out a specific read causing the problem; i.e., if 25 reads give the error, using just the 25th read plus a dozen other random ones doesn't.

Neither the amount of available memory nor the number of threads has any effect, nor does using or not using --tmpdir.

Below is an example of the verbose output:

diamond blastx -d uniref50_DEMO.dmnd -q in.fa -a out -p 1 -t tmp -v
diamond v0.7.9.58
#Threads = 1
Scoring matrix = blosum62
Lambda = 0.267
K = 0.041
Gap open penalty = 11
Gap extension penalty = 1
Seg masking = 1
SSSE3 enabled.
Opening the database... [0.1s]
Reference = uniref50_DEMO.dmnd
Sequences = 327
Letters = 124409
Block size = 2000000000
Opening the input file... [0.0s]
Opening the output file... [0.0s]
Loading query sequences... [0.0s]
Sequences = 300, letters = 14900
Running complexity filter... [0.0s]
Building query histograms... [0.0s]
Allocating buffers... [0.0s]
Loading reference sequences... [0.2s]
Allocating buffers... [0.0s]
Initializing temporary storage... [0.0s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 0.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 1.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 2.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 3.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 0.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 1.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 2.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 1, index chunk 3.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 2, index chunk 0.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.6s]
Processing query chunk 0, reference chunk 0, shape 2, index chunk 1.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 2, index chunk 2.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.6s]
Processing query chunk 0, reference chunk 0, shape 2, index chunk 3.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 3, index chunk 0.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 3, index chunk 1.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 3, index chunk 2.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.5s]
Processing query chunk 0, reference chunk 0, shape 3, index chunk 3.
Building reference index... [0.0s]
Building query index... [0.0s]
Searching alignments... [1.6s]
Closing temporary storage... [0.0s]
Deallocating buffers... [0.0s]
Computing alignments... [0.0s]
Error: 20File_write_exception: Error writing file out.daa (696/5)

The daa file is created, but it is empty.

@bbuchfink (Owner)

This usually means there's no disk space left. Did you check that?
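For anyone hitting this, a quick way to rule out a full filesystem is to check the partitions holding the output file and the temporary directory. These are generic shell commands, not part of diamond, and the tmpdir path is a placeholder:

df -h .                # free space on the filesystem holding the output
df -h /path/to/tmpdir  # free space on the filesystem used by --tmpdir
df -i .                # inode exhaustion can also cause write failures despite free space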


@agrier04 (Author) commented May 3, 2015

It's running on a cluster. I have more than 4 TB of disk space available, and I've tried running it with anywhere from 32 to 256 GB of RAM. I've also tried a configuration where it is executed on the same node that the binary, input/output files, and tmpdir are stored on, which also has plenty of disk space and ~32 GB of RAM. As confirmation that disk space is not the issue, I have written several large files without a problem during the time I've been trying to use diamond and getting this error.
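One way to verify writability at the scale of the expected output is to write a throwaway file of comparable size into the same directory. This is a generic test, not diamond-specific, and the path and size here are placeholders:

dd if=/dev/zero of=./write_test bs=1M count=1024   # write a 1 GiB test file
rm ./write_test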

I've also tried both configurations without the --tmpdir option, just to see if it makes a difference. It's the same error under all circumstances. This is true even for 50 to 100 reads of 150 bp and an 8 MB dmnd database file, which I would expect to use a rather trivial amount of memory and disk space.

I have noticed that I can get it to run with a few more reads (say, 35 instead of 30) if I set --index-chunks to a high value (e.g. 12) and --max-target-seqs to 1 (or some other small number).
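For reference, such an invocation would look something like the following, with the flag values taken from the examples above:

diamond blastx -d uniref50_DEMO.dmnd -q in.fa -a out -p 1 --index-chunks 12 --max-target-seqs 1 -v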

I'm fairly certain that the determining factor is the size of the results. If the results are larger than some very small value, they cannot be written and I get this error.

@agrier04 (Author) commented May 3, 2015

I should add that I recently discovered I get the exact same error message when I try to generate a dmnd database file with makedb:

diamond v0.7.9.58
#Threads = 24
Scoring matrix = blosum62
Lambda = 0.267
K = 0.041
Gap open penalty = 11
Gap extension penalty = 1
Seg masking = 0
SSSE3 enabled.
Database file = uniref50.fasta
Opening the database file... [0.1s]
Loading sequences... [2.1s]
Sequences = 310967, letters = 100000267
Building histograms... [2.6s]
Saving to disk... [0.0s]
Error: Error writing file uniref50.dmnd (2487744/4040)
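The command that produced this run is not shown in the log; a typical makedb invocation for this input would be along these lines, with the output database name assumed:

diamond makedb --in uniref50.fasta -d uniref50 -p 24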

The queries I have been able to run used a dmnd database that I downloaded (made available for use with HUMAnN2: http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref50_GO_filtered/).

@agrier04 (Author) commented May 4, 2015

Problem solved. The issue is something to do with the architecture of the system I'm using, which is not my area of expertise, so I don't fully understand the problem.

I work on a cluster that has an interface/head node, which is where I installed diamond and where I keep the query and database files. Connected to this head node are compute nodes that do the actual work; they're where the program runs when I submit a job. I tried both running diamond directly on the head node and submitting diamond jobs to the compute nodes. Both approaches produce the error.

It turns out that the way to get it to work is to submit a diamond job that copies both the query and database files to the local disk of the compute node, writes the resulting daa file there, and then copies the results back to the head node. This works with no problem.

Interestingly, even diamond view cannot write an m8 file from a daa file if I try to write the m8 directly to the disk of the head node. I have to include a view command at the end of the job to write the m8 file on the local disk of the compute node, then copy it back to the head node.
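A job script implementing this workaround might look like the sketch below. Scheduler directives are omitted, $TMPDIR stands for the compute node's local scratch directory, the destination path is a placeholder, and the filenames are the ones from this thread:

#!/bin/bash
# Stage inputs onto the compute node's local disk
cp in.fa uniref50_DEMO.dmnd "$TMPDIR"/
cd "$TMPDIR"

# Run the alignment and the daa-to-m8 conversion entirely on local disk
diamond blastx -d uniref50_DEMO.dmnd -q in.fa -a out -t "$TMPDIR"
diamond view -a out.daa -o out.m8

# Copy results back to storage visible from the head node
cp out.daa out.m8 /path/on/head/node/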

I have not observed this behavior with any other software, and I frequently use many different high performance bioinformatics programs. If you're curious, I would be happy to answer any questions you may have about the system architecture. In any case, this issue and my other issue, #9: "Input file is a DAA file (possibly related to issue - blastx error writing file)", are completely resolved.

Thank you for trying to help me out.

@ryandward

Interesting. I am having the same issue. Diamond does not work on files stored on USB drives.
