Corrupted tabix index #393
Comments
Can you try with the latest master version, @dariober? I fixed some bugs in the Tabix/Tribble indexing recently...
Hi - Thanks for looking into this, and apologies for going quiet. I tested the code above on htsjdk-2.7.0 and the problem persists. Also, after indexing the following file (call it chrom.bed.gz):
The query
skips the first record and returns:
However, this file is correctly queried:
It seems to me that something goes wrong when features fall in the first bin, 0-16384. (Just to make sure: can anybody reproduce this bug, or is it just me?)
This is also happening to me, @dariober. I will create a PR with a failing index to demonstrate the problem and ask how it can be fixed.
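For anyone following along, the 0-16384 range corresponds to the first 16 kb window of the tabix linear index, which is 2^14 bases wide. Here is a minimal sketch of how a 0-based position maps to a window. The class and method names are made up for illustration and are not htsjdk API:

```java
// Sketch only: hypothetical names, not part of htsjdk.
final class LinearIndexWindowSketch {
    // Tabix linear-index windows are 2^14 = 16384 bases wide.
    static final int WINDOW_SHIFT = 14;

    /** Index of the linear-index window containing a 0-based position. */
    static int windowOf(int zeroBasedPos) {
        if (zeroBasedPos < 0) {
            throw new IllegalArgumentException("position must be >= 0");
        }
        return zeroBasedPos >> WINDOW_SHIFT;
    }

    /** Last window touched by a half-open interval [start, end), as in BED. */
    static int lastWindowOf(int start, int end) {
        if (start < 0 || end <= start) {
            throw new IllegalArgumentException("need 0 <= start < end");
        }
        return windowOf(end - 1);
    }

    public static void main(String[] args) {
        System.out.println(windowOf(0));                 // 0
        System.out.println(windowOf(16383));             // 0
        System.out.println(windowOf(16384));             // 1
        System.out.println(lastWindowOf(16000, 17000));  // 1
    }
}
```

So any feature starting below position 16384 lands in window 0, which would fit with the symptom showing up only for features at the very start of a sequence.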
@dariober There are several problems with the IndexFactory code paths that create indices based on an input file. The worst one is that although the indexer recognizes block-compressed inputs, it wraps a PositionalBufferedStream around the BlockCompressedInputStream used to decode them, so instead of handing the indexer virtual file pointers, it hands it byte offsets. In addition, the first feature added to the index is always indexed as if it were at offset 0 in the input, even if the input file has a header. Finally, there is a bug in the calculation of the linear index part of the tabix index that manifests when a feature is located at offset 0 in the input file (which can happen with BED). Fixes are forthcoming.
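To illustrate why byte offsets and virtual file pointers are not interchangeable: a BGZF virtual file offset packs the compressed-file offset of the start of a block into the upper 48 bits and the offset within that block's uncompressed data into the lower 16 bits. The sketch below uses hypothetical class and method names (not the htsjdk utilities) and only shows the arithmetic:

```java
// Sketch only: hypothetical names, not part of htsjdk.
final class VirtualOffsetSketch {
    private static final int WITHIN_BLOCK_BITS = 16;
    private static final long WITHIN_BLOCK_MASK = 0xFFFFL;
    private static final long MAX_BLOCK_START = (1L << 48) - 1;

    /** Packs (compressed block start, offset inside the uncompressed block). */
    static long pack(long blockStart, int withinBlock) {
        if (blockStart < 0 || blockStart > MAX_BLOCK_START) {
            throw new IllegalArgumentException("block start out of range");
        }
        if (withinBlock < 0 || withinBlock > WITHIN_BLOCK_MASK) {
            throw new IllegalArgumentException("within-block offset out of range");
        }
        return (blockStart << WITHIN_BLOCK_BITS) | withinBlock;
    }

    /** Compressed-file offset of the block a virtual offset points into. */
    static long blockStart(long virtualOffset) {
        return virtualOffset >>> WITHIN_BLOCK_BITS;
    }

    /** Offset within the uncompressed contents of that block. */
    static int withinBlock(long virtualOffset) {
        return (int) (virtualOffset & WITHIN_BLOCK_MASK);
    }

    public static void main(String[] args) {
        long v = pack(100L, 5);
        System.out.println(v);               // 6553605
        System.out.println(blockStart(v));   // 100
        System.out.println(withinBlock(v));  // 5
    }
}
```

A reader that interprets index entries as virtual offsets will split a plain byte offset such as 70000 into block start 1 and within-block offset 4464, which is not where the record is. Only offset 0 means the same thing in both schemes.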
Hello,
It seems to me that tabix indexes created with writeBasedOnFeatureFile are corrupted. Am I doing something wrong?
Here's a BED test file; it has 1M rows on 10 chroms. I bgzipped it with tabix 1.2.1:
https://dl.dropboxusercontent.com/u/53487723/tmp.bedGraph.gz
I created an index for it with the following program (java 8, htsjdk-1.141):
Querying the resulting index returns only the first records of chr1: