[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ReadsHtsgetData source, refactor HtsgetReader #6662

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

andersleung
Copy link
Contributor
@andersleung andersleung commented Jun 16, 2020

Allow tools to read from htsget data sources.
@lbergelson

@andersleung andersleung marked this pull request as draft June 16, 2020 21:39
@lbergelson
Copy link
Member

I'm reviewing this right now. I'm sorry it's late.

Copy link
Member
@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks pretty good to start. I think there are some issues around many intervals being started in parallel and we may want to revisit that like we talked about on the call. Adding additional checks to ensure we're in coordinate order might make things simpler.

As far as I can tell the reader initialization is in parallel, which is good, but the actual reading will be in series because the merging iterator will poll one at a time. We might want to look at that later. We might also want to look into aggressively prefetching reads.

It obviously needs tests.

return new SAMRecordToReadIterator(samRecordIterator);
}

private Map<SamReader, CloseableIterator<SAMRecord>> getReaders(final boolean traverseUnmapped) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traverseUnmapped is confusingly named here. The way I read it initially was that travereUnmapped mean "also include unmapped reads". I.e. traverseUnmapped == false implies NO unmapped unplaced reads and true means get all the reads including unplaced unmapped.

The flag actually seems to behave as "onlyIncludeUnplacedUnmappedReads". I think it should be renamed and a comment would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this behaviour is confusing, but I'm trying to follow what ReadsPathDataSource does which is more complicated.

If there are intervals and traverseUnmapped == true it returns the intervals and the unplaced unmapped reads
If there are no intervals and traverseUnmapped == true it returns only the unplaced unmapped reads.

@lbergelson lbergelson assigned andersleung and unassigned lbergelson Jul 1, 2020
@andersleung
Copy link
Contributor Author

We should be able to add tests quite easily once we can get the reference server into gatk, so merging this will have to wait until then.

@andersleung andersleung removed their assignment Jul 6, 2020
Copy link
Member
@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my comments so far.

@@ -0,0 +1,10 @@
WORKING_DIR=/home/travis/build/broadinstitute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to this script explaining what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed file and added a README to the script folder

@lbergelson lbergelson added the GA4GH Collaboration with GA4GH label Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GA4GH Collaboration with GA4GH
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants