-
Notifications
You must be signed in to change notification settings - Fork 579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SATagBuilder Api to manage SA:Z:* tags could be improved. #3324
Labels
Comments
A couple of months ago I made an attempt to factor SATagBuilder out to a
public place and change its API to be a little more friendly to different
downstream uses, but I found some unexpected behavior in how it was parsing
SA tags that made me give up for fear of breaking the tools that already
rely on it. Within the SV tools, I'm currently working on a branch in which
I've written some code to parse SA tags. Perhaps we can work out what we
both need from a shared API and implement that.
…On Fri, Jul 21, 2017 at 12:56 PM, Valentin Ruano Rubio < ***@***.***> wrote:
I have to deal with this component recently and I found the design rather
awkward.... In general between GATK and htsjdk we don't seem to have a
proper support for managing and querying Supplementary alignment
information from read alignment records:
Some of the things that I think smell:
1.
Querying: implemented in htsjdk consists in forging artificial
SAMRecords that contain only the alignment info in the SA tag element... It
seems to me that it makes more sense to create class to hold this
information alone (e.g. ReadAlignmentInfo or ReadAlignment); SATagBuilder
already has defined a private inner class with that in mind "SARead" so why
not flesh it out and make it public.
2.
Writing: currently SATagBuilder gets attached to a read, parsing its
current SA attribute content into SARead instances. It provides the
possibility adding additional SAM record one by one or clearing the list.
... then it actually updates the SA attribute on the original read when a
method (setTag) is explicitly called.
I don't see the need to attach the SATag Builder to a read... it could
perfectly be free standing; the same builder could be re-apply to several
reads for that matter and I don't see any gain in hiding the read SA tag
setting process,... even if typically this builder output would go to the
"SA" tag, perhaps at some point we would like to also write SA coordinate
list somewhere else, some other tag name or perhaps an error message... why
impose this single purpose limitation?
I suggest to drop the notion of a builder for a more general custom
ReadAlignmentInfo (or whatever name) list. Such list could be making
reference to a dictionary to validate its elements, prevent duplicates,
keep the primary SA in the first position... etc.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3324>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AArTZft11VTCtCHT_xr89kPL7hMFYQyhks5sQNghgaJpZM4Ofpkb>
.
|
I think @jamesemery was the original author, so he may be able to contribute explanations for why it's implemented the way it is. |
Sure @cwhelan. Can you be more concrete as to what was "wrong" with it? |
We can talk about it in person on Monday, perhaps is better, and post it here after. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I have to deal with this component recently and I found the design rather awkward.... In general between GATK and htsjdk we don't seem to have a proper support for managing and querying Supplementary alignment information from read alignment records:
Querying: implemented in htsjdk consists in forging artificial SAMRecords that contain only the alignment info in the SA tag element... It seems to me that it makes more sense to create class to hold this information alone (e.g. ReadAlignmentInfo or ReadAlignment); SATagBuilder already has defined a private inner class with that in mind "SARead" so why not flesh it out and make it public.
Writing: currently SATagBuilder gets attached to a read, parsing its current SA attribute content into SARead instances. It provides the possibility adding additional SAM record one by one or clearing the list. ... then it actually updates the SA attribute on the original read when a method (setTag) is explicitly called.
I don't see the need to attach the SATag Builder to a read... it could perfectly be free standing; the same builder could be re-apply to several reads for that matter and I don't see any gain in hiding the read SA tag setting process,... even if typically this builder output would go to the "SA" tag, perhaps at some point we would like to also write SA coordinate list somewhere else, some other tag name or perhaps an error message... why impose this single purpose limitation?
I suggest to drop the notion of a builder for a more general custom ReadAlignmentInfo (or whatever name) list. Such list could be making reference to a dictionary to validate its elements, prevent duplicates, keep the primary SA in the first position... etc.
The text was updated successfully, but these errors were encountered: