Append FASTA/FASTQ comment to SAM output [feature request] #136

Closed
sjackman opened this issue Mar 22, 2018 · 11 comments

@sjackman
sjackman commented Mar 22, 2018

bwa mem has the -C option to append the FASTA/FASTQ comment to SAM output. I use this feature with 10x Genomics Chromium linked reads to output the barcode BX:Z tag of each read. I'd find this feature useful for minimap2 -a -xsr.
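To make the requested behaviour concrete, here is a minimal sketch of what "appending the FASTA/FASTQ comment to SAM output" amounts to. It is an illustration only, not the actual code of bwa or minimap2; the function names `split_fastq_header` and `append_comment` and the sample read are hypothetical.

```python
def split_fastq_header(header):
    """Split a FASTQ header line into (read name, comment).

    The comment is everything after the first run of whitespace.
    """
    text = header.rstrip("\r\n")
    if text.startswith("@"):
        text = text[1:]
    parts = text.split(None, 1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return name, comment


def append_comment(sam_record, comment):
    """Append the comment to a SAM record as extra tab-separated fields."""
    record = sam_record.rstrip("\r\n")
    return record + "\t" + comment if comment else record


# Hypothetical unmapped read carrying a 10x-style barcode in its comment.
name, comment = split_fastq_header("@read1 BX:Z:ACGT-1\n")
sam = append_comment("read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII", comment)
```

The result is only a valid SAM record if the comment itself is already written as SAM optional fields, as it is for `BX:Z:ACGT-1`.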

@lh3
Owner
lh3 commented Mar 23, 2018

Have been thinking about this for a while. As you mentioned, now added. Option -y.

Closing via 08bd212.

@lh3 lh3 closed this as completed Mar 23, 2018
@sjackman
Author

Thanks for the quick response, Heng!

Here's a more involved idea… do you think minimap2 could chain through alignments of reads in the same barcode, to create an alignment of the full molecule? For CIGAR alignments, perhaps N could be used to indicate the gaps between reads.

@lh3
Owner
lh3 commented Mar 26, 2018

That would amount to a 10x mapper, which is distinct from typical short-read mapping. The complication is that different genomic regions may be associated with the same barcode. You have to identify these "read clouds" first and then do the mapping. Reads in these clouds have no specific order, either.

In general, the 10x mapper has room for improvement. One particular problem is that it uses bwa-mem, which does not handle every kind of mapping well. Minimap2 has the potential, but it would take significant effort to implement. I think someone from the Bonnie Berger group has a better mapper. I thought it was on bioRxiv but could not find it.

@mcshane
Contributor
mcshane commented Mar 26, 2018

@lh3
Owner
lh3 commented Mar 26, 2018

Thanks, @mcshane. This is what I was talking about. The first author gave a talk at Broad. I think it is quite good, though I haven't used it personally.

@mcshane
Contributor
mcshane commented Mar 26, 2018

Yeah, I haven't tried yet either, but it does look interesting.

@sjackman
Author

Thanks for the pointer to EMerAld (EMA). I'm not familiar with this aligner. I'll check it out.

@sjackman
Author
sjackman commented Mar 26, 2018

Back to minimap2 -y. Is the FASTQ comment copied to the SAM tag even when it's not formatted in a valid SAM tag format? In particular, when the comment is in Illumina bcl2fastq format (1:N:0:ATCACG), it'd be helpful not to copy it to the output and create an invalid SAM file.

I'm writing a pipeline where the FASTQ files may be in either bcl2fastq format or Longranger basic format, and it'd be a nice bonus for the pipeline to be able to use minimap2 -y with both, and not have to detect which format the reads are in.

@lh3
Owner
lh3 commented Mar 26, 2018

I am afraid that you have to do this on your side.
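As an illustration of handling this on the pipeline side, the following is a minimal sketch that keeps a FASTQ comment only when every tab-separated field looks like a SAM optional field (`TAG:TYPE:VALUE`). The per-type value patterns are my reading of the SAM specification's optional-field syntax and should be checked against it; the names `is_valid_sam_tag` and `sam_safe_comment` are hypothetical and not part of minimap2.

```python
import re

# Value syntax per SAM optional-field type (assumed from the SAM spec).
_FLOAT = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
_VALUE_PATTERNS = {
    "A": r"[!-~]",
    "i": r"[-+]?[0-9]+",
    "f": _FLOAT,
    "Z": r"[ !-~]*",
    "H": r"(?:[0-9A-F]{2})*",
    "B": r"[cCsSiIfF](?:," + _FLOAT + r")*",
}
_TAG_RE = re.compile(r"([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)", re.DOTALL)


def is_valid_sam_tag(field):
    """Return True if the field is a well-formed TAG:TYPE:VALUE triple."""
    match = _TAG_RE.fullmatch(field)
    if match is None:
        return False
    value_type, value = match.group(2), match.group(3)
    return re.fullmatch(_VALUE_PATTERNS[value_type], value) is not None


def sam_safe_comment(comment):
    """Return the comment unchanged if all its fields are valid SAM tags,
    otherwise return an empty string so nothing is appended."""
    if comment and all(is_valid_sam_tag(f) for f in comment.split("\t")):
        return comment
    return ""
```

With this check, a Longranger-style comment such as `BX:Z:ACGT-1` is kept, while a bcl2fastq-style comment such as `1:N:0:ATCACG` is dropped because it does not start with a two-character tag. A pipeline could apply it to the FASTQ headers before alignment, so the same `minimap2 -y` invocation works for both input formats.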

@sjackman
Author

Okay. Thanks, Heng. If I were to implement this feature, would you accept a pull request?

@lh3
Owner
lh3 commented Mar 26, 2018

Thanks. I will surely consider it. Please don't introduce new dependencies, though.
