not working without parameter selection #41

kmoosi · 2021-06-16T14:56:45Z

Hi,

I don't have paired-end reads, but as described in previous issues I copied my input FASTQ, removed the UMIs (16 N long), and used the copy as the second input file, as shown in the screenshot. I got an error message (no error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 55 from sample of 10000 reads). Then I tried the example command, adjusting only my input file names and the barcode length, and my output (cluster) file was generated. But I'm not sure whether this is the right parameter selection for my sequences - they are very short, only 55 bases including the 16-base UMI.

I then tried using the generated cluster file with calib_cons. There was no error message, but the output files were empty. So my question is: does the example command below refer to the same input files as the first calib command for clustering, or to a different FASTQ file?

To run Calib error correction, run:

calib_cons -c <cluster_file> -q <space_separated_FASTQ_list> -o <space_separated_output_prefix_list>

For example:

calib_cons -c R.cluster -q R1.fastq R2.fastq -o R1. R2.

Thanks in advance, and sorry for what are probably dumb questions to experts, but I'm new to this topic (:

baraaorabi · 2021-06-16T17:03:40Z

I don't think your screenshot is attached. Can you add it again?

Let me make sure I got this correct. You have R1 that has no barcode and is 39bp long and R2 which has the barcode (16bp) and the rest of R2 is identical to R1?

kmoosi · 2021-06-17T08:35:20Z

It's the other way round. I have R1 with 16bp barcodes followed by a 39bp sequence, and R2 with the sequence (39bp) only. And yes, the only difference is that the R2 sequences lack the barcodes.
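
The layout described here (a 16bp barcode at the start of each R1 read, and R2 being the same read without it) can be produced with a short script. This is only a minimal sketch of the workaround, not part of Calib: the function name `strip_umi` is made up for illustration, and it assumes a plain, uncompressed FASTQ with exactly four lines per record.

```python
import io


def strip_umi(fastq_in, fastq_out, umi_length=16):
    """Copy FASTQ records, dropping the first `umi_length` bases of each read.

    Assumption: every record is exactly four lines (header, sequence, '+',
    quality) and the UMI sits at the start of the sequence line.
    """
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        sequence = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        quality = fastq_in.readline().rstrip("\n")
        fastq_out.write(header)
        # Trim the sequence and its quality string by the same amount
        # so both stay the same length.
        fastq_out.write(sequence[umi_length:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(quality[umi_length:] + "\n")


# Tiny in-memory demo: one 55-base read becomes a 39-base read.
demo_in = io.StringIO("@read1\n" + "N" * 16 + "A" * 39 + "\n+\n" + "I" * 55 + "\n")
demo_out = io.StringIO()
strip_umi(demo_in, demo_out)
```

For real files, the same function can be called with two handles from `open(...)`, one for reading R1 and one for writing the UMI-free copy used as R2.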

baraaorabi · 2021-06-17T16:42:27Z

I see. This is failing because Calib's default parameter sets have been tested for read lengths between 60 and 250bp, so you will have to select the parameters yourself. I suggest starting with -e 1 -k 4 -m 5 -t 3. Maybe consider increasing -e to 2.
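
To check up front whether a data set falls inside the 60 to 250bp range mentioned above, the read length can be estimated from a sample of reads. The sketch below is an illustration only: how Calib itself infers the read length is not stated in this thread, so using the median of the first 10000 reads is an assumption, and both function names are hypothetical.

```python
from itertools import islice
from statistics import median

# Read-length range quoted in the reply above for the default parameter sets.
TESTED_MIN, TESTED_MAX = 60, 250


def sample_read_length(fastq_lines, sample_size=10000):
    """Estimate read length as the median over the first `sample_size` reads.

    Assumption: four lines per FASTQ record, so sequence lines sit at
    indices 1, 5, 9, ... of the input.
    """
    seq_lines = islice(fastq_lines, 1, sample_size * 4, 4)
    lengths = [len(line.strip()) for line in seq_lines]
    if not lengths:
        raise ValueError("no reads found")
    return int(median(lengths))


def defaults_were_tested(read_length):
    """True if the read length lies inside the tested 60-250bp range."""
    return TESTED_MIN <= read_length <= TESTED_MAX
```

For 55bp reads such as the ones in this thread, `defaults_were_tested` returns `False`, which is the case where the parameters (-e, -k, -m, -t) have to be chosen by hand.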

kmoosi · 2021-06-18T14:41:01Z

Thank you for the quick answer. It's working now - I got my cluster file and tried the calib_cons command (screenshot), but all I got were empty files. Did I choose the wrong input FASTQ? I used the same one as in the calib command.

baraaorabi · 2021-06-18T14:44:04Z

Can you check the cluster file to see how many clusters Calib generated?

kmoosi · 2021-06-18T14:51:21Z

Is it the first number in the first column? Then it is 94.
But I should say that for this first test I only used a file containing about the first 101 sequences of my whole NGS data. So maybe the input size and/or variety is too small?

baraaorabi · 2021-06-18T14:54:23Z

The consensus stage expects each cluster to be at least of size 2 and at most of size 1000 (configurable parameters). It's probably the reason why you don't have any output.
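
One way to see how many clusters fall inside that size window is to tally cluster sizes from the cluster file. The sketch below assumes the cluster file has one line per read with the cluster ID in the first tab-separated column; that layout is an assumption based on the question above, not something confirmed in this thread, and the function names are hypothetical. As a sanity check on the numbers reported here: if about 101 reads land in 94 clusters, at most 7 clusters can contain two or more reads.

```python
from collections import Counter


def cluster_sizes(cluster_lines):
    """Count reads per cluster.

    Assumption: one read per line, cluster ID in the first
    tab-separated column.
    """
    sizes = Counter()
    for line in cluster_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cluster_id = line.split("\t", 1)[0]
        sizes[cluster_id] += 1
    return sizes


def usable_clusters(sizes, min_size=2, max_size=1000):
    """Keep only clusters whose size lies within [min_size, max_size].

    The defaults mirror the limits quoted in the reply above.
    """
    return {cid: n for cid, n in sizes.items() if min_size <= n <= max_size}
```

Passing an open cluster file to `cluster_sizes` and the result to `usable_clusters` gives a quick count of the clusters the consensus stage would be expected to process under these limits.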


kmoosi · 2021-06-20T07:58:29Z

Ok, I've tried it with my complete data set and now it seems to be working. Thank you very much!
I have one more question for my understanding. I get a FASTQ and an MSA file as output. The FASTQ is a list of my consensus reads, right? But what is the meaning of the first line of an entry/first two lines of the first entry, especially the number after the @?
Is the list after this number in the ID line the entries that belong to the consensus?

And the MSA file just lists the consensus generation in detail, with all the aligned reads that belong to it, right?

baraaorabi · 2021-06-21T14:55:20Z

The number is just a new read name. The semicolon-separated list of numbers contains the IDs of the reads making up this consensus read cluster. Yeah, the MSA files are just the multiple sequence alignment files used for computing the consensus sequences.
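
A header of that shape can be split into the new read name and the member read IDs with a few lines of code. This is a hedged sketch: the exact separator between the name and the ID list is not stated in this thread, so splitting on whitespace is an assumption, and `parse_consensus_header` is a hypothetical helper, not part of Calib.

```python
def parse_consensus_header(header_line):
    """Split a consensus FASTQ header into (read_name, member_read_ids).

    Assumption: the header looks like '@<name> <id>;<id>;...' with the
    name and the semicolon-separated ID list separated by whitespace.
    """
    if not header_line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    fields = header_line[1:].split()
    if not fields:
        raise ValueError("empty header")
    name = fields[0]
    members = []
    if len(fields) > 1:
        # Ignore empty tokens, e.g. from a trailing semicolon.
        members = [int(token) for token in fields[1].split(";") if token]
    return name, members
```

The length of the returned ID list is the number of original reads that were merged into that consensus read.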


-- Baraa Orabi PhD Student Vancouver Prostate Centre

kmoosi · 2021-07-01T15:05:13Z

OK, thanks for explaining. And as far as I understand, the second file (R2, consisting of the reads without UMIs) is processed in the same manner as R1 even though it contains no UMIs?

baraaorabi · 2021-07-19T17:53:06Z

Yep, exactly (sorry for late reply, was on vacation)

fhach assigned baraaorabi Jun 16, 2021

baraaorabi closed this as completed Jul 19, 2021
