calib and use with conda #28

Closed
ChadFibke opened this issue Mar 23, 2020 · 13 comments

Comments

@ChadFibke

Hey @baraaorabi,

I was wondering how the consensus and error correction steps are performed with the conda installed version of calib?

I was able to generate the test.cluster with the following command:

calib --input-forward R1.fastq.gz --input-reverse R2.fastq.gz --barcode-length 4 --output-prefix test. --minimizer-count 7 --kmer-size 8 --error-tolerance 1 --minimizer-threshold 2

However, I'm unable to proceed with the consensus and error correction steps because the conda-installed version exposes no additional calib arguments:

$ calib --help
Combined barcode lengths must be a positive integer and each mate barcode length must be non-negative! Note if both mates have the same barcode length you can use -l/--barcode-length parameter instead.
Calib: Clustering without alignment using LSH and MinHashing of barcoded reads
Usage: calib [--PARAMETER VALUE]
Example: calib -f R1.fastq -r R2.fastq -o my_out. -e 1 -l 8 -m 5 -t 2 -k 4 --silent
Calib's paramters arguments:
-f --input-forward (type: string; REQUIRED paramter)
-r --input-reverse (type: string; REQUIRED paramter)
-o --output-prefix (type: string; REQUIRED paramter)
-s --silent (type: no value; default: unset)
-q --no-sort (type: no value; default: unset)
-g --gzip-input (type: no value; default: unset)
-l --barcode-length (type: int; REQUIRED paramter unless -l1 and -l2 are provided)
-l1 --barcode-length-1 (type: int; REQUIRED paramter unless -l is provided)
-l2 --barcode-length-2 (type: int; REQUIRED paramter unless -l is provided)
-p --ignored-sequence-prefix-length (type: int; default: 0)
-m --minimizer-count (type: int; default: Depends on observed read length;)
-k --kmer-size (type: int; default: Depends on observed read length;)
-e --error-tolerance (type: int; default: Depends on observed read length;)
-t --minimizer-threshold (type: int; default: Depends on observed read length;)
-c --threads (type: int; default: 1)
-h --help

Am I missing something here?

Best,
Chad

@baraaorabi
Collaborator

Hello Chad,

Sorry for the (very) late response.

So, the conda version does not install the error correction module; it contains only the clustering module, because of some issues I had adding the SPOA dependency on bioconda. Please let me know if using conda is a must for your tests, and I can give it another try over the weekend, especially since the SPOA conda version has been updated recently.

@ChadFibke
Author

Hey Baraa,

No problem! Conda would be preferable, and I'm sure many others would appreciate it! However, if that isn't possible, I'm happy to install it using the instructions in the README. I eventually installed calib following the README and was able to set up calib and calib_cons. However, after running:
calib --input-forward test_R1.fastq.gz --input-reverse test_R2.fastq.gz --barcode-length 4 --output-prefix test --minimizer-count 7 --kmer-size 8 --error-tolerance 1 --minimizer-threshold 2

Extracting minimizers and barcodes...
Memory before reading FASTQ:
1MB
Memory right after reading FASTQ:
399MB
Memory after reserving for read_to_node_vector & node_to_minimizers:
399MB
Memory after filling barcode_to_node_map:
597MB
Memory after releasing node_to_read_map:
584MB
Memory after reserving barcode_to_nodes_vector:
584MB
Memory after filling barcodes & barcode_to_nodes_vector:
676MB
Memory after releasing barcode_to_node_map:
663MB
Read count: 1795912
Node count: 1795912
Barcode count: 1737251
Memory after exiting extract_barcodes_and_minimizers():
663MB
Clustering...
Adding edges due to barcode barcode similarity
Number of masks is 8
01111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
10111111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11011111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11101111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11110111 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111011 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111101 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
11111110 is assigned to thread 0
Thread 0 built LSH in: 0
Thread 0 processed LSH in: 0
On thread 0 building all LSH took: 0
On thread 0 processing all LSH took: 0
On thread 0 merging local graph with global graph
On thread 0 merging took 0
Building the graph on 1 thread(s) took 0
Adding edges between nodes of identical barcodes with thread 0
Adding edges due to barcodes similarity took: 213
Memory after adding edges:
433MB
Extracting clusters
Extracting clusters took: 0
Memory extracting clusters:
433MB
Memory after releasing graph:
392MB
Outputting clusters
min_records_per_tmp_file 180224
There are 1795912 clusters
There are 10 temp files
There are 10 temp files
Processing file testtemp_0
Processing file testtemp_1
Processing file testtemp_2
Processing file testtemp_3
Processing file testtemp_4
Processing file testtemp_5
Processing file testtemp_6
Processing file testtemp_7
Processing file testtemp_8
Processing file testtemp_9
Outputting clusters took: 24
All done! Have a good day!

I received a binary testcluster file of ~2.7G that began with 863640 863640 17 instead of the expected TSV content. I then ran:

calib_cons -t 8 -c testcluster -q calib_1.fastq calib_2.fastq -o 1.out 2.out

Reading cluster file: testcluster
Reading fastq file: calib_1.fastq
Writing output files: 1.out
Reading fastq file: calib_2.fastq
Writing output files: 2.out

This produces empty MSA and FASTQ files (I think because of the malformed cluster file). Please let me know if you would like me to move this to a separate issue!

@baraaorabi
Collaborator

The .cluster file that Calib outputs is a regular tab-delimited file. Can you show me the output of the head testcluster command? Note that calib_cons generates a consensus only for clusters of size >= --min-reads-per-cluster.

Also, what is the length of each read mate?
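
The size filter mentioned above can be checked before running calib_cons. Here is a minimal Python sketch, assuming the first tab-separated column of each row in the cluster file is the cluster ID. That column layout is an assumption inferred from the sample rows posted in this thread, and the function names are made up for illustration; they are not part of Calib.

```python
from collections import Counter
import io


def cluster_sizes(lines):
    """Count rows (reads) per cluster.

    Assumption: column 1 of each tab-delimited row is the cluster ID.
    """
    sizes = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cluster_id = line.split("\t", 1)[0]
        sizes[cluster_id] += 1
    return sizes


def clusters_meeting_minimum(sizes, min_reads):
    """Return the IDs of clusters with at least min_reads reads."""
    return {cid for cid, count in sizes.items() if count >= min_reads}


if __name__ == "__main__":
    # Tiny synthetic input, not real Calib output.
    example = io.StringIO("0\t0\t0\t@r0\n0\t1\t1\t@r1\n1\t2\t2\t@r2\n")
    sizes = cluster_sizes(example)
    print(clusters_meeting_minimum(sizes, 2))
```

If the resulting set is empty for the chosen minimum, an empty consensus output would be expected rather than a sign of a bug.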

@ChadFibke
Author
ChadFibke commented Apr 25, 2020

head testcluster

863640 863640 17 [non-printable binary data; further records in the same output begin with 650030 650030 41, 1072720 1072720 53, and 986130 986130 76]

I'm using paired-end 125 bp reads with 4 bp UMIs.

@baraaorabi
Collaborator

Oh, you need to add -g to use gzipped input (it's on the --help but not on the README!)
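
A quick way to tell whether an input needs -g is to look at its first two bytes: gzip files start with 0x1f 0x8b. The sketch below is only an illustration of that check; is_gzipped is a hypothetical helper name, not something Calib provides.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Return True if the file at path starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Demonstrate on two throwaway files rather than real sequencing data.
    workdir = tempfile.mkdtemp()
    plain = os.path.join(workdir, "reads.fastq")
    packed = os.path.join(workdir, "reads.fastq.gz")
    record = b"@r1\nACGT\n+\nIIII\n"
    with open(plain, "wb") as out:
        out.write(record)
    with gzip.open(packed, "wb") as out:
        out.write(record)
    for path in (plain, packed):
        print(path, "-> pass -g" if is_gzipped(path) else "-> no -g needed")
```

Checking the bytes rather than the file extension also catches gzipped files that were renamed to .fastq.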

@baraaorabi
Collaborator

Added to README now.

Also, about multi-threading: Calib's runtime does not scale well with more threads. If you have multiple samples, run them in parallel, each on a single thread. And if you want a bit of a speedup, run with the --no-sort option (the Calib consensus module doesn't need the cluster file to be sorted).
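
The advice above (several samples side by side, one thread each, with --no-sort) could be scripted roughly as follows. This is a sketch under assumptions: the sample file names and output prefixes are hypothetical, the flags are the ones listed in the calib --help output earlier in this thread, and calib is assumed to be on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def calib_command(forward, reverse, prefix, barcode_length, gzipped=True):
    """Build one single-threaded calib invocation as an argument list."""
    cmd = [
        "calib",
        "-f", forward,
        "-r", reverse,
        "-o", prefix,
        "-l", str(barcode_length),
        "-c", "1",      # one thread per sample
        "--no-sort",    # skip sorting the cluster file
    ]
    if gzipped:
        cmd.append("-g")  # inputs are gzipped
    return cmd


def run_samples(samples, max_parallel=4):
    """Run one calib process per sample, at most max_parallel at a time.

    Returns the exit code of each process, in input order.
    """
    commands = [calib_command(*sample) for sample in samples]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, commands))


if __name__ == "__main__":
    # Hypothetical sample list; only the commands are printed here.
    samples = [
        ("s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1.", 4),
        ("s2_R1.fastq.gz", "s2_R2.fastq.gz", "s2.", 4),
    ]
    for sample in samples:
        print(" ".join(calib_command(*sample)))
```

Threads are used here only to wait on the child processes; each calib run is its own process, so the samples really do run in parallel.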

@ChadFibke
Author
ChadFibke commented Apr 25, 2020

That seems to be working, thank you :)

I ran it quickly on my negative control and I get the expected cluster TSV file (the larger data sets are still waiting in a queue)! Thanks for the additional comments on Calib's scalability. I'll close the issue and will reopen it if something goes haywire over the weekend!

@ChadFibke ChadFibke reopened this Apr 25, 2020
@ChadFibke
Author

Alas, I've hit another snag.

I was able to generate a proper cluster file with the additional -g flag, which results in:

head test.cluster

1654325 3012539 9 @HS27_336:2:1101:2628:2159 NACTGGGCCCAGCTTGCTAGACAAATAGGAGCCAGCCTGAATGATGACATTCTTTTCGGGGTGTTCGCACAAAGCAAGCCAGATTCTGCCGAACCAATGGATCGATCTGCCATGTGTGCATTCCC #<:??GDGGGGGGFGGGCEGFGCEGGGGGGGGGGGGGGGGGGGGGGFGGGGGGEGGGGGDGGGGGGGGGGDGFEFGGGGGGGGGGGG>GECGGGGGGF8FGGGGGGGGGGGGGDDDD<D=GGGGG @HS27_336:2:1101:2628:2159 CGAGTCATTGTTTTTGTTGACGATCTTGTTGAAGAAGTCGTTGACATATTTGATAGGGAATGCACACATGGCAGATCGATCCATTGGTTCGGCAGAATCTGGCTTGCTTTGTGCGAACACCCCGA A?ABBGGGGGGDGGGGGGBGGGGGGGGGGGBEGEGGGGGBFGGGGGGGGGGG1FGG>FGGDGGGGGGGGGGGGGGFDFFDFGGDGBFG>FGCD>GGE>F@GGGGGGGGGGGGGG<FGGGGBGGGG
1073675 1462219 87 @HS27_336:2:1101:10913:2207 AGACTCGCCCGGCTAATTTTTGTATTCTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCTTGACCTCATGATATGCCCGCCTCGGCCTCCCAAGTGCGGGGGTTACAGC 3<<<AFE<//;EBBGEGGGEGGG>FG>C<11=CF1E1C11:9//<:<:FGGFFD10=1EFGB@0FB11:1=FGG>@GCC0B@GGGGD00=/;E>CC.C8?@0FBB@08;9C..6CDGG/9@ @HS27_336:2:1101:10913:2207 CTACTGTAGGAACAGGAAGAAGCTGCCAACAGCCCACAGGCCCAGCACAGAGGAAGGGAAGCTAACTAACCCTGGAACGTCGGGACAATGGGAGTAGCTAGGGCCTGAGTGGATTCGAAAATAAA ?BBBB1>FFDDF1;F0>=FDGGGCGCGDGGE1F>FGGG0BDCBBFGG1FG11B11:FG0C>FGGEDGGGC0:=E::>BFD@CGGE<FGG0E@G=GGGCFFD@GE..FGGFGGDDGGGGGB>BGGE
610675 5225094 89 @HS27_336:2:1101:10830:2217 CCTGTTGCTCAGCACCCGGGCTGGGGGGCTCGGCCTGAACCTCCAGTCGGCAGACACTGTGATCATTTTTGACAGCGACTGGAATCCTCACCAGGTAAAAGCGGGCCGGGCCCCAGGTCGAGGAG :>>@0b;FFFGBFGGGGGGDGGFGGDGGGGGGGGGG/DBGGGGGGGGE>CGGGGGEE=GGGGGGE=DGGGGGG0CECGGACGGGDGEGEGGGGGD@GG/@ce?.CAG;A;;>ACC>B/CACADCC @HS27_336:2:1101:10830:2217 GCTCTGGACCCTCAAGCCAGGGCCGTCTCCTCGAGGTTTTGCAGGCACCCCCTTCCTTCTCCTCGACCTGGGGCCCGGCCCGCTTTTACCTGGTGAGGATTCCAGTCGCTGTCAAAAATGATAAC 3A=BBGGEEBGGGGGGGGG0FFGGG/EEGEGBF>FDFGEGE>=:FCFEGGGGG>EFF1EF@F1FG<EEG00@>/C9CBCADDG<.CF<8@@dggg=EG//8/68EGG6CGDD@/68/6@/8D/CG
922300 1441903 105 @HS27_336:2:1101:12402:2163 NTACTCGGGGCCAATAGCCCCCCACCTGACACCCCCAACCCTACATTTCTGCACAAAAGCCCCGCCTCCCTGGGGCTGCGGGCAGAGGGATGAGGCTCCCACCTTTCAGCAGGTCAGAGAGCGGG #<?@BFGBGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGEGGGGGGGGGGGGGGGGBGGDBGGGGGGGGG<D<GGGGGGGG.@g@EGGGGGGGGE@@DGGGGGGGCGGGEGGDEGGG @HS27_336:2:1101:12402:2163 ATTCTCTTGACTGACCACGCCTTTCTTCCCTCCCCTCGAAATGAAGCTACAACATCACCACGGGTCTGTACCCCTTCGAAGGGGACAACATCTACAAGTTGTTTGAGAACATCGGGAAGGGGAGC BBBBCGGGGGGGGGGFGGGGGFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@GGGGGGGGGGGGG0FGGFGEGBGGGGGGEGGGGGGGGGGGGGGGCFGGGGGCGDGGGCCA
708225 3408142 115 @HS27_336:2:1101:13262:2189 GAGTTTCTCACTGATATCGAATGCAATGGATGATCTGGGAAATAAGAAGAATTTATGGTATTGCCTACAAAGAAGTTGATGAACCGGTCCTTTACAGATGAAAGGACTTTGGCTCCCAGGGCGCT :@BBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGECGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEEFGGGBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF @HS27_336:2:1101:13262:2189 GGTGTTAGAAGAGCCCAGCCAGTGTCCTGACTGTGTGGTGAGCGCCCTGGGAGCCAAAGTCCTTTCATCTGTAAAGGACCGGTTCATCAACTTCTTTGTAGGCAATACCATAAATTCTTCTTATT ABABAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
1672350 2582257 144 @HS27_336:2:1101:16400:2184 NCAGTGTGTGGAGGAATTACATTCACCTCTTCATCAAGGTTACTTTTTCGTGGTGTTCTCTGTGTTTCAAAACTAAATAACAATAAGTGAAGTCATTCACATACTGAAAATTTACAATTTGTGCT #<<>AGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGCGGGGGGGGGGGGGGGGGGGGG>EGGGGGGGGG>FGGGGGGGGGGGGGFGGGGGEGGGGGGEGGGG @HS27_336:2:1101:16400:2184 CTCTATGATTTTATGAGACAACAGAAGCATTATACTGCTTTTTTGATGCATAAAGCACAAATTGTAAATTTTCAGTATGTGAATGACTTCACTTATTGTTATTTAGTTTTGAAACACAGAGAACA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGDEFCCGGGGCFGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGG<EGGGGGGFGGGGG
446900 2812708 152 @HS27_336:2:1101:16734:2170 NCGTCTCAGAGATAACCAATACATTACCACATCTGACTTGGTGGTAAACTTTTGAGTTTGCAGACTTTCCAAAGCCATCCACTTCACTGGCAGCTTTGCACCTGTTTTGTTGTGTACACTATAGT #=<@BGEGGGGGGGGGGGGGGGGGGGGFGGGGG@FCDGGGGGGGGGGGGGGGGFFGGGEFGGFGGGGGGGGEGGGGGGGGGGGGGGGGGFGGCGGGGGGGGGGGFCG>;FGEGGGFDGGE0DFGG @HS27_336:2:1101:16734:2170 ATGTTATTTCAGCCACGGGTAATAATTTTTGTCCTTTCTGTAGGCTGGATGAAAAATTCACAGTCAAGGTTGCTGATTTTGGTCTTGCCAGAGACATGTATGATAAAGAATACTATAGTGTACAC CCCCCGGGGGGGGGGGGGGDGGGGGFGGGGGFGGGGEGGGGGGGGGGGEGGGGGGFGGGGGFGGGGCCFGGGDFG>FGGFGGFGGGGGCGFGGGGGGGGGGGGGGGGGGGGGGFGGGGFGGGGGG
545275 2847558 155 @HS27_336:2:1101:17100:2147 NCAGTCATTTTTGTTGGTGTTGGCAGACCTTCTGAAATTTTATATGGACTCTTCAGGGGTGAAATATAGATGTTCCCTCCAGGAATCCGTAAGGGTGAACTAGGAAACTTGTAAGGGCTTCGAGG #<<@AFFGGGGGGGG1FEEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGCFGGGGGGGGGGGEGGGGFGGGGGGFFG @HS27_336:2:1101:17100:2147 TGACTTAGCCCCCTACCTTGTCACCAATACCTCACATTCCTCGAAGCCCTTACAAGTTTCCTAGTTCACCCTTACGGATTCCTGGAGGGAACATCTATATTTCACCCCTGAAGAGTCCATATAAA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG>DGGGGGGGGGGGFGGGGG
480225 4411453 210 @HS27_336:2:1101:2484:2389 AGACTTGGGCCTGGCCACATGCCCAGCAAGAGTCCCCATCCTAGCCCCTTGTGGACATAGGGGTTTGCTCCGGAGAGACCTGCAAAGAGCCCAGGTGCATACCTTGGCAATCTGCATACACCAGT 33GCFFBGDBGGGGGGGDGG>GGGGGGGGDCGGGGGGGGGEF@>GE@G19FF1FGGEFGGCG>1FGEGFDGGGGGGGGFGG8DG>>FFEBFFGGG>FF@=F8FF=FEGGEGGEG@F6EBG@ @HS27_336:2:1101:2484:2389 TAAGTGCCTTCTGGGCATCTGCCTGACATCCACGGTGCAGCTGGTGACACAGCTTATGCCCTATGGCTGCCTCTTAGACCATGTCCGGGAAAACCGCGGACGCCTGGGCTCCCAGGACCTGCTGA A3FCDBDGGG>FFGGEGGGGG>@GGEGGEGGGE11FG@F>FG>FGGGGGBGC1DG>11:FGCGGGGBGG@8FGG<EGGGGGGG;E<6DDEGGGG/CEGGGBG=EC
1195950 2824450 347 @HS27_336:2:1101:10867:2304 CACTGGCCCAGGTCTCACCAGGCCGCTACCCGGGCCACACACCACCCCTCTGCTGGTCACACCAGGCTGAGCCAGTGACCGCTGCTGCCTGGCCATGGCCTGACAACTCGTGCTATTTTTCCTCA 3>3:0>F00CCDFDGEGCGGGGFGCGA/CFFGGGGGBDGBBGGGGDGGEGGGGGEGGGGGCGG>BFBB@GGEGGG@FDG0CG>DCEG=FGGGGGGGGEGGGGGGB/.CDGGGB.8C@=@GGGBDD @HS27_336:2:1101:10867:2304 CCAGTATCTTTCCTAGGCTTCCCAAGGGCACTGCCTGCCCCATGGTGCACCTGGGATCCCTGGGAGCCCCGCCTCATCCCCGGGACTGGGCACCTGGCTCCTCTTCACGTAGGAATCCTCTTCAT B3A0AGGGFEF>1E1BDGEGGG>GD>0CFGGG@FGDFGGFGG1111?DF>CGGEG:1::<BF@DEGGFFDGEEGGDFCGGGAGF.<CD@GFFGGGB=0FGGGGGC/6C8EDB>=DG=GGGGGEDB

Then I was trying to pass the cluster file to calib_cons with the following command:

calib_cons -q 1.fastq 2.fastq -o 1.out 2.out -c test.cluster --min-reads-per-cluster 1

Reading cluster file: test.cluster
Reading fastq file: 1.fastq
Writing output files: 1.out
[spoa::Graph::add_alignment] error: empty sequence!

I'm then left with the following files in the working DIR:

1.out.fastq
1.out.fastq0
1.out.fastq1
1.out.msa1
test.cluster

All files are empty except the test.cluster file :/. I have tried the calib_cons command with the various examples you've provided in the --help section, and they all produce the same message. Am I making another silly mistake?

@baraaorabi
Collaborator

How did you generate the 1.fastq and 2.fastq files? They are supposed to be the ungzipped versions of test_R1.fastq.gz and test_R2.fastq.gz. Can you also run head on them?

@ChadFibke
Author
ChadFibke commented Apr 26, 2020

I definitely misinterpreted the comment about the space-separated FASTQ list. I thought that since the sequences from the R1 and R2 FASTQ files were already present in the cluster file, the original FASTQ files were no longer needed. I passed the names of the unzipped test_R1.fastq.gz and test_R2.fastq.gz files, and calib_cons finished successfully. Thanks for all the help!
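
For anyone hitting the same snag, the fix amounts to decompressing the original inputs and passing those files to calib_cons. A rough Python sketch follows, assuming the -c, -q and -o options behave as in the commands shown earlier in this thread; gunzip_to and calib_cons_command are hypothetical helper names.

```python
import gzip
import shutil


def gunzip_to(src, dst):
    """Decompress the gzip file src into the plain file dst and return dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def calib_cons_command(cluster_file, fastqs, out_prefixes):
    """Build a calib_cons argument list.

    fastqs must be the uncompressed versions of the files given to calib,
    in the same order (forward, then reverse).
    """
    return ["calib_cons", "-c", cluster_file, "-q", *fastqs, "-o", *out_prefixes]


if __name__ == "__main__":
    # Print the command only; the file names mirror the ones in this thread.
    cmd = calib_cons_command(
        "test.cluster",
        ["test_R1.fastq", "test_R2.fastq"],
        ["1.out", "2.out"],
    )
    print(" ".join(cmd))
```

The key point is that the FASTQ files given to calib_cons are the same reads that produced the cluster file, just uncompressed.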

@sandmanns

Hi @baraaorabi,
I was wondering if you maybe had the time to give the error correction module another try with conda?
I am using Calib, and the first analysis step works really well. However, I have been struggling to install the error correction module for three days now. I could install Calib without any problem (I tested both conda and git), but the error correction module always fails: it seems openssl/md5.h cannot be found (I am working on Linux). I have tried everything I could think of. The header should be there, but it still can't be found.
So, it would be really great if there was a conda-version for the error correction module :-)

Best, Sarah

@baraaorabi
Collaborator

Hi @sandmanns!

Please give the latest conda release a try. It should now include the error correction module. Let me know if it works!

@sandmanns

Perfect! It works.
Thanks a lot!!
