[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INFO][Launcher]bim2vcf Exited with failure (-1) #144

Open
vicruiser opened this issue Jan 15, 2020 · 8 comments
Open

[INFO][Launcher]bim2vcf Exited with failure (-1) #144

vicruiser opened this issue Jan 15, 2020 · 8 comments

Comments

@vicruiser
Copy link
vicruiser commented Jan 15, 2020

Subject of the issue

bim2vcf experiments severe error.

Your environment

  • version of jvarkit: 98f4944
  • version of java:
    openjdk 11 2018-09-25
    OpenJDK Runtime Environment 18.9 (build 11+28)
    OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
  • the value of ${JAVA_HOME}:
    /bim2vcf/jdk-11
  • which OS: Ubuntu 18.04

Steps to reproduce

java -jar jvarkit/dist/bim2vcf.jar -R Homo_sapiens.GRCh38.dna.primary_assembly.fa input.bim

input.bim looks like:

1	1:69081:G:C	0	69081	C	G
1	1:69149:T:A	0	69149	A	T
1	1:69224:A:T	0	69224	T	A
1	1:69270:A:G	0	69270	G	A
1	1:69281:T:C	0	69281	C	T
1	1:69334:C:T	0	69334	T	C
...
1	1:69610:C:T	0	69610	T	C
1	1:69620:D:1	0	69620	T	TA << here the problem appears
1	1:69634:T:A	0	69634	A	T
1	1:69640:G:C	0	69640	C	G
1	1:69655:G:C	0	69655	C	G
1	1:69697:T:C	0	69697	C	T
1	1:69702:D:1	0	69702	T	TG
1	1:69726:D:3	0	69726	T	TATC

It seems that bim2vcf has problems dealing with alleles that are numbers.

Expected behaviour

To run fine.

Actual behaviour

##fileformat=VCFv4.2
##INFO=<ID=MORGAN,Number=1,Type=Float,Description="Centimorgan">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Variation type">
##contig=<ID=1,length=248956422>
....
##contig=<ID=KI270392.1,length=971>
##contig=<ID=KI270394.1,length=970>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	69081	1:69081:G:C	G	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69149	1:69149:T:A	T	A	.	.	MORGAN=0.0;SVTYPE=SNV
1	69224	1:69224:A:T	A	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69270	1:69270:A:G	A	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69281	1:69281:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69334	1:69334:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69351	1:69351:C:G	C	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69360	1:69360:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69366	1:69366:T:G	T	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69369	1:69369:G:A	G	A	.	.	MORGAN=0.0;SVTYPE=SNV
1	69428	1:69428:T:G	T	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69438	1:69438:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69462	1:69462:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69486	1:69486:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69489	1:69489:A:C	A	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69493	1:69493:G:A	G	A	.	.	MORGAN=0.0;SVTYPE=SNV
1	69495	1:69495:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69496	1:69496:G:A	G	A	.	.	MORGAN=0.0;SVTYPE=SNV
1	69499	1:69499:A:G	A	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69511	1:69511:A:G	A	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69511	1:69511:A:T	A	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69521	1:69521:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69534	1:69534:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69536	1:69536:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69552	1:69552:G:C	G	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69555	1:69555:T:G	T	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69559	1:69559:G:A	G	A	.	.	MORGAN=0.0;SVTYPE=SNV
1	69561	1:69561:G:T	G	T	.	.	MORGAN=0.0;SVTYPE=SNV
1	69584	1:69584:A:G	A	G	.	.	MORGAN=0.0;SVTYPE=SNV
1	69594	1:69594:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69603	1:69603:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69604	1:69604:T:C	T	C	.	.	MORGAN=0.0;SVTYPE=SNV
1	69610	1:69610:C:T	C	T	.	.	MORGAN=0.0;SVTYPE=SNV
[SEVERE][BimToVcf]not handled: 1	1:69620:D:1	0	69620	T	TA
[INFO][Launcher]bim2vcf Exited with failure (-1)

@lindenb
Copy link
Owner
lindenb commented Jan 15, 2020

I wrote this for a colleagues years ago. As far as I understand my program only handled bases with one character only. Give me a few minutes, I'll update the code for multiple nucleotides.

@lindenb
Copy link
Owner
lindenb commented Jan 15, 2020

@vicruiser i pushed c81f95b , please tell me if that works.

Please not for this kind of deletion T/TA I didn't look at the POS in the VCF. But in the VCF, the position should be shifted https://samtools.github.io/hts-specs/VCFv4.2.pdf

Strings must include the base before the event (which must be reflected in the POS field),

so tell me if you think that the POS is wrong for an INS or a DEL.

@vicruiser
Copy link
Author

It works now! Thanks. I'm going to check carefully the matter about DEL and INS and I'll get back to you.

@vicruiser
Copy link
Author

Hi again,

I think something is wrong with the generated .vcf but not entirely. This is part of the output.vcf for an insertion:

input.bim
10    10:80432024:I:6    0    80432024    CCTACAG    C
output.vcf
10    80432024    10:80432024:I:6    C    CCTACAG    .    .    MORGAN=0.0;SVTYPE=DEL

and for a deletion:

input.bim
10    10:80509330:D:12    0    80509330    C    CCTGGAGCTGGCT
output.vcf
10    80509330    10:80509330:D:12    C    CCTGGAGCTGGCT    .    .MORGAN=0.0;SVTYPE=INS

I can see labels that are supposed to be deletions (e.g.: 1:69726:D:3) categorized as insertion (;SVTYPE=INS) and the other way around.

The reference allele column in my input.bim is the last one. I don't know if that is the problem or is just a problem with labeling.

Additionally, I've run VEP to annotate the .vcf generated with bim2vcf and indeed, something goes wrong with the positions but only for deletions.

Output of vep for insertion (well categorized as insertion):

Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
10:80432024:I:6    10:80432024-80432025    CTACAG    ENSG00000122378    ENST00000372181    Transcript    inframe_insertion    1085-1086    615-616    205-206    -/LQ    -/CTACAG    -    IMPACT=MODERATE;STRAND=1;SOURCE=Homo_sapiens.GRCh38.98.sorted.gtf.gz
10:80432024:I:6    10:80432024-80432025    CTACAG    ENSG00000122378    ENST00000372185    Transcript    inframe_insertion    750-751    582-583    194-195    -/LQ    -/CTACAG    
...

Output of vep for deletion (wrongly categorized as insertion):

Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
10:80509330:D:12    10:80509330-80509331    CTGGAGCTGGCT    ENSG00000108219    ENST00000341863    Transcript    intron_variant    -    -    --    -    -    IMPACT=MODIFIER;STRAND=1;SOURCE=Homo_sapiens.GRCh38.98.sorted.gtf.gz
10:80509330:D:12    10:80509330-80509331    CTGGAGCTGGCT    ENSG00000108219    ENST00000372156    Transcript    inframe_insertion    811-812    309-310    103-104    -/LELA    -/CTGGAGCTGGCT    -    IMPACT=MODERATE;STRAND=1;SOURCE=Homo_sapiens.GRCh38.98.sorted.gtf.gz
...

So basically, yes, if i'm correct, DEL positions should be shifted. Thanks again.

@lindenb
Copy link
Owner
lindenb commented Jan 15, 2020

@vicruiser Hi again,
Thanks for the report, I'll check this tomorrow (18H43 here)

@lindenb
Copy link
Owner
lindenb commented Jan 16, 2020

handling DEL or INS is quite difficult for me now. There was a bug because I just use one character to get the reference allele. As a quick fix, The DEL or INS are just skipped without error now.

@vicruiser
Copy link
Author

Ok, I'll wait for the final update then. Thanks for your time.

@vicruiser
Copy link
Author

In case someone need it:
Since my knowledge of java is zero, using this version c81f95b, I fixed the the positions of the deletions after the .vcf file was generated with the following command
awk '/D:/{ temp=$4; $4=$5 ; $5=temp } 1' OFS='\t' input.vcf > input_fixed.vcf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants