LindoNkambule/VCFCompare

VCFCompare

A Python program for evaluating site-level concordance of a query VCF against a truth VCF.

The summary metric CSV file contains:

  • Variant type: SNV or INDEL
  • Total number of variants in the truth and query VCF files
  • Total true-positive, false-positive, and false-negative calls
  • Recall and Precision
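As a minimal sketch of how the last two metrics relate to the counts, assuming the standard definitions recall = TP / (TP + FN) and precision = TP / (TP + FP). The function name `recall_precision` is hypothetical and is not taken from VCFCompare.py; the input counts are the SNV row of the example output shown under Usage.

```python
def recall_precision(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (recall, precision) from TP, FP, and FN counts.

    Assumes the standard definitions; a zero denominator yields 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Counts from the SNV row of the example summary under Usage.
recall, precision = recall_precision(tp=3573, fp=538, fn=37)
print(f"{recall:.6f} {precision:.6f}")  # prints "0.989751 0.869132"
```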

Usage

This tool compares two variant callsets against each other and produces a CSV file with summary metrics.
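To illustrate what a site-level comparison can look like, here is a minimal sketch. It assumes that a site is identified by its (CHROM, POS, REF, ALT) fields and that multi-allelic records are split per ALT allele. The matching rules VCFCompare.py actually applies are not described in this README, so the helper names `site_keys` and `compare_sites` are hypothetical.

```python
def site_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line, meta line, or column header
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # assumption: one key per ALT allele
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_sites(truth_lines, query_lines):
    """Count shared and private sites between a truth and a query callset."""
    truth, query = site_keys(truth_lines), site_keys(query_lines)
    return {
        "TP": len(truth & query),  # present in both callsets
        "FP": len(query - truth),  # present only in the query
        "FN": len(truth - query),  # present only in the truth
    }


truth = ["chr1\t100\t.\tA\tG\n", "chr1\t200\t.\tC\tT\n"]
query = ["chr1\t100\t.\tA\tG\n", "chr1\t300\t.\tG\tGA\n"]
print(compare_sites(truth, query))  # prints {'TP': 1, 'FP': 1, 'FN': 1}
```

Exact-key matching like this does not normalize indel representation, so equivalent indels written differently would be counted as mismatches.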

In the following examples, we assume that the code has been installed to the directory ${VCFCompare}.

$ python3 ${VCFCompare}/src/python/VCFCompare.py \
      -t example/gatk_variants.vcf \
      -q bcftools_variants.vcf \
      -o test
$ ls test.*
test.csv

The example above compares an example run of GATK 4.1.0.0 against an example run of bcftools 1.9 on the same random sample.

For this run, the summary metric CSV file (test.csv) contains:

Type    TRUTH.TOTAL    TP      FP     FN     QUERY.TOTAL    Recall         Precision
SNV     3610           3573    538    37     4111           0.989750693    0.869131598
INDEL   205            104     101    101    247            0.507317073    0.421052632
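If you want to use these numbers in a script, the summary can be loaded with Python's standard `csv` module. This is a minimal sketch that assumes the file is comma-delimited with a header row using the column names shown above. The function name `load_summary` is hypothetical.

```python
import csv
import io


def load_summary(handle):
    """Parse a summary CSV into {variant type: {column name: number}}."""
    summary = {}
    for row in csv.DictReader(handle):
        vtype = row.pop("Type")
        # Values containing a decimal point become floats, the rest ints.
        summary[vtype] = {k: float(v) if "." in v else int(v) for k, v in row.items()}
    return summary


# In-memory stand-in for test.csv (assumed layout), using the SNV row above.
example = io.StringIO(
    "Type,TRUTH.TOTAL,TP,FP,FN,QUERY.TOTAL,Recall,Precision\n"
    "SNV,3610,3573,538,37,4111,0.989750693,0.869131598\n"
)
snv = load_summary(example)["SNV"]
print(snv["TP"], snv["Recall"])  # prints "3573 0.989750693"
```

For a file on disk, pass an open handle instead, for example `with open("test.csv", newline="") as fh: load_summary(fh)`.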

Upset Plots

If you want to visualize the differences and intersections between the truth and query VCF files, you can use the upset.R script under src/R/. The script runs on VCFCompare.py results produced with the -o flag, as shown in the example above.

$ Rscript ${VCFCompare}/src/R/upset.R \
      -i test.csv \
      -o TestRun
$ ls TestRun.*
TestRun.INDEL.pdf  TestRun.SNV.pdf

This will produce two PDF files: one for SNVs and one for INDELs. Below is a screenshot of the SNV plot.
