Commit 6473907 (1 parent: cbdbc48). Showing 1 changed file with 36 additions and 2 deletions.
@@ -1,13 +1,47 @@
# VCFCompare
A Python program for evaluating site-level concordance of a query VCF against a truth VCF.

The summary metric CSV file contains:
* Variant type: SNV or INDEL
* Total number of variants in the truth and query VCF files
* Total true-positive, false-positive, and false-negative calls
* Recall and Precision

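To illustrate how such counts can be derived, here is a minimal sketch of a site-level comparison. It is not the code in `VCFCompare.py`: the function names (`site_key`, `variant_type`, `load_sites`, `compare`) are hypothetical, and it assumes that a site is identified by its (CHROM, POS, REF, ALT) tuple and that any record whose REF or ALT is longer than one base counts as an INDEL.

```python
def site_key(vcf_line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt


def variant_type(ref, alt):
    # Assumption: single-base REF and ALT is an SNV, everything else is an INDEL.
    # Multi-allelic ALT fields such as "G,T" are not split in this sketch.
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "INDEL"


def load_sites(lines):
    """Collect the set of sites per variant type, skipping header lines."""
    sites = {"SNV": set(), "INDEL": set()}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ref, alt = site_key(line)
        sites[variant_type(ref, alt)].add((chrom, pos, ref, alt))
    return sites


def compare(truth_lines, query_lines):
    """Count TP, FP, and FN per variant type using set operations."""
    truth, query = load_sites(truth_lines), load_sites(query_lines)
    summary = {}
    for vtype in ("SNV", "INDEL"):
        summary[vtype] = {
            "TRUTH.TOTAL": len(truth[vtype]),
            "TP": len(truth[vtype] & query[vtype]),   # in both callsets
            "FP": len(query[vtype] - truth[vtype]),   # only in the query
            "FN": len(truth[vtype] - query[vtype]),   # only in the truth
            "QUERY.TOTAL": len(query[vtype]),
        }
    return summary
```

Both arguments to `compare` can be any iterable of lines, for example open file handles for the truth and query VCF files.
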
## Usage
This tool compares two variant callsets against each other and produces a CSV file with summary metrics.

In the following examples, we assume that the code has been installed to the directory `${VCFCompare}`.

```bash
$ python3 ${VCFCompare}/src/python/VCFCompare.py \
-t example/gatk_variants.vcf \
-q bcftools_variants.vcf \
-o test
$ ls test.*
test.csv
```
`python3 VCFCompare.py --truth truth.vcf --query query.vcf --output output`

The example above compares an example run of GATK 4.1.0.0 (the truth callset) against an example run of bcftools 1.9 (the query callset) on the same random sample.

The summary metric CSV file contains:

Type | TRUTH.TOTAL | TP | FP | FN | QUERY.TOTAL | Recall | Precision
--- | --- | --- | --- | --- | --- | --- | ---
SNV | 3610 | 3573 | 538 | 37 | 4111 | 0.989750693 | 0.869131598
INDEL | 205 | 104 | 101 | 101 | 247 | 0.507317073 | 0.421052632

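The Recall and Precision values in the table are consistent with Recall = TP / TRUTH.TOTAL and Precision = TP / QUERY.TOTAL. The short sketch below recomputes them from the counts in the table under that assumption; the helper name `recall_precision` is hypothetical and is not taken from `VCFCompare.py`.

```python
def recall_precision(truth_total, tp, query_total):
    """Recall = TP / TRUTH.TOTAL, Precision = TP / QUERY.TOTAL (assumed definitions)."""
    recall = tp / truth_total if truth_total else float("nan")
    precision = tp / query_total if query_total else float("nan")
    return recall, precision


# Counts copied from the example table: (TRUTH.TOTAL, TP, QUERY.TOTAL)
rows = {"SNV": (3610, 3573, 4111), "INDEL": (205, 104, 247)}

for vtype, (truth_total, tp, query_total) in rows.items():
    recall, precision = recall_precision(truth_total, tp, query_total)
    print(f"{vtype}: Recall={recall:.9f} Precision={precision:.9f}")
```
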
## Upset Plots
To visualize the differences and intersections between the truth and query VCF files, use the `upset.R` script under `src/R/`.
The script runs on the VCFCompare.py results produced with the `-o` flag, as shown in the example above.

```bash
$ Rscript ${VCFCompare}/src/R/upset.R \
-i test.csv \
-o TestRun
$ ls TestRun.*
TestRun.INDEL.pdf TestRun.SNV.pdf
```
This will produce two PDF files: one for SNVs and one for INDELs.
Below is a screenshot for SNVs:

![](doc/TestRun.SNV.png)