updated README.md
LindoNkambule committed Oct 24, 2019
1 parent cbdbc48 commit 6473907
Showing 1 changed file with 36 additions and 2 deletions.
@@ -1,13 +1,47 @@
# VCFCompare
A Python program for evaluating site-level concordance of a query VCF against a truth VCF.

The summary metric CSV file contains:
* Variant type: SNV or INDEL
* Total number of variants in the truth and query VCF files
* Total true-positive, false-positive, and false-negative calls
* Recall and Precision
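
The counts listed above can be illustrated with a minimal sketch of a site-level comparison. This is not the code in `VCFCompare.py`: the function names (`variant_type`, `site_keys`, `compare`) are hypothetical, and the sketch assumes that a site matches only when CHROM, POS, REF, and ALT are identical and that any allele pair that is not single-base is an INDEL.

```python
def variant_type(ref, alt):
    """Assumption: single-base REF and ALT is an SNV, anything else an INDEL."""
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "INDEL"


def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records contribute one key per ALT allele.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare(truth_lines, query_lines):
    """Count TP/FP/FN per variant type using exact site matching (an assumption)."""
    truth, query = site_keys(truth_lines), site_keys(query_lines)
    counts = {}
    for vtype in ("SNV", "INDEL"):
        t = {k for k in truth if variant_type(k[2], k[3]) == vtype}
        q = {k for k in query if variant_type(k[2], k[3]) == vtype}
        counts[vtype] = {
            "TRUTH.TOTAL": len(t),
            "TP": len(t & q),   # present in both callsets
            "FP": len(q - t),   # called in query only
            "FN": len(t - q),   # present in truth only
            "QUERY.TOTAL": len(q),
        }
    return counts


# Tiny made-up records, for illustration only.
truth = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tAT\tA",
    "chr1\t300\t.\tC\tT",
]
query = [
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tA\tAT",
    "chr1\t400\t.\tG\tC",
]
counts = compare(truth, query)
print(counts)
```

Exact matching keeps the sketch short; a real comparison may also need to normalize variant representation before comparing sites.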

## Usage
This tool compares two variant callsets against each other and produces a CSV file with summary metrics.

In the following examples, we assume that the code has been installed to the directory `${VCFCompare}`.

```bash
$ python3 ${VCFCompare}/src/python/VCFCompare.py \
    -t example/gatk_variants.vcf \
    -q bcftools_variants.vcf \
    -o test
$ ls test.*
test.csv
```
With long option names, the invocation takes the form `python3 VCFCompare.py --truth truth.vcf --query query.vcf --output output`.

The example above compares an example run of GATK 4.1.0.0 against an example run of bcftools 1.9 on the same random sample.

For this example, the summary metric CSV file contains:

Type | TRUTH.TOTAL | TP | FP | FN | QUERY.TOTAL | Recall | Precision
--- | --- | --- | --- | --- | --- | --- | ---
SNV | 3610 | 3573 | 538 | 37 | 4111 | 0.989750693 | 0.869131598
INDEL | 205 | 104 | 101 | 101 | 247 | 0.507317073 | 0.421052632
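
As a quick check on the Recall and Precision columns, the sketch below recomputes them from the counts in the table. It assumes Recall = TP / TRUTH.TOTAL and Precision = TP / QUERY.TOTAL, which reproduces both rows; this is inferred from the table rather than taken from the `VCFCompare.py` source, and the helper name `recall_precision` is hypothetical.

```python
def recall_precision(tp, truth_total, query_total):
    """Return (recall, precision), guarding against empty callsets."""
    recall = tp / truth_total if truth_total else 0.0
    precision = tp / query_total if query_total else 0.0
    return recall, precision


# Counts copied from the example table above.
rows = {
    "SNV": {"TP": 3573, "TRUTH.TOTAL": 3610, "QUERY.TOTAL": 4111},
    "INDEL": {"TP": 104, "TRUTH.TOTAL": 205, "QUERY.TOTAL": 247},
}

metrics = {
    vtype: recall_precision(r["TP"], r["TRUTH.TOTAL"], r["QUERY.TOTAL"])
    for vtype, r in rows.items()
}

for vtype, (recall, precision) in metrics.items():
    print(f"{vtype}: recall={recall:.9f} precision={precision:.9f}")
```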

## Upset Plots
To visualize the differences and intersections between the truth and query VCF files, use the `upset.R` script under `src/R/`.
The script takes as input the `VCFCompare.py` results written with the `-o` flag, as in the example above.

```bash
$ Rscript ${VCFCompare}/src/R/upset.R \
-i test.csv \
-o TestRun
$ ls TestRun.*
TestRun.INDEL.pdf TestRun.SNV.pdf
```
This produces two PDF files: one for SNVs and one for INDELs.
Below is a screenshot of the SNV plot:

![](doc/TestRun.SNV.png)
