

Create long-term archival format for variant data #503

Open
julesjacobsen opened this issue Jul 18, 2023 · 0 comments


@julesjacobsen
Contributor

The MVStore file format is not guaranteed to remain stable from one minor H2 release to the next; e.g. going from 2.1.x to 2.2.x changes the format version from 2 to 3, rendering existing data unreadable once the H2 version is updated.
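
One way to at least fail fast is to read the format version from the file header before opening the store. The sketch below is in Java, since H2/MVStore is a Java library. It assumes the MVStore header is a single ISO-8859-1 line of comma-separated `key:value` pairs at the start of the file (e.g. `H:2,...,format:2,...`) with numbers written in hex; that layout is an assumption and should be checked against the H2 release actually in use. `MvStoreFormatCheck` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical pre-flight check: read the MVStore "format" value from the file header
 * without opening the store, so a version mismatch is reported up front.
 * Assumption: the header is one ISO-8859-1 line of comma-separated key:value pairs
 * at the start of the file, with numeric values written in hexadecimal.
 */
final class MvStoreFormatCheck {

    // Assumption: the header fits within the first 4 KiB block of the file.
    private static final int HEADER_BLOCK_SIZE = 4096;

    private MvStoreFormatCheck() {}

    /** Parses the first header line into a key -> value map. */
    static Map<String, String> readHeader(Path file) throws IOException {
        byte[] buf;
        try (InputStream in = Files.newInputStream(file)) {
            buf = in.readNBytes(HEADER_BLOCK_SIZE);
        }
        String text = new String(buf, StandardCharsets.ISO_8859_1);
        int end = text.indexOf('\n');
        String line = end >= 0 ? text.substring(0, end) : text;
        Map<String, String> header = new HashMap<>();
        for (String pair : line.split(",")) {
            int colon = pair.indexOf(':');
            if (colon > 0) {
                header.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
            }
        }
        return header;
    }

    /** Returns the file's format version (parsed as hex, per the assumption above). */
    static int formatVersion(Path file) throws IOException {
        String value = readHeader(file).get("format");
        if (value == null) {
            throw new IOException("No 'format' entry in MVStore header of " + file);
        }
        return Integer.parseInt(value, 16);
    }

    /** Throws if the file was written with a different format than the one expected. */
    static void requireFormat(Path file, int expectedFormat) throws IOException {
        int actual = formatVersion(file);
        if (actual != expectedFormat) {
            throw new IOException(file + " has MVStore format " + actual
                    + " but format " + expectedFormat + " is expected; rebuild it from the archived data");
        }
    }
}
```

Calling `MvStoreFormatCheck.requireFormat(path, 3)` before opening a `variants.mv.db` with an H2 2.2.x build would then surface the 2 to 3 format change as an explicit error rather than an unreadable store.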

For H2 databases there is an Upgrade utility that exports the data and imports it into the new version; however, for plain MVStore files there is no supported migration path (h2database/h2database#3834 (comment)).

Consequently, we'll need to rethink our new (v14+) variant database build strategy slightly. It previously merged a set of pre-parsed MVStore files for gnomAD, UK10K, ESP, ALFA and dbNSFP to create the final variants.mv.db release file. Instead, it might be better to store the pre-parsed data as gzip-compressed protobuf, which will be much quicker and easier to import into a new MVStore than the original files (especially gnomAD v3) and will also handle schema evolution.
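
A minimal sketch of such an archive, assuming one serialized record per variant: each record is written as a base-128 varint length prefix followed by its bytes, all inside a single gzip stream. That framing matches what protobuf-java's `writeDelimitedTo` and `parseDelimitedFrom` write and read, so in a real build the `byte[]` payloads would be serialized protobuf messages and those two methods could replace the hand-rolled varint code. No protobuf schema is shown here, and `DelimitedGzipArchive`, `write` and `forEach` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical gzip archive of length-delimited records (protobuf-style framing). */
final class DelimitedGzipArchive {

    private DelimitedGzipArchive() {}

    /** Writes each record as <varint length><bytes> into one gzip stream. */
    static void write(File file, Iterable<byte[]> records) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(file)))) {
            for (byte[] record : records) {
                writeVarint(out, record.length);
                out.write(record);
            }
        }
    }

    /** Streams every record to the sink without loading the whole archive; returns the count. */
    static long forEach(File file, Consumer<byte[]> sink) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(new GZIPInputStream(new FileInputStream(file)))) {
            int length;
            while ((length = readVarint(in)) >= 0) {
                byte[] record = in.readNBytes(length);
                if (record.length != length) {
                    throw new EOFException("Truncated record in " + file);
                }
                sink.accept(record);
                count++;
            }
        }
        return count;
    }

    // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    private static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Returns -1 on a clean end of stream before the first byte of a varint.
    private static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) {
                    return -1;
                }
                throw new EOFException("Truncated varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed varint");
    }
}
```

Because `forEach` hands records to a callback one at a time, a large source such as gnomAD v3 could be streamed into a fresh MVStore map without holding the whole archive in memory.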

Why not just re-index the original VCF or create a new VCF? Well, it's yet another transformation to go through, it takes much longer to parse the information out of the file, and the files are much larger than the protobuf equivalent.
