Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit e615765
Showing 73 changed files with 9,669 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# How to Contribute | ||
|
||
We are not accepting external pull requests at this time. | ||
|
||
## Contributor License Agreement | ||
|
||
Contributions to this project must be accompanied by a Contributor License | ||
Agreement (CLA). You (or your employer) retain the copyright to your | ||
contribution; this simply gives us permission to use and redistribute your | ||
contributions as part of the project. Head over to | ||
<https://cla.developers.google.com/> to see your current agreements on file or | ||
to sign a new one. | ||
|
||
You generally only need to submit a CLA once, so if you've already submitted one | ||
(even if it was for a different project), you probably don't need to do it | ||
again. | ||
|
||
## Code Reviews | ||
|
||
We are not accepting external pull requests at this time. | ||
|
||
## Community Guidelines | ||
|
||
This project follows | ||
[Google's Open Source Community Guidelines](https://opensource.google/conduct/). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# Build with: | ||
# sudo docker build -t deepconsensus . | ||
# For GPU: | ||
# sudo docker build --build-arg build_gpu=true --build-arg FROM_IMAGE=nvidia/cuda:11.3.0-cudnn8-runtime -t deepconsensus_gpu . | ||
|
||
|
||
|
||
ARG FROM_IMAGE=continuumio/miniconda3 | ||
|
||
FROM continuumio/miniconda3 as conda_setup | ||
RUN conda config --add channels defaults && \ | ||
    conda config --add channels bioconda && \ | ||
    conda config --add channels conda-forge | ||
RUN conda create -n bio \ | ||
    python=3.8 \ | ||
    pbcore \ | ||
    pbbam \ | ||
    pbccs \ | ||
    pbmm2 \ | ||
    parallel \ | ||
    jq \ | ||
    gcc \ | ||
    pycocotools \ | ||
    bioconda::seqtk \ | ||
    bioconda::unimap \ | ||
    bioconda::bedtools \ | ||
    bioconda::minimap2 \ | ||
    bioconda::extracthifi \ | ||
    bioconda::zmwfilter \ | ||
    bioconda::pysam \ | ||
    bioconda::samtools=1.10 \ | ||
    bioconda::pyfastx=0.8.4 \ | ||
    && conda clean -a | ||
RUN wget https://github.com/PacificBiosciences/align-clr-to-ccs/releases/download/0.2.0/actc && \ | ||
    chmod +x actc && \ | ||
    mv actc /opt/conda/bin/ | ||
|
||
FROM ${FROM_IMAGE} as builder | ||
COPY --from=conda_setup /opt/conda /opt/conda | ||
|
||
ENV PATH=/opt/conda/envs/bio/bin:/opt/conda/bin:"${PATH}" | ||
ENV LD_LIBRARY_PATH=/opt/conda/envs/bio/lib:/opt/mytools/lib/x86_64-linux-gnu:"${LD_LIBRARY_PATH}" | ||
|
||
COPY . /opt/deepconsensus | ||
WORKDIR /opt/deepconsensus | ||
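ARG build_gpu | ||
# _TAG_NAME must be declared as a build arg to be visible in RUN steps. | ||
ARG _TAG_NAME | ||
# Match a tag ending in "gpu" with a shell glob; `[ ... = "*gpu" ]` compares literally. | ||
RUN case "${_TAG_NAME}" in *gpu) build_gpu=true ;; esac; \ | ||
    if [ "${build_gpu}" = "true" ]; then \ | ||
      echo "Installing deepconsensus[gpu] version"; \ | ||
      pip install ".[gpu]"; \ | ||
    else \ | ||
      echo "Installing deepconsensus[cpu] version"; \ | ||
      pip install ".[cpu]"; \ | ||
    fi | ||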
|
||
CMD ["deepconsensus"] | ||
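The intent of the install step above, choosing the `gpu` extra when the tag name ends in `gpu` or when `build_gpu` is `true`, can be sketched in Python. This is only an illustration under the assumption that `*gpu` is meant as a glob pattern; `select_extra` is a hypothetical helper and not part of the repository.

```python
from fnmatch import fnmatch


def select_extra(tag_name: str = "", build_gpu: str = "") -> str:
    """Return the pip extra ('gpu' or 'cpu') to install for a build."""
    # Assumption: "*gpu" is a glob, i.e. any tag name that ends in "gpu".
    if fnmatch(tag_name, "*gpu") or build_gpu == "true":
        return "gpu"
    return "cpu"


# Example tag names follow the Docker tags listed in the README.
print(select_extra(tag_name="0.2.0-gpu"))
print(select_extra(build_gpu="true"))
print(select_extra(tag_name="0.2.0"))
```

An unset build argument is modeled here as an empty string, which falls through to the CPU install.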
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
Copyright (c) 2021, Google Inc. | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without modification, | ||
are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
3. Neither the name of Google Inc. nor the names of its contributors | ||
may be used to endorse or promote products derived from this software without | ||
specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | ||
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# DeepConsensus | ||
|
||
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific | ||
Biosciences (PacBio) Circular Consensus Sequencing (CCS) data. | ||
|
||
![DeepConsensus overview diagram](https://raw.githubusercontent.com/google/deepconsensus/main/docs/images/pipeline_figure.png) | ||
|
||
## Installation | ||
|
||
### From pip package | ||
|
||
If you're on a GPU machine: | ||
|
||
```bash | ||
pip install deepconsensus[gpu]==0.2.0 | ||
# To make sure the `deepconsensus` CLI works, set the PATH: | ||
export PATH="/home/${USER}/.local/bin:${PATH}" | ||
``` | ||
|
||
If you're on a CPU machine: | ||
|
||
```bash | ||
pip install deepconsensus[cpu]==0.2.0 | ||
# To make sure the `deepconsensus` CLI works, set the PATH: | ||
export PATH="/home/${USER}/.local/bin:${PATH}" | ||
``` | ||
|
||
### From Docker image | ||
|
||
For GPU: | ||
|
||
```bash | ||
sudo docker pull google/deepconsensus:0.2.0-gpu | ||
``` | ||
|
||
For CPU: | ||
|
||
```bash | ||
sudo docker pull google/deepconsensus:0.2.0 | ||
``` | ||
|
||
### From source | ||
|
||
```bash | ||
git clone https://github.com/google/deepconsensus.git | ||
cd deepconsensus | ||
source install.sh | ||
``` | ||
|
||
If you have a GPU, run `source install-gpu.sh` instead. Currently the only | ||
difference is that the GPU version installs `tensorflow-gpu` instead of | ||
`intel-tensorflow`. | ||
|
||
(Optional) After `source install.sh`, you can run all unit tests with: | ||
|
||
```bash | ||
./run_all_tests.sh | ||
``` | ||
|
||
## Usage | ||
|
||
See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/quick_start.md). | ||
|
||
## Where does DeepConsensus fit into my pipeline? | ||
|
||
After a PacBio sequencing run, DeepConsensus is meant to be run on the subreads | ||
to create new corrected reads in FASTQ format that can take the place of the CCS | ||
reads for downstream analyses. | ||
|
||
See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/quick_start.md) | ||
for an example of inputs and outputs. | ||
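Because the corrected reads are written as FASTQ, downstream tools can consume them like any other FASTQ file. As a rough sketch, the snippet below reads four-line FASTQ records with the standard library and averages the per-base quality scores. It assumes unwrapped four-line records and Phred+33 quality encoding; `read_fastq`, `mean_phred`, and the example read are hypothetical and not part of DeepConsensus.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # End of file.
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # Skip the '+' separator line.
        quality = handle.readline().rstrip("\n")
        # Drop the leading '@' from the header to get the read name.
        yield header.rstrip("\n")[1:], sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the arithmetic mean of the Phred scores in a quality string."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# A made-up record standing in for one corrected read.
example = io.StringIO("@read1/ccs\nACGT\n+\nII5I\n")
for name, sequence, quality in read_fastq(example):
    print(name, len(sequence), mean_phred(quality))
```

The mean here is a plain average of the scores, which is enough for a quick sanity check but is not the same as averaging the underlying error probabilities.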
|
||
## How to cite | ||
|
||
If you are using DeepConsensus in your work, please cite: | ||
|
||
[DeepConsensus: Gap-Aware Sequence Transformers for Sequence Correction](https://www.biorxiv.org/content/10.1101/2021.08.31.458403v1) | ||
|
||
## Disclaimer | ||
|
||
This is not an official Google product. | ||
|
||
NOTE: the content of this research code repository (i) is not intended to be a | ||
medical device; and (ii) is not intended for clinical use of any kind, including | ||
but not limited to diagnosis or prognosis. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Important: Pip install is different for CPU versus GPU | ||
|
||
If you're on a GPU machine: | ||
|
||
```bash | ||
pip install deepconsensus[gpu]==0.2.0 | ||
# To make sure the `deepconsensus` CLI works, set the PATH: | ||
export PATH="/home/${USER}/.local/bin:${PATH}" | ||
``` | ||
|
||
If you're on a CPU machine: | ||
|
||
```bash | ||
pip install deepconsensus[cpu]==0.2.0 | ||
# To make sure the `deepconsensus` CLI works, set the PATH: | ||
export PATH="/home/${USER}/.local/bin:${PATH}" | ||
``` | ||
|
||
## Documentation, quick start, citation | ||
|
||
All other documentation is on GitHub: [https://github.com/google/deepconsensus](https://github.com/google/deepconsensus). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Copyright (c) 2021, Google Inc. | ||
# All rights reserved. | ||
# | ||
# Redistribution and use in source and binary forms, with or without modification, | ||
# are permitted provided that the following conditions are met: | ||
# | ||
# 1. Redistributions of source code must retain the above copyright notice, this | ||
# list of conditions and the following disclaimer. | ||
# | ||
# 2. Redistributions in binary form must reproduce the above copyright notice, | ||
# this list of conditions and the following disclaimer in the documentation | ||
# and/or other materials provided with the distribution. | ||
# | ||
# 3. Neither the name of Google Inc. nor the names of its contributors | ||
# may be used to endorse or promote products derived from this software without | ||
# specific prior written permission. | ||
# | ||
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | ||
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
"""__init__.py.""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# Copyright (c) 2021, Google Inc. | ||
# All rights reserved. | ||
# | ||
# Redistribution and use in source and binary forms, with or without modification, | ||
# are permitted provided that the following conditions are met: | ||
# | ||
# 1. Redistributions of source code must retain the above copyright notice, this | ||
# list of conditions and the following disclaimer. | ||
# | ||
# 2. Redistributions in binary form must reproduce the above copyright notice, | ||
# this list of conditions and the following disclaimer in the documentation | ||
# and/or other materials provided with the distribution. | ||
# | ||
# 3. Neither the name of Google Inc. nor the names of its contributors | ||
# may be used to endorse or promote products derived from this software without | ||
# specific prior written permission. | ||
# | ||
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | ||
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
# pylint: disable=g-short-docstring-punctuation | ||
"""DeepConsensus | ||
Usage: | ||
deepconsensus <command> [optional arguments] | ||
Commands: | ||
preprocess: Convert aligned subreads to tf.Example format. | ||
run: Run DeepConsensus beginning with aligned subreads. | ||
""" | ||
|
||
import argparse | ||
import sys | ||
import textwrap | ||
|
||
from absl import app | ||
from absl.flags import argparse_flags | ||
|
||
from deepconsensus.utils import dc_constants | ||
|
||
COMMANDS = ['preprocess', 'run'] | ||
|
||
|
||
def parse_flags(argv): | ||
  """Parse the top-level command; return (args, unparsed subcommand args).""" | ||
  parser = argparse_flags.ArgumentParser(add_help=False, usage=__doc__) | ||
  parser.add_argument('command', choices=COMMANDS, help=argparse.SUPPRESS) | ||
  parser.add_argument( | ||
      '--version', action='version', version=dc_constants.__version__) | ||
  return parser.parse_known_args(argv[1:]) | ||
|
||
|
||
def handle_help(passed, module): | ||
  """Print a better help screen for subcommands.""" | ||
  if '-h' in passed or '--help' in passed or len(sys.argv) == 2: | ||
    flag_set = module.flags.FLAGS.flags_by_module_dict()[module.__name__] | ||
    print(module.__doc__, file=sys.stderr) | ||
    flag_help = [] | ||
    for flag in flag_set: | ||
      out = f' --{flag.name:<20} {flag.help}' | ||
      if flag.default: | ||
        out += f' [default: {flag.default}]' | ||
      flag_help.append(out) | ||
    print('Flags:', file=sys.stderr) | ||
    for flag in flag_help: | ||
      print( | ||
          textwrap.fill( | ||
              flag, | ||
              width=80, | ||
              subsequent_indent=' ' * 27, | ||
              fix_sentence_endings=True), | ||
          file=sys.stderr) | ||
    print('', file=sys.stderr) | ||
    print('Requirements:', file=sys.stderr) | ||
    for flag in flag_set: | ||
      for validator in flag.validators: | ||
        print(' ' + validator.message, file=sys.stderr) | ||
    # Print help and exit. | ||
    sys.exit(0) | ||
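The per-flag formatting in `handle_help` can be tried in isolation. The sketch below wraps a single help entry with `textwrap.fill` using the same width and continuation indent as the diff. `format_flag_help`, the flag name, and the help text are hypothetical and chosen only for illustration.

```python
import textwrap


def format_flag_help(name, help_text, default=None, width=80):
    """Return one wrapped help entry for a flag."""
    line = f'  --{name:<20} {help_text}'
    # As in the diff, falsy defaults (0, False, '') are not shown.
    if default:
        line += f' [default: {default}]'
    return textwrap.fill(
        line,
        width=width,
        subsequent_indent=' ' * 27,  # Continuation lines are indented by 27 spaces.
        fix_sentence_endings=True)


print(
    format_flag_help(
        'batch_size',
        'Number of examples to process together. Larger values may use more '
        'memory but can reduce the total runtime on some machines.',
        default=1024))
```

Leading whitespace at the start of the entry is preserved by `textwrap.fill`, so only the continuation lines receive the hanging indent.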
|
||
|
||
def main(argset): | ||
  """Dispatch to the selected subcommand with the remaining arguments.""" | ||
  # Arguments not parsed here are passed on to the subcommand. | ||
  args, passed = argset | ||
  if args.command: | ||
    passed = [args.command] + passed | ||
  if args.command == 'preprocess': | ||
    from deepconsensus.preprocess import preprocess | ||
    preprocess.register_required_flags() | ||
    handle_help(passed, preprocess) | ||
    app.run(preprocess.main, argv=passed) | ||
  elif args.command == 'run': | ||
    from deepconsensus.inference import quick_inference | ||
    quick_inference.register_required_flags() | ||
    handle_help(passed, quick_inference) | ||
    app.run(quick_inference.main, argv=passed) | ||
|
||
|
||
def run(): | ||
  app.run(main, flags_parser=parse_flags) | ||
|
||
|
||
if __name__ == '__main__': | ||
  run() | ||
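The dispatch pattern in this file relies on `parse_known_args`: the top-level parser consumes only the command name and leaves every other argument untouched for the subcommand. A minimal sketch of that pattern, using plain `argparse` instead of `absl.flags.argparse_flags` so that it runs without extra dependencies, is shown below. `split_command` and the `--batch_size` flag are hypothetical.

```python
import argparse

COMMANDS = ['preprocess', 'run']


def split_command(argv):
    """Split argv into the command and the argument list for the subcommand."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('command', choices=COMMANDS)
    # Unknown flags are returned untouched instead of raising an error.
    args, passed = parser.parse_known_args(argv[1:])
    # Prepend the command so it occupies the argv[0] slot for the subcommand.
    return args.command, [args.command] + passed


command, passed = split_command(['deepconsensus', 'run', '--batch_size=32'])
print(command, passed)
```

Disabling the built-in help with `add_help=False` keeps `-h` and `--help` in the passed-through list, which is what lets a function like `handle_help` print subcommand-specific help.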
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Copyright (c) 2021, Google Inc. | ||
# All rights reserved. | ||
# | ||
# Redistribution and use in source and binary forms, with or without modification, | ||
# are permitted provided that the following conditions are met: | ||
# | ||
# 1. Redistributions of source code must retain the above copyright notice, this | ||
# list of conditions and the following disclaimer. | ||
# | ||
# 2. Redistributions in binary form must reproduce the above copyright notice, | ||
# this list of conditions and the following disclaimer in the documentation | ||
# and/or other materials provided with the distribution. | ||
# | ||
# 3. Neither the name of Google Inc. nor the names of its contributors | ||
# may be used to endorse or promote products derived from this software without | ||
# specific prior written permission. | ||
# | ||
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | ||
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | ||
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | ||
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
"""__init__.py.""" |