Initial commit
akolesnikov committed Mar 5, 2022
0 parents commit e615765
Showing 73 changed files with 9,669 additions and 0 deletions.
25 changes: 25 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,25 @@
# How to Contribute

We are not accepting external pull requests at this time.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement (CLA). You (or your employer) retain the copyright to your
contribution; this simply gives us permission to use and redistribute your
contributions as part of the project. Head over to
<https://cla.developers.google.com/> to see your current agreements on file or
to sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.

## Code Reviews

We are not accepting external pull requests at this time.

## Community Guidelines

This project follows
[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
56 changes: 56 additions & 0 deletions Dockerfile
@@ -0,0 +1,56 @@
# Build with:
# sudo docker build -t deepconsensus .
# For GPU:
# sudo docker build --build-arg build_gpu=true --build-arg FROM_IMAGE=nvidia/cuda:11.3.0-cudnn8-runtime -t deepconsensus_gpu .



ARG FROM_IMAGE=continuumio/miniconda3

FROM continuumio/miniconda3 as conda_setup
RUN conda config --add channels defaults && \
conda config --add channels bioconda && \
conda config --add channels conda-forge
RUN conda create -n bio \
python=3.8 \
pbcore \
pbbam \
pbccs \
pbmm2 \
parallel \
jq \
gcc \
pycocotools \
bioconda::seqtk \
bioconda::unimap \
bioconda::bedtools \
bioconda::minimap2 \
bioconda::extracthifi \
bioconda::zmwfilter \
bioconda::pysam \
bioconda::samtools=1.10 \
bioconda::pyfastx=0.8.4 \
&& conda clean -a
RUN wget https://github.com/PacificBiosciences/align-clr-to-ccs/releases/download/0.2.0/actc && \
chmod +x actc && \
mv actc /opt/conda/bin/

FROM ${FROM_IMAGE} as builder
COPY --from=conda_setup /opt/conda /opt/conda

ENV PATH=/opt/conda/envs/bio/bin:/opt/conda/bin:"${PATH}"
ENV LD_LIBRARY_PATH=/opt/conda/envs/bio/lib:/opt/mytools/lib/x86_64-linux-gnu:"${LD_LIBRARY_PATH}"

COPY . /opt/deepconsensus
WORKDIR /opt/deepconsensus
ARG build_gpu
ARG _TAG_NAME
# Build the GPU variant if build_gpu=true or if _TAG_NAME ends in "gpu".
RUN case "${_TAG_NAME}" in *gpu) build_gpu=true ;; esac; \
if [ "${build_gpu}" = "true" ]; then \
echo "Installing deepconsensus[gpu] version"; \
pip install ".[gpu]"; \
else \
echo "Installing deepconsensus[cpu] version"; \
pip install ".[cpu]"; \
fi

CMD ["deepconsensus"]
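The install step above chooses between the `gpu` and `cpu` pip extras using `_TAG_NAME` and the `build_gpu` build argument. As a minimal Python sketch of that decision, assuming the intent is that any tag name ending in `gpu` (for example `0.2.0-gpu`, the GPU image tag used in the README) selects the GPU build, the logic reduces to a suffix glob match. The function name `select_pip_extra` is hypothetical and not part of the repository.

```python
from fnmatch import fnmatchcase


def select_pip_extra(tag_name: str = "", build_gpu: str = "") -> str:
  """Return "gpu" or "cpu", modeling the Dockerfile's install branch.

  Assumption: "*gpu" is meant as a shell-style pattern (tag ends in "gpu").
  """
  if fnmatchcase(tag_name, "*gpu") or build_gpu == "true":
    return "gpu"
  return "cpu"


# The package spec that `pip install` would receive in /opt/deepconsensus.
print(f".[{select_pip_extra('0.2.0-gpu')}]")  # .[gpu]
print(f".[{select_pip_extra(build_gpu='true')}]")  # .[gpu]
print(f".[{select_pip_extra('0.2.0')}]")  # .[cpu]
```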

27 changes: 27 additions & 0 deletions LICENSE
@@ -0,0 +1,27 @@
Copyright (c) 2021, Google Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of Google Inc. nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
86 changes: 86 additions & 0 deletions README.md
@@ -0,0 +1,86 @@
# DeepConsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific
Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

![DeepConsensus overview diagram](https://raw.githubusercontent.com/google/deepconsensus/main/docs/images/pipeline_figure.png)

## Installation

### From pip package

If you're on a GPU machine:

```bash
pip install deepconsensus[gpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

If you're on a CPU machine:

```bash
pip install deepconsensus[cpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```
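If the `deepconsensus` command is still not found after setting `PATH`, a small standard-library check can tell a missing install apart from a `PATH` problem. This is a sketch only: the helper name `check_install` is hypothetical, and it assumes the distribution name `deepconsensus` used in the `pip install` commands above.

```python
import importlib.metadata
import shutil
from typing import Optional, Tuple


def check_install(
    dist_name: str = "deepconsensus", cli_name: str = "deepconsensus"
) -> Tuple[Optional[str], Optional[str]]:
  """Return (installed version, path to the CLI); either is None if absent."""
  try:
    version = importlib.metadata.version(dist_name)
  except importlib.metadata.PackageNotFoundError:
    version = None
  # shutil.which only searches the current PATH, so a version without a CLI
  # path usually means pip's script directory is not on PATH.
  return version, shutil.which(cli_name)


if __name__ == "__main__":
  version, cli_path = check_install()
  print(f"version={version} cli={cli_path}")
```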

### From Docker image

For GPU:

```bash
sudo docker pull google/deepconsensus:0.2.0-gpu
```

For CPU:

```bash
sudo docker pull google/deepconsensus:0.2.0
```

### From source

```bash
git clone https://github.com/google/deepconsensus.git
cd deepconsensus
source install.sh
```

If you have a GPU, run `source install-gpu.sh` instead. Currently, the only
difference is that the GPU version installs `tensorflow-gpu` instead of
`intel-tensorflow`.

(Optional) After `source install.sh`, you can run all unit tests with:

```bash
./run_all_tests.sh
```

## Usage

See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/quick_start.md).

## Where does DeepConsensus fit into my pipeline?

After a PacBio sequencing run, DeepConsensus is meant to be run on the subreads
to create new corrected reads in FASTQ format that can take the place of the CCS
reads for downstream analyses.

See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/quick_start.md)
for an example of inputs and outputs.

## How to cite

If you are using DeepConsensus in your work, please cite:

[DeepConsensus: Gap-Aware Sequence Transformers for Sequence Correction](https://www.biorxiv.org/content/10.1101/2021.08.31.458403v1)

## Disclaimer

This is not an official Google product.

NOTE: the content of this research code repository (i) is not intended to be a
medical device; and (ii) is not intended for clinical use of any kind, including
but not limited to diagnosis or prognosis.
21 changes: 21 additions & 0 deletions README_pip.md
@@ -0,0 +1,21 @@
# Important: Pip install is different for CPU versus GPU

If you're on a GPU machine:

```bash
pip install deepconsensus[gpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

If you're on a CPU machine:

```bash
pip install deepconsensus[cpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

## Documentation, quick start, citation

All other documentation is on GitHub: [https://github.com/google/deepconsensus](https://github.com/google/deepconsensus).
28 changes: 28 additions & 0 deletions deepconsensus/__init__.py
@@ -0,0 +1,28 @@
# Copyright (c) 2021, Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of Google Inc. nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""__init__.py."""
110 changes: 110 additions & 0 deletions deepconsensus/cli.py
@@ -0,0 +1,110 @@
# Copyright (c) 2021, Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of Google Inc. nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# pylint: disable=g-short-docstring-punctuation
"""DeepConsensus
Usage:
  deepconsensus <command> [optional arguments]
Commands:
  preprocess: Convert aligned subreads to tf.Example format.
  run: Run DeepConsensus beginning with aligned subreads.
"""

import argparse
import sys
import textwrap

from absl import app
from absl.flags import argparse_flags

from deepconsensus.utils import dc_constants

COMMANDS = ['preprocess', 'run']


def parse_flags(argv):
  # Parse only the subcommand (and --version); unrecognized flags are returned
  # unparsed so the subcommand's own flags can consume them.
  parser = argparse_flags.ArgumentParser(add_help=False, usage=__doc__)
  parser.add_argument('command', choices=COMMANDS, help=argparse.SUPPRESS)
  parser.add_argument(
      '--version', action='version', version=dc_constants.__version__)
  return parser.parse_known_args(argv[1:])


def handle_help(passed, module):
  """Print a better help screen for subcommands."""
  # Show help on -h/--help, or when only the subcommand name was given.
  if '-h' in passed or '--help' in passed or len(sys.argv) == 2:
    flag_set = module.flags.FLAGS.flags_by_module_dict()[module.__name__]
    print(module.__doc__, file=sys.stderr)
    flag_help = []
    for flag in flag_set:
      out = f' --{flag.name:<20} {flag.help}'
      if flag.default:
        out += f' [default: {flag.default}]'
      flag_help.append(out)
    print('Flags:', file=sys.stderr)
    for flag in flag_help:
      print(
          textwrap.fill(
              flag,
              width=80,
              subsequent_indent=' ' * 27,
              fix_sentence_endings=True),
          file=sys.stderr)
    print('', file=sys.stderr)
    print('Requirements:', file=sys.stderr)
    for flag in flag_set:
      for validator in flag.validators:
        print(' ' + validator.message, file=sys.stderr)
    # Print help and exit.
    sys.exit(0)


def main(argset):
  args, passed = argset  # Unparsed args are passed on to the subcommand.
  if args.command:
    passed = [args.command] + passed
  # Subcommand modules are imported lazily, so only the selected one (and its
  # flags) is loaded.
  if args.command == 'preprocess':
    from deepconsensus.preprocess import preprocess
    preprocess.register_required_flags()
    handle_help(passed, preprocess)
    app.run(preprocess.main, argv=passed)
  elif args.command == 'run':
    from deepconsensus.inference import quick_inference
    quick_inference.register_required_flags()
    handle_help(passed, quick_inference)
    app.run(quick_inference.main, argv=passed)


def run():
  app.run(main, flags_parser=parse_flags)


if __name__ == '__main__':
  run()
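`parse_flags` consumes only the subcommand name and returns every other argument untouched; `main` then prepends the subcommand and hands that list to the subcommand's own flag parsing, while `handle_help` lays out each flag as a padded `--name` column wrapped by `textwrap.fill`. The sketch below reproduces both steps with the standard-library `argparse.ArgumentParser` (which `absl.flags.argparse_flags.ArgumentParser` subclasses) instead of absl. The helper names and the `--input`/`--batch` flags are hypothetical, chosen only for illustration.

```python
import argparse
import textwrap

COMMANDS = ['preprocess', 'run']


def split_command(argv):
  """Split argv into (subcommand, argv for the subcommand), like cli.py."""
  parser = argparse.ArgumentParser(add_help=False)
  parser.add_argument('command', choices=COMMANDS)
  args, passed = parser.parse_known_args(argv[1:])
  # Prepend the command so the subcommand sees it in the argv[0] position.
  return args.command, [args.command] + passed


def format_flag_help(name, help_text, default=None):
  """Format one help line the same way handle_help does."""
  out = f' --{name:<20} {help_text}'
  if default:
    out += f' [default: {default}]'
  return textwrap.fill(
      out, width=80, subsequent_indent=' ' * 27, fix_sentence_endings=True)


# Hypothetical flags, for illustration only.
command, sub_argv = split_command(
    ['deepconsensus', 'run', '--input=x.bam', '--batch=8'])
print(command)  # run
print(sub_argv)  # ['run', '--input=x.bam', '--batch=8']
print(format_flag_help('batch', 'Number of examples per batch.', 8))
```

Leaving unknown flags unparsed at the top level means the dispatcher never needs to know a subcommand's flags, which is what lets each subcommand module define them on its own.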
28 changes: 28 additions & 0 deletions deepconsensus/inference/__init__.py
@@ -0,0 +1,28 @@
# Copyright (c) 2021, Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of Google Inc. nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""__init__.py."""