Update version numbers for v1.2.
PiperOrigin-RevId: 510288078
anastasiyabl authored and Copybara-Service committed Feb 17, 2023
1 parent f0f5dd0 commit 77798d6
Showing 10 changed files with 43 additions and 33 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -41,13 +41,13 @@ For context, we are the team that created and maintains both DeepConsensus and
DeepVariant. For variant calling with DeepVariant, we tested different models
and found that the best performance is with DeepVariant v1.4 using the normal
pacbio model rather than the model trained on DeepConsensus v0.1 output. We plan
-to include DeepConsensus v1.1 outputs when training the next DeepVariant model,
+to include DeepConsensus v1.2 outputs when training the next DeepVariant model,
so if there is a DeepVariant version later than v1.4 when you read this, we
recommend using that latest version.

### For assembly downstream

-We have confirmed that v1.1 outperforms v0.3 in terms of downstream assembly
+We have confirmed that v1.2 outperforms v0.3 in terms of downstream assembly
contiguity and accuracy. See the
[assembly metrics page](docs/assembly_metrics.md) for details.

@@ -76,15 +76,15 @@ to inspect some example model inputs and outputs.
If you're on a GPU machine:

```bash
-pip install deepconsensus[gpu]==1.1.0
+pip install deepconsensus[gpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

If you're on a CPU machine:

```bash
-pip install deepconsensus[cpu]==1.1.0
+pip install deepconsensus[cpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```
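
After either install, a quick sanity check is to confirm that the CLI resolves and prints its flags. This is only a sketch; it assumes the `deepconsensus` entry point exposes the standard absl-style `--helpshort` flag:

```bash
# Confirm the entry point is on PATH and responds (the flag name is an assumption).
which deepconsensus
deepconsensus run --helpshort
```
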
@@ -94,13 +94,13 @@ export PATH="/home/${USER}/.local/bin:${PATH}"
For GPU:

```bash
-sudo docker pull google/deepconsensus:1.1.0-gpu
+sudo docker pull google/deepconsensus:1.2.0-gpu
```

For CPU:

```bash
-sudo docker pull google/deepconsensus:1.1.0
+sudo docker pull google/deepconsensus:1.2.0
```

### From source
4 changes: 2 additions & 2 deletions README_pip.md
@@ -3,15 +3,15 @@
If you're on a GPU machine:

```bash
-pip install deepconsensus[gpu]==1.1.0
+pip install deepconsensus[gpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

If you're on a CPU machine:

```bash
-pip install deepconsensus[cpu]==1.1.0
+pip install deepconsensus[cpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```
2 changes: 1 addition & 1 deletion
@@ -19,5 +19,5 @@
"truth_bed": "None",
"truth_split": "None",
"ins_trim": "5",
"version": "1.1.0"
"version": "1.2.0"
}
2 changes: 1 addition & 1 deletion
@@ -26,5 +26,5 @@
"truth_bed": "testdata/human_1m/truth.bed",
"truth_split": "testdata/human_1m/truth_split.tsv",
"ins_trim": "5",
"version": "1.1.0"
"version": "1.2.0"
}
2 changes: 1 addition & 1 deletion
@@ -26,5 +26,5 @@
"truth_bed": "testdata/human_1m/truth.bed",
"truth_split": "testdata/human_1m/truth_split.tsv",
"ins_trim": "5",
"version": "1.1.0"
"version": "1.2.0"
}
2 changes: 1 addition & 1 deletion deepconsensus/utils/dc_constants.py
@@ -33,7 +33,7 @@
import tensorflow as tf

# DeepConsensus Version
-__version__ = '1.1.0'
+__version__ = '1.2.0'

# Vocab
GAP = ' '
8 changes: 6 additions & 2 deletions docs/generate_examples.md
@@ -58,7 +58,7 @@ mkdir "${TF_EXAMPLES_DIR}/eval"
mkdir "${TF_EXAMPLES_DIR}/test"

# Download the input PacBio Subread data.
-gsutil cp gs://brain-genomics-public/research/deepconsensus/quickstart/v1.1/n1000.subreads.bam "${BASE_DIR}"/
+gsutil cp gs://brain-genomics-public/research/deepconsensus/quickstart/v1.2/n1000.subreads.bam "${BASE_DIR}"/

# Truth Reference
gsutil cp gs://deepconsensus/pacbio/datasets/chm13/chm13v2.0_noY.fa "${BASE_DIR}"/
@@ -159,7 +159,7 @@ https://docs.docker.com/engine/install/ubuntu/ to install Docker.

```bash
# Define DOCKER_IMAGE *once* depending on whether you will be using CPU or GPU:
-DOCKER_IMAGE=google/deepconsensus:1.1.0 # For CPU
+DOCKER_IMAGE=google/deepconsensus:1.2.0 # For CPU
sudo docker pull ${DOCKER_IMAGE}
```

@@ -313,6 +313,9 @@ export truth_reference=chm13v2.0_noY.fa
export ccs_shard_bam="${shard_id}.ccs.bam"
export truth_split=chm13v2.0_noY.chrom_mapping.txt
export subreads_to_ccs_shard_bam="${shard_id}.subreads_to_ccs.bam"
+# If true, incorporate CCS Base Quality scores into tf.examples (DC v1.2).
+export use_ccs_bq=True
+
# Output
TF_EXAMPLES_DIR="tf_examples"
export ccs_shard_to_truth_alignment_unfiltered="${shard_id}.ccs_to_truth_ref.unfiltered.bam"
@@ -392,6 +395,7 @@ deepconsensus preprocess \
--truth_bed="${truth_shard_bed}" \
--truth_to_ccs="${truth_to_ccs_shard_bam}" \
--truth_split="${truth_split}" \
+--use_ccs_bq="${use_ccs_bq}" \
--output="${tf_example_fname_output}" \
--cpus="$(nproc)"

12 changes: 6 additions & 6 deletions docs/quick_start.md
@@ -68,10 +68,10 @@ Follow https://docs.docker.com/engine/install/ubuntu/ to install Docker.

## Parallelization

-One 8M SMRT Cell can take ~1000 hours to run (without parallelization) depending
+One 8M SMRT Cell can take ~500 hours to run (without parallelization) depending
on the fragment lengths of the sequencing library - see the
[yield metrics page](yield_metrics.md). If we split this into 500 shards, that
-is about 2 hours per shard. There is some variability between shards, but this
+is about 1 hour per shard. There is some variability between shards, but this
should give you an idea of what to expect. This estimate is only for the
DeepConsensus processing step, and does not include the preprocessing required
with *ccs* and *actc*.
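
As a rough sketch of how this sharding is usually arranged upstream of DeepConsensus, *ccs* can emit one chunk of the CCS reads at a time. The shard count, file names, and exact `--chunk` syntax below are placeholders to verify against your *ccs* version:

```bash
# Hypothetical shard i of n_total: ccs processes only its 1/n_total slice of
# ZMWs, so the downstream DeepConsensus work for that shard shrinks accordingly.
n_total=500
i=1
ccs --all --chunk="${i}/${n_total}" -j "$(nproc)" \
  subreads.bam "shard-${i}.ccs.bam"
```

Each shard's CCS BAM then goes through the *actc* and DeepConsensus steps that follow.
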
@@ -100,10 +100,10 @@ QS_DIR="${HOME}/deepconsensus_quick_start"
mkdir -p "${QS_DIR}" "${QS_DIR}/model"

# Download the input PacBio Subread data.
-gsutil cp gs://brain-genomics-public/research/deepconsensus/quickstart/v1.1/n1000.subreads.bam "${QS_DIR}"/
+gsutil cp gs://brain-genomics-public/research/deepconsensus/quickstart/v1.2/n1000.subreads.bam "${QS_DIR}"/

# Download the DeepConsensus model.
-gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint/* "${QS_DIR}"/model/
+gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/* "${QS_DIR}"/model/
```

This directory should now contain the following files:
@@ -133,8 +133,8 @@ the appropriate version (CPU / GPU) depending on your use case.

```bash
# Define DOCKER_IMAGE *once* depending on whether you will be using CPU or GPU:
-DOCKER_IMAGE=google/deepconsensus:1.1.0 # For CPU
-DOCKER_IMAGE=google/deepconsensus:1.1.0-gpu # For GPU
+DOCKER_IMAGE=google/deepconsensus:1.2.0 # For CPU
+DOCKER_IMAGE=google/deepconsensus:1.2.0-gpu # For GPU
sudo docker pull ${DOCKER_IMAGE}
```

9 changes: 6 additions & 3 deletions docs/train_model.md
@@ -43,7 +43,7 @@ mkdir "${DC_TRAIN_DIR}"
mkdir "${TF_EXAMPLES}"
mkdir "${DC_TRAIN_OUTPUT}"
-gsutil -m cp -R gs://brain-genomics-public/research/deepconsensus/training-tutorial/v1.0/* "${TF_EXAMPLES}/"
+gsutil -m cp -R gs://brain-genomics-public/research/deepconsensus/training-tutorial/v1.2/* "${TF_EXAMPLES}/"
```

The path to training examples has to be set in
@@ -52,11 +52,14 @@ The path to training examples has to be set in
For example, if training data is located in /home/user/dc-model/tf-examples the
config will look like this:

-```
+```python
def _set_custom_data_hparams(params):
  """Updates the given config with values for human data aligned to CCS."""
  params.tf_dataset = ['/home/user/dc-model/tf-examples']
  params.max_passes = 20
+  # Set this to True if the tf examples contain ccs base quality scores.
+  # Option available starting in v1.2.
+  params.use_ccs_bq = True

```

@@ -85,7 +88,7 @@ parameter that points to the path + prefix of the checkpoint.

## Runtime

-By default, training will run for 4 epochs. Batch size is set to 256 by default,
+By default, training will run for 9 epochs. Batch size is set to 256 by default,
but this is scaled based on the number of GPUs or TPUs available. These values
can be configured by updating the `model_configs.py` file.
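
For illustration, such an override might look like the sketch below. Only `use_ccs_bq`, `tf_dataset`, and `max_passes` appear earlier on this page; the `num_epochs` and `batch_size` attribute names are assumptions to check against the actual `model_configs.py`:

```python
# Hypothetical edit inside model_configs.py; attribute names marked "assumed"
# should be verified against the config you are editing.
def _set_custom_data_hparams(params):
  params.tf_dataset = ['/home/user/dc-model/tf-examples']
  params.max_passes = 20
  params.use_ccs_bq = True  # v1.2 option described above
  params.num_epochs = 9     # assumed name for the default epoch count
  params.batch_size = 256   # assumed name; scaled across available GPUs/TPUs
```
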

23 changes: 13 additions & 10 deletions docs/train_tpu_model.md
@@ -98,7 +98,7 @@ Then, I copied over a dataset:
BASE_DIR=/mnt/disks/persist/dc_training_examples
mkdir -p ${BASE_DIR}/tf_examples
time gcloud alpha storage cp -R \
-gs://brain-genomics-public/research/deepconsensus/training-tutorial/v1.1/* \
+gs://brain-genomics-public/research/deepconsensus/training-tutorial/v1.2/* \
${BASE_DIR}/tf_examples
```

Expand Down Expand Up @@ -138,7 +138,7 @@ Get a Cloud TPU VM (`--accelerator-type=v2-8` specifies Cloud TPU v2):
gcloud compute tpus tpu-vm create ${USER}-tpu-name \
--zone=${ZONE} \
--accelerator-type=v2-8 \
---version=tpu-vm-tf-2.11.0 \
+--version=tpu-vm-tf-2.9.1 \
--project ${PROJECT} \
--data-disk source=projects/${PROJECT}/zones/${ZONE}/disks/${USER}-tpu-disk,mode=read-write
```
@@ -167,7 +167,7 @@ git clone https://github.com/google/deepconsensus.git

```
cd deepconsensus
-sed -i -e 's|python3 -m pip install --user "intel-tensorflow>=2.11.0"||' install.sh
+sed -i -e 's|python3 -m pip install --user "intel-tensorflow==2.9.1"||' install.sh
./install.sh
```

@@ -209,6 +209,9 @@ def _set_custom_data_hparams(params):
  # confusing.
  params.n_examples_train = 100_000_000
  params.n_examples_eval = 3_500_000
+  # Set this to True if the tf examples contain ccs base quality scores.
+  # Option available starting in v1.2.
+  params.use_ccs_bq = True
```

It is assumed that after copying training examples the
Expand Down Expand Up @@ -242,22 +245,22 @@ time python3 deepconsensus/models/model_train_custom_loop.py \

Beginning training from an existing model checkpoint will generally speed up
most training sessions. In order to start from a checkpoint add `--checkpoint`
-parameter that points to the path + prefix of the checkpoint. DeepConsensus v1.1
+parameter that points to the path + prefix of the checkpoint. DeepConsensus v1.2
checkpoint can be copied from
-`gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint`
+`gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint`
This directory contains 3 files:

```
-gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint/checkpoint.data-00000-of-00001
-gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint/checkpoint.index
-gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint/params.json
+gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/checkpoint.data-00000-of-00001
+gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/checkpoint.index
+gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/params.json
```

Copy DeepConsensus checkpoint locally:

```bash
mkdir /mnt/disks/persist/model_checkpoint
-gsutil cp gs://brain-genomics-public/research/deepconsensus/models/v1.1/model_checkpoint/* /mnt/disks/persist/model_checkpoint/
+gsutil cp gs://brain-genomics-public/research/deepconsensus/models/v1.2/model_checkpoint/* /mnt/disks/persist/model_checkpoint/
```
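
To make the `--checkpoint` usage concrete, a minimal sketch is shown below. Only the script path and `--checkpoint` come from this page; any other flags the script requires are not shown here and should be checked against its `--helpfull` output:

```bash
# Append --checkpoint to the training command shown earlier; it takes the path
# plus the shared file prefix ("checkpoint"), not an individual .index/.data file.
time python3 deepconsensus/models/model_train_custom_loop.py \
  --checkpoint=/mnt/disks/persist/model_checkpoint/checkpoint \
  --alsologtostderr  # plus the same flags used when training from scratch
```
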

Add optional `--checkpoint` flag to
@@ -284,7 +287,7 @@ I1026 05:48:32.895524 140202426203200 model_utils.py:271] Per-replica batch-size
I1026 05:48:32.895847 140202426203200 model_utils.py:280] Global batch size is 8192
```

-By default, training will run for 7 epochs. Per-replica batch size and epochs
+By default, training will run for 9 epochs. Per-replica batch size and epochs
can be configured by updating the `model_configs.py` file. Global batch size is
scaled based on the TPU topology and number of cores you have available.
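
As a purely illustrative calculation (the replica count below is an assumption, not read from the log above), the global batch size is simply the per-replica batch size multiplied by the number of replicas:

```bash
# Per-replica batch size of 256 on an assumed 32 replicas:
echo $(( 256 * 32 ))  # 8192
```
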
