This tutorial demonstrates how to train DeepConsensus locally using GPUs. It is intended as a starting point from which a more advanced training pipeline can be developed; such a pipeline will likely be necessary to train highly accurate DeepConsensus models.

Training examples can be created with DeepConsensus itself. See the Generate Training Examples documentation for details. This tutorial uses a small toy dataset.
These instructions were tested on a Google Cloud VM with 16 Intel cores (n1-standard-16) and one NVIDIA Tesla P100 GPU.
```bash
git clone https://github.com/google/deepconsensus.git
cd deepconsensus
./install-gpu.sh
curl https://raw.githubusercontent.com/google/deepvariant/r1.4/scripts/install_nvidia_docker.sh -o install_nvidia_docker.sh
bash install_nvidia_docker.sh
```
```bash
export DC_TRAIN_DIR="${HOME}/dc-model"
export TF_EXAMPLES="${DC_TRAIN_DIR}/tf-examples"
export DC_TRAIN_OUTPUT="${DC_TRAIN_DIR}/output"
mkdir "${DC_TRAIN_DIR}"
mkdir "${TF_EXAMPLES}"
mkdir "${DC_TRAIN_OUTPUT}"
gsutil -m cp -R gs://brain-genomics-public/research/deepconsensus/training-tutorial/v1.0/* "${TF_EXAMPLES}/"
```
The path to the training examples has to be set in the `_set_custom_data_hparams` function in `deepconsensus/models/model_configs.py`. For example, if the training data is located in `/home/user/dc-model/tf-examples`, the config will look like this:
```python
def _set_custom_data_hparams(params):
  """Updates the given config with values for human data aligned to CCS."""
  params.tf_dataset = ['/home/user/dc-model/tf-examples']
  params.max_passes = 20
```
It is assumed that this path contains the following subdirectories with TensorFlow examples generated by DeepConsensus:

- train
- eval
- test

The directory must also contain a `summary.training.json` file.
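As a quick sanity check before launching training, the expected layout can be verified with a short script. This is just a sketch; the path is the example location used above, and the expected entries are those listed in this tutorial:

```python
import os

# Example path from the config above; adjust to your own setup.
tf_examples = '/home/user/dc-model/tf-examples'

# Subdirectories and files this tutorial expects in the examples directory.
expected_dirs = ['train', 'eval', 'test']
expected_files = ['summary.training.json']

# Collect anything that is missing from the directory.
missing = [d for d in expected_dirs
           if not os.path.isdir(os.path.join(tf_examples, d))]
missing += [f for f in expected_files
            if not os.path.isfile(os.path.join(tf_examples, f))]

if missing:
    print('Missing entries:', ', '.join(missing))
else:
    print('Examples directory layout looks correct.')
```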
```bash
export PYTHONPATH=$PWD:$PYTHONPATH
export CONFIG=deepconsensus/models/model_configs.py:transformer_learn_values+custom
python3 deepconsensus/models/model_train_custom_loop.py \
  --params ${CONFIG} \
  --out_dir ${DC_TRAIN_OUTPUT} \
  --alsologtostderr
```
Beginning training from an existing model checkpoint will generally speed up training. To start from a checkpoint, add the `--checkpoint` flag pointing to the path and prefix of the checkpoint.
By default, training runs for 4 epochs with a batch size of 256, and the batch size is scaled based on the number of GPUs or TPUs available. These values can be configured in the `model_configs.py` file.
The number of steps in an epoch is `<number of examples> / <batch size>`. Once training is finished, the `$DC_TRAIN_OUTPUT/best_checkpoint.txt` file will contain the best performing checkpoint. This checkpoint can then be used during inference.
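To make the step arithmetic concrete, here is a small worked example. The dataset size is hypothetical; the per-accelerator batch size of 256 is the default mentioned above:

```python
# Hypothetical number of training examples; 256 is the documented default
# per-accelerator batch size.
num_train_examples = 100_000
per_gpu_batch_size = 256
num_gpus = 1

# The effective batch size scales with the number of GPUs or TPUs.
effective_batch_size = per_gpu_batch_size * num_gpus

# Steps per epoch = number of examples / effective batch size.
steps_per_epoch = num_train_examples // effective_batch_size
print(steps_per_epoch)  # 390 with a single GPU
```

With more accelerators the effective batch size grows, so each epoch takes proportionally fewer steps.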
The newly trained checkpoint can be used to run inference with DeepConsensus. See Run DeepConsensus for detailed instructions.