A (Heavily Documented) TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

Warning

As of May 17, 2017, this is still a first draft. You can run it by following the steps below, but you will probably get poor results. I'll be working on debugging this weekend. (Code reviews and/or contributions are more than welcome!)

Requirements

NumPy >= 1.11.1
TensorFlow >= 1.0
librosa
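
To confirm that an environment satisfies the list above before training, a small check along the following lines may help. It is only a sketch: the helper names `parse_version`, `meets_minimum`, and `check_requirements` are hypothetical and not part of this repository, and it assumes Python 3.8+ so that `importlib.metadata` is available.

```python
import re
from importlib import metadata

# Minimum versions taken from the Requirements list; None means "any version".
REQUIREMENTS = {"numpy": "1.11.1", "tensorflow": "1.0", "librosa": None}


def parse_version(text):
    """Turn '1.11.1' or '2.0.0rc1' into a tuple of ints such as (1, 11, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is None or meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(package, status)
```

The check reads installed package metadata instead of importing the packages, so it stays fast even when TensorFlow is present.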

Data

The original paper was based on the authors' internal data, so I use a freely available dataset instead.

The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its text and audio recordings are freely available here. Unfortunately, each audio file corresponds to a chapter rather than a verse, so most of them are too long. I sliced them by verse manually. You can get them on my Dropbox.
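
The slicing above was done by hand, and the actual verse boundaries are not given here. Purely as an illustration of what cutting a chapter-length recording into verse-length clips involves, here is a minimal NumPy sketch. The function name `slice_by_boundaries`, the sample rate, and the timestamps are all hypothetical, and a silent array stands in for real audio (which could be loaded with something like `librosa.load`).

```python
import numpy as np


def slice_by_boundaries(wave, sample_rate, boundaries_sec):
    """Cut a 1-D waveform into clips at the given (start, end) times in seconds."""
    clips = []
    for start, end in boundaries_sec:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        clips.append(wave[first:last])
    return clips


# Assumed values for illustration only; not taken from the dataset.
sample_rate = 22050
chapter = np.zeros(sample_rate * 10, dtype=np.float32)  # 10 s placeholder "chapter"
verse_times = [(0.0, 3.5), (3.5, 7.0), (7.0, 10.0)]     # hypothetical verse boundaries

verses = slice_by_boundaries(chapter, sample_rate, verse_times)
for index, clip in enumerate(verses, start=1):
    print("verse", index, "duration (s):", len(clip) / sample_rate)
```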

Work Flow

STEP 1. Adjust hyperparameters in hyperparams.py if necessary.
STEP 2. Download the data and extract it.
STEP 3. Run train.py.
STEP 4. Run eval.py to get samples.
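
Steps 3 and 4 can also be chained from a single script. The following is a rough sketch, not something shipped with this repository: `build_commands` and `run_workflow` are hypothetical helpers, and it assumes that `train.py` and `eval.py` take no command-line arguments, as the steps above suggest.

```python
import subprocess
import sys
from pathlib import Path

# Scripts named in STEP 3 and STEP 4, in the order they should run.
STEPS = ["train.py", "eval.py"]


def build_commands():
    """Return one command per step, using the current Python interpreter."""
    return [[sys.executable, script] for script in STEPS]


def run_workflow(repo_dir="."):
    """Run each step inside repo_dir; stop at the first missing script.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for command in build_commands():
        script = command[1]
        if not (Path(repo_dir) / script).is_file():
            print("stopping, script not found:", script)
            break
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, check=True, cwd=repo_dir)
        completed.append(script)
    return completed


if __name__ == "__main__":
    print("completed steps:", run_workflow("."))
```

Stopping at the first missing script avoids running eval.py when training never happened.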

