Open-GPT-4o: Democratizing Speech Foundation Models for All

We develop Open-GPT-4o with a focus on speech (audio) foundation models, drawing inspiration from the Open-Sora initiative. We will make our models and training details accessible to everyone. Open-GPT-4o will open up an avenue for contributions from the open-source community and make these technologies available in non-English languages (e.g., Asian languages such as Japanese).

Under development. We currently provide the following building blocks: audio tokenization, voice cloning, text-to-speech, and fast ASR.
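
As a rough illustration of the audio-tokenization building block, the sketch below converts a waveform into discrete codec tokens with EnCodec, the codec family from audiocraft (see Acknowledgement), via its Hugging Face transformers integration. This is a minimal sketch for illustration; Open-GPT-4o's actual tokenizer may differ.

```python
# Minimal sketch: audio tokenization with EnCodec via Hugging Face transformers.
# Illustrative only; Open-GPT-4o's tokenizer may differ.
import torch
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# One second of silence at 24 kHz stands in for real speech here.
waveform = torch.zeros(24_000).numpy()
inputs = processor(raw_audio=waveform, sampling_rate=24_000, return_tensors="pt")

with torch.no_grad():
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])

# Discrete token ids: (num_chunks, batch, num_codebooks, num_frames).
print(encoded.audio_codes.shape)
```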

📰 News

  • [2024.05.15] Launching Open-GPT-4o.
  • [~2024.05.15] Kotoba Technologies open-sources preliminary building blocks towards Open-GPT-4o: Kotoba-Speech (a state-of-the-art Japanese voice-cloning and text-to-speech model) and Kotoba-Whisper (fast and accurate Japanese ASR, 6.3x faster than OpenAI Whisper-large with competitive performance).

Latest Demo (To Be Updated)

We plan to create many demos using our foundation models. As a first step, we provide the 🤗 Kotoba-Speech Demo and the 🤗 Kotoba-Whisper Demo on Hugging Face. These serve as building blocks for creating speech foundation models.
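
For readers who want to try a building block locally rather than through the hosted demos, the snippet below runs Kotoba-Whisper with the standard transformers ASR pipeline. The checkpoint id and the input file name are assumptions here; this is a sketch, not official usage.

```python
# Hedged sketch: running Kotoba-Whisper locally via the transformers ASR
# pipeline. Checkpoint id and audio path are assumptions for illustration.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v1.0",  # assumed released checkpoint id
    torch_dtype=torch.float16,
    device="cuda:0",  # use device="cpu" (and drop torch_dtype) without a GPU
)
result = asr("sample_japanese_speech.wav")  # hypothetical input file
print(result["text"])
```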

New Features/Updates

TBA

TODO list

  • Llama-recipes adaptation for text LLM integration (@kojimano)
  • Collecting speech training data
  • Audio tokenization BPE to reduce sequence lengths (@jungokasai); see the sketch after this list
  • Support streaming generation of audio (fully autoregressive token generation) (@jungokasai)
  • Add downstream task data
    • English <> Japanese speech translation
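
To make the audio-token BPE item above concrete, here is a minimal sketch of the idea: map each codec id to a unique character, train an off-the-shelf BPE tokenizer on those sequences, and let frequent id n-grams merge into single units, shortening the sequences the model must attend over. The codebook size, Unicode offset, and toy corpus are all hypothetical; the real pipeline would train on codec ids extracted from speech.

```python
# Minimal sketch of BPE over audio codec tokens (hypothetical data throughout).
# Each codec id is mapped to a unique Unicode character so a standard BPE
# trainer can merge recurring id n-grams into single vocabulary units.
from tokenizers import Tokenizer, models, trainers

NUM_CODES = 1024  # assumed codec codebook size
BASE = 0x4E00     # arbitrary offset into a large Unicode block

def ids_to_text(ids):
    return "".join(chr(BASE + i) for i in ids)

# Toy corpus: repeating id patterns stand in for real codec sequences.
corpus = [ids_to_text(([1, 2, 3, 4] * 50) + ([7, 8] * 30)) for _ in range(100)]

tokenizer = Tokenizer(models.BPE())
trainer = trainers.BpeTrainer(
    vocab_size=2048,  # room for merges beyond the 1024 base symbols
    initial_alphabet=[chr(BASE + i) for i in range(NUM_CODES)],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

sample = corpus[0]
print(len(sample), "codec frames ->", len(tokenizer.encode(sample).ids), "BPE units")
```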

Contents

Under development.

  • Installation
  • Open-GPT-4o Model Weights
  • Data Processing
  • Training
    • Open-GPT-4o Training
  • Evaluation
  • Contribution
  • References

Acknowledgement

  • MetaVoice: an autoregressive (with a non-autoregressive component) voice-cloning/TTS model for English.
  • audiocraft: speech/audio tokenization models for preprocessing and generation.
  • llama-recipes: a distributed training library that supports continued training of Llama models.

We are grateful to the amazing open-source community.
