Stars
AI powered speech denoising and enhancement
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
Karras et al. (2022) diffusion models for PyTorch
Robust Singing Voice Transcription and MIDI Extraction
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Multilingual Voice Understanding Model
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
Train transformer language models with reinforcement learning.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Translate a video from one language to another and add dubbing.
A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an external API.
Bark Voice Cloning and Voice Cloning for Chinese Speech
GLM-4 series: Open Multilingual Multimodal Chat LMs
Inference and training library for high-quality TTS models.
A generative speech model for daily dialogue.
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
OpenGPT 4o is a free alternative to OpenAI GPT 4o