Stars
Multilingual Voice Understanding Model
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A simple local web interface that uses ChatTTS to synthesize text into speech, with support for exposing an external API.
Core Engine of Singing Voice Conversion & Singing Voice Clone
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
A generative speech model for daily dialogue.
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
PyTorch implementation of "StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator"
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Foundational model for human-like, expressive TTS
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
✨✨Latest Advances on Multimodal Large Language Models
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
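This project handles data preprocessing. The first step processes the collected audio, using Funasr timestamps to remove empty background audio. It also includes the labels prepared before the data is fed to BERT.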
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
Multi-Speaker PyTorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
A database of head-related transfer functions created by the Center for Image Processing and Integrated Computing at the University of California in 2001.
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Foundational Models for State-of-the-Art Speech and Text Translation
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models