Stars
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
UT-Sarulab MOS prediction system using SSL models
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch
A state-of-the-art-level open visual language model | Multimodal pre-trained model
Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Implementation of a multimodal diffusion transformer in Pytorch
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
Official code of paper "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Have a natural voice conversation with an LLM
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
Generative models for conditional audio generation
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Fine tuning the UnifiedVoice autoregressor for TortoiseTTS.
DLAS - A configuration-driven trainer for generative models
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
This is the official repository for M2UGen
Official code for the Diff-Instruct algorithm for one-step diffusion distillation