Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 560 43 Updated Jul 6, 2024

MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]

Python 20 Updated Jul 3, 2024

Convert Chinese characters to pinyin (pypinyin)

Python 4,756 603 Updated Mar 10, 2024

UT-Sarulab MOS prediction system using SSL models

Python 143 14 Updated Apr 11, 2024

Audio Large Language Models

6 2 Updated Jun 26, 2024

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

Python 1,187 124 Updated May 3, 2024

A state-of-the-art open visual language model | multimodal pre-trained model

Python 5,604 390 Updated May 29, 2024

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

Python 36 2 Updated Jul 2, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 62 2 Updated Jun 21, 2024

Implementation of a multimodal diffusion transformer in Pytorch

90 Updated Jun 24, 2024

Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)

Python 135 9 Updated Jul 6, 2024

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Jupyter Notebook 242 7 Updated Jul 3, 2024

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Python 1,725 251 Updated Jun 12, 2023

Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers

Python 48 3 Updated Jul 3, 2024

Official code for the paper "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Python 21 3 Updated Jun 4, 2024

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.

Python 1,215 137 Updated Jul 5, 2024

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 2,681 197 Updated Jul 4, 2024

Have a natural voice conversation with an LLM

Python 39 12 Updated Jul 3, 2024

AudioSR-Upsampling (any -> 48kHz)

Python 35 2 Updated Feb 13, 2024

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Python 38 6 Updated Jun 21, 2024

Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

HTML 25 Updated Jul 3, 2024

Generative models for conditional audio generation

Python 2,265 199 Updated Jul 2, 2024

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Python 123 11 Updated Sep 14, 2023

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

94 1 Updated Jun 13, 2024

Fine tuning the UnifiedVoice autoregressor for TortoiseTTS.

Python 15 Updated Nov 25, 2023

DLAS - A configuration-driven trainer for generative models

Python 129 121 Updated Oct 11, 2022

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 362 32 Updated Jun 9, 2024

This is the official repository for M2UGen

Jupyter Notebook 421 38 Updated May 8, 2024

Official code for the Diff-Instruct algorithm for one-step diffusion distillation

Python 38 3 Updated Apr 6, 2024