missaaoo

2 followers · 25 following

Stars

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python · 560 stars · 43 forks · Updated Jul 6, 2024

sanderwood / melodyt5

MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]

Python · 20 stars · Updated Jul 3, 2024

mozillazg / python-pinyin

Chinese characters to Pinyin (pypinyin)

Python · 4,756 stars · 603 forks · Updated Mar 10, 2024

sarulab-speech / UTMOS22

UT-Sarulab MOS prediction system using SSL models

Python · 143 stars · 14 forks · Updated Apr 11, 2024

AudioLLMs / AudioLLM

Audio Large Language Models

6 stars · 2 forks · Updated Jun 26, 2024

lucidrains / video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to video generation, in PyTorch

Python · 1,187 stars · 124 forks · Updated May 3, 2024

THUDM / CogVLM

A state-of-the-art open visual language model | multimodal pretrained model

Python · 5,604 stars · 390 forks · Updated May 29, 2024

ShovalMessica / NAST

Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037

Python · 36 stars · 2 forks · Updated Jul 2, 2024

haoheliu / SemantiCodec-inference

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with richer semantics in the latent space.

Python · 62 stars · 2 forks · Updated Jun 21, 2024

lucidrains / multimodal-dit-pytorch

Implementation of a multimodal diffusion transformer in PyTorch

90 stars · Updated Jun 24, 2024

liutaocode / TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)

Python · 135 stars · 9 forks · Updated Jul 6, 2024

bytedance / 1d-tokenizer

This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"

Jupyter Notebook · 242 stars · 7 forks · Updated Jul 3, 2024

thu-coai / CDial-GPT

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Python · 1,725 stars · 251 forks · Updated Jun 12, 2023

Text-to-Audio / Make-An-Audio-3

Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers

Python · 48 stars · 3 forks · Updated Jul 3, 2024

Kevinz-code / SeVa

Official code of paper "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Python · 21 stars · 3 forks · Updated Jun 4, 2024

DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.

Python · 1,215 stars · 137 forks · Updated Jul 5, 2024

Tencent / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python · 2,681 stars · 197 forks · Updated Jul 4, 2024

Finity-Alpha / OpenVoiceChat

Have a natural voice conversation with an LLM

Python · 39 stars · 12 forks · Updated Jul 3, 2024

ORI-Muchim / AudioSR-Upsampling

AudioSR-Upsampling (any -> 48kHz)

Python · 35 stars · 2 forks · Updated Feb 13, 2024

cyanbx / Prompt-Singer

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Python · 38 stars · 6 forks · Updated Jun 21, 2024

ditto-tts / ditto-tts.github.io

Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

HTML · 25 stars · Updated Jul 3, 2024

Stability-AI / stable-audio-tools

Generative models for conditional audio generation

Python · 2,265 stars · 199 forks · Updated Jul 2, 2024

0nutation / USLM

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Python · 123 stars · 11 forks · Updated Sep 14, 2023

line / LibriTTS-P

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

94 stars · 1 fork · Updated Jun 13, 2024

andrewsilva9 / tune_tortoise_autoregressor

Fine-tuning the UnifiedVoice autoregressor for TortoiseTTS.

Python · 15 stars · Updated Nov 25, 2023

neonbjb / DL-Art-School

DLAS - A configuration-driven trainer for generative models

Python · 129 stars · 121 forks · Updated Oct 11, 2022

ZhangXInFD / SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python · 362 stars · 32 forks · Updated Jun 9, 2024

shansongliu / M2UGen

This is the official repository for M2UGen

Jupyter Notebook · 421 stars · 38 forks · Updated May 8, 2024

BytedanceSpeech / seed-tts-eval

Python · 773 stars · 74 forks · Updated Jun 14, 2024

pkulwj1994 / diff_instruct

Official code for the Diff-Instruct algorithm for one-step diffusion distillation

Python · 38 stars · 3 forks · Updated Apr 6, 2024
