nangongmujd

2 followers · 26 following

Stars

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 1,197 98 Updated Jul 12, 2024

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,159 2,441 Updated Jul 9, 2024

jianchang512 / ChatTTS-ui

A simple local web interface that uses ChatTTS to synthesize text into speech, with support for exposing an external API.

Python 5,233 574 Updated Jul 11, 2024

PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone

Python 2,556 915 Updated Apr 23, 2024

fishaudio / fish-speech

Brand new TTS solution

Python 5,246 414 Updated Jul 11, 2024

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 6,891 856 Updated Jul 4, 2024

GuijiAI / duix.ai

C++ 3,088 425 Updated Jul 11, 2024

myshell-ai / MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Python 4,060 489 Updated Jul 6, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 27,599 2,999 Updated Jul 12, 2024

TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 1,906 230 Updated Jul 10, 2024

TMElyralab / MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

Python 2,112 216 Updated Jun 28, 2024

guanjz20 / StyleSync_PyTorch

PyTorch implementation of "StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator"

Python 186 21 Updated Aug 8, 2023

Zejun-Yang / AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,283 530 Updated Jul 2, 2024

metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS

Python 3,506 617 Updated Jul 10, 2024

scutcsq / Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction

Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)

Python 55 4 Updated Apr 4, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

10,607 705 Updated Jul 11, 2024

ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 506 38 Updated Jun 24, 2024

MrXnneHang / auto_labeling_for_BERT_VITS2

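This project handles data preprocessing. The first step processes the collected audio, using FunASR timestamps to strip out empty background audio. It also covers the labels prepared before feeding data to BERT.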

Python 10 4 Updated Jun 25, 2024

myshell-ai / OpenVoice

Instant voice cloning by MyShell.

Python 27,235 2,638 Updated Jul 6, 2024

LSimon95 / megatts2

Unofficial implementation of Megatts2

Python 245 35 Updated Mar 23, 2024

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,238 361 Updated Jul 10, 2024

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 8,727 794 Updated Jul 1, 2024

yerfor / GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,331 190 Updated Jun 5, 2024

ga642381 / FastSpeech2

Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊

Python 91 17 Updated Oct 14, 2022

RVC-Boss / GPT-SoVITS

One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)

Python 29,054 3,361 Updated Jul 12, 2024

amini-allight / cipic-hrtf-database

A database of head-related transfer functions created by the Center for Image Processing and Integrated Computing at the University of California in 2001.

MATLAB 33 16 Updated Sep 15, 2022

yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Python 200 35 Updated Jul 10, 2024

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,541 1,018 Updated Jun 26, 2024

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,247 88 Updated Jul 5, 2024

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,438 346 Updated Jul 10, 2024
