Multilingual Voice Understanding Model

Python 1,197 98 Updated Jul 12, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,159 2,441 Updated Jul 9, 2024

A simple local web interface that uses ChatTTS to synthesize text into speech, and also exposes an API for external use.

Python 5,233 574 Updated Jul 11, 2024

Core Engine of Singing Voice Conversion & Singing Voice Clone

Python 2,556 915 Updated Apr 23, 2024

Brand new TTS solution

Python 5,246 414 Updated Jul 11, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 6,891 856 Updated Jul 4, 2024
C++ 3,088 425 Updated Jul 11, 2024

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Python 4,060 489 Updated Jul 6, 2024

A generative speech model for daily dialogue.

Python 27,599 2,999 Updated Jul 12, 2024

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 1,906 230 Updated Jul 10, 2024

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

Python 2,112 216 Updated Jun 28, 2024

PyTorch implementation of "StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator"

Python 186 21 Updated Aug 8, 2023

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,283 530 Updated Jul 2, 2024

Foundational model for human-like, expressive TTS

Python 3,506 617 Updated Jul 10, 2024

Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)

Python 55 4 Updated Apr 4, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,607 705 Updated Jul 11, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 506 38 Updated Jun 24, 2024

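This project handles data preprocessing. The first step processes the collected audio, using Funasr timestamps to remove empty background audio. It also includes the labels prepared before the data is fed to BERT.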

Python 10 4 Updated Jun 25, 2024

Instant voice cloning by MyShell.

Python 27,235 2,638 Updated Jul 6, 2024

Unofficial implementation of Megatts2

Python 245 35 Updated Mar 23, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,238 361 Updated Jul 10, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 8,727 794 Updated Jul 1, 2024

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,331 190 Updated Jun 5, 2024

Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊

Python 91 17 Updated Oct 14, 2022

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 29,054 3,361 Updated Jul 12, 2024

A database of head-related transfer functions created by the Center for Image Processing and Integrated Computing at the University of California in 2001.

MATLAB 33 16 Updated Sep 15, 2022

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Python 200 35 Updated Jul 10, 2024

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,541 1,018 Updated Jun 26, 2024

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,247 88 Updated Jul 5, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,438 346 Updated Jul 10, 2024