
AI powered speech denoising and enhancement

Python 1,136 108 Updated Jun 21, 2024

The official Meta Llama 3 GitHub site

Python 24,690 2,685 Updated Jul 26, 2024

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 119 13 Updated Jul 25, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

Python 175 14 Updated Jul 27, 2024

Karras et al. (2022) diffusion models for PyTorch

Python 2,208 369 Updated Jul 16, 2024

Robust Singing Voice Transcription and MIDI Extraction

Python 33 Updated Jul 4, 2024

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Jupyter Notebook 521 68 Updated Jul 22, 2024

Multilingual Voice Understanding Model

Python 1,668 158 Updated Jul 27, 2024

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 2,787 262 Updated Jul 25, 2024

Bring portraits to life!

Python 8,550 797 Updated Jul 27, 2024

Brand new TTS solution

Python 6,496 507 Updated Jul 23, 2024

Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers

Python 58 3 Updated Jul 8, 2024

Train transformer language models with reinforcement learning.

Python 8,881 1,090 Updated Jul 26, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 252 17 Updated Apr 9, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,434 334 Updated May 28, 2024

The open source code for LLM-Codec

Python 99 2 Updated Jun 17, 2024

A SOTA lightweight multilingual LLM

Python 750 39 Updated Jul 8, 2024

Translate a video from one language to another and add dubbing.

Python 8,332 923 Updated Jul 25, 2024

A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an external API.

Python 5,395 593 Updated Jul 17, 2024

Bark Voice Cloning and Voice Cloning for Chinese Speech

Jupyter Notebook 2,624 373 Updated Jul 8, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 3,924 286 Updated Jul 26, 2024

Inference and training library for high-quality TTS models.

Python 2,905 301 Updated Jul 26, 2024

A generative speech model for daily dialogue.

Python 28,296 3,074 Updated Jul 27, 2024

All-In-One Music Structure Analyzer

Python 383 36 Updated May 9, 2024

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python 290 31 Updated Apr 17, 2024
Python 415 37 Updated Jun 7, 2024

Contains the code associated with the ICLR submission for our text-to-speech diffusion model

Python 47 1 Updated Oct 31, 2023
Rich Text Format 6,349 833 Updated Jul 26, 2024

OpenGPT 4o is a free alternative to OpenAI GPT 4o

Python 145 32 Updated Jul 26, 2024

Music generation

Python 22 5 Updated May 2, 2024