Stars
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
so-vits-svc fork with realtime support, improved interface and more features.
Easily train a good VC model with voice data <= 10 mins!
Production First and Production Ready End-to-End Speech Recognition Toolkit
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
Tools for handling speech data in machine learning projects.
A generative speech model for daily dialogue.
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
✨✨Latest Advances on Multimodal Large Language Models
Multilingual Voice Understanding Model
Foundational Models for State-of-the-Art Speech and Text Translation
Robust Speech Recognition via Large-Scale Weak Supervision
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Foundational model for human-like, expressive TTS
vits2 backbone with multilingual-bert
A multi-voice TTS system trained with an emphasis on quality
as-ideas / ForwardTacotron
Forked from fatchord/WaveRNN
⏩ Generating speech in a single forward pass without any attention!
A simple local web interface that uses ChatTTS to synthesize text into speech, and also exposes an API for external use.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.