-
USTC
- Beijing,China
Block or Report
Block or report daniellibin
Contact GitHub support about this user’s behavior. Learn more about reporting abuse.
Report abuseStars
Language
Sort by: Recently starred
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大模型提供更高质量、更丰富、更易”消化“的数据!
High Accuracy and efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.
Awesome Pretrained Chinese NLP Models,高质量中文预训练模型&大模型&多模态模型&大语言模型集合
MNBVC(Massive Never-ending BT Vast Chinese corpus)超大规模中文语料集。对标chatGPT训练的40T数据。MNBVC数据集不但包括主流文化,也包括各个小众文化甚至火星文的数据。MNBVC数据集包括新闻、作文、小说、书籍、杂志、论文、台词、帖子、wiki、古诗、歌词、商品介绍、笑话、糗事、聊天记录等一切形式的纯文本中文数据。
句子匹配模型,包括无监督的SimCSE、ESimCSE、PromptBERT,和有监督的SBERT、CoSENT。
2022搜狐校园算法大赛NLP赛道第一名开源方案(实验代码)
收录NLP竞赛策略实现、各任务baseline、相关竞赛经验贴(当前赛事、往期赛事、训练赛)、NLP会议时间、常用自媒体、GPU推荐等,持续更新中
机器学习竞赛信息聚合(Machine learning competition information aggregation)
基于GOOGLE T5中文生成式模型的摘要生成/指代消解,支持batch批量生成,多进程
Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Datasets and codes for the paper "RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling". (EMNLP 2020)
MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.
基于pytorch的中文拼写纠错,使用的模型是Bert以及SoftMaskedBert
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
pycorrector is a toolkit for text error correction. 文本纠错,实现了Kenlm,T5,MacBERT,ChatGLM3,LLaMA等模型应用在纠错场景,开箱即用。
gaiic2021-track3-小布助手对话短文本语义匹配复赛rank3、决赛rank4
ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems