daniellibin

🎯

Focusing

31 followers · 23 following

USTC
Beijing, China

Stars

modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 1,795 123 Updated Jul 23, 2024

codefuse-ai / MFTCoder

High-accuracy, high-efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.

Python 598 64 Updated Jun 11, 2024

lonePatient / awesome-pretrained-chinese-nlp-models

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models

Python 4,567 451 Updated Jul 17, 2024

esbatmop / MNBVC

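MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, intended to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche cultures, and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and so on.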

3,248 223 Updated Jul 21, 2024

LAICcompet / laic_2022

Python 13 10 Updated Oct 14, 2022

timberding / Knowledge-driven-spoken-dialogue

Python 45 17 Updated Oct 12, 2022

Macielyoung / sentence_representation_matching

Sentence matching models, including the unsupervised SimCSE, ESimCSE, and PromptBERT, and the supervised SBERT and CoSENT.

Python 96 13 Updated Oct 29, 2022

stack-heap-overflow / sohu2022-nlp-rank1

Open-source first-place solution (experimental code) for the NLP track of the 2022 Sohu Campus Algorithm Competition

Python 78 15 Updated Jul 31, 2022

TingFree / NLPer-Arsenal

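A collection of NLP competition strategy implementations, baselines for various tasks, competition experience posts (current, past, and practice competitions), NLP conference dates, popular media accounts, GPU recommendations, and more; continuously updated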

Python 2,112 248 Updated Aug 29, 2023

CLUEbenchmark / CLUEDatasetSearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Python 4,025 603 Updated Nov 21, 2022

gaohongkui / GlobalPointer_pytorch

PyTorch implementation of GlobalPointer for unified handling of nested and non-nested NER

Python 364 45 Updated Mar 23, 2023

LogicJake / MLCompetitionHub

Machine learning competition information aggregation

Python 130 23 Updated Sep 21, 2023

SunnyGJing / t5-pegasus-chinese

Summarization and coreference resolution based on Google's T5 Chinese generative model, with support for batch generation and multiprocessing

Python 209 34 Updated Nov 13, 2023

Coding-Zuo / DaguanFengxian

4th-place solution for the 5th Daguan Cup on DataFountain

Python 49 15 Updated Oct 7, 2022

budzianowski / multiwoz

Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)

Python 842 198 Updated Jul 1, 2024

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 13,662 2,777 Updated Jul 23, 2024

terryqj0107 / RiSAWOZ

Datasets and code for the paper "RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling". (EMNLP 2020)

58 9 Updated Aug 10, 2022

nusnlp / m2scorer

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.

Python 144 36 Updated Sep 27, 2022

taishan1994 / pytorch_bert_chinese_spell_correction

Chinese spelling correction in PyTorch, using the Bert and SoftMaskedBert models

Python 30 3 Updated Oct 19, 2021

lizhe2004 / chatbot-list

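Industry talks and introductions covering the applications, architectures, and algorithms of intelligent customer service and chatbots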

1,057 223 Updated Apr 28, 2022

grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Python 875 214 Updated May 21, 2024