daniellibin
🎯
Focusing
  • USTC
  • Beijing, China

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 1,795 123 Updated Jul 23, 2024

High-accuracy, high-efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.

Python 598 64 Updated Jun 11, 2024

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.

Python 4,567 451 Updated Jul 17, 2024

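MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.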

3,248 223 Updated Jul 21, 2024
Python 13 10 Updated Oct 14, 2022

Sentence-matching models, including the unsupervised SimCSE, ESimCSE, and PromptBERT, and the supervised SBERT and CoSENT.

Python 96 13 Updated Oct 29, 2022

Open-source first-place solution for the NLP track of the 2022 Sohu Campus Algorithm Competition (experiment code).

Python 78 15 Updated Jul 31, 2022

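A collection of NLP competition strategy implementations, baselines for various tasks, competition experience posts (current, past, and practice competitions), NLP conference dates, commonly followed media channels, GPU recommendations, and more; continuously updated.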

Python 2,112 248 Updated Aug 29, 2023

Search all Chinese NLP datasets, with commonly used English NLP datasets included.

Python 4,025 603 Updated Nov 21, 2022

PyTorch implementation of GlobalPointer (全局指针), which handles nested and non-nested NER in a unified way.

Python 364 45 Updated Mar 23, 2023

Machine learning competition information aggregation.

Python 130 23 Updated Sep 21, 2023

Summarization and coreference resolution based on a Chinese generative Google T5 model, with support for batch generation and multiprocessing.

Python 209 34 Updated Nov 13, 2023

4th-place solution for the 5th Daguan Cup (达观杯) on DataFountain.

Python 49 15 Updated Oct 7, 2022

Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)

Python 842 198 Updated Jul 1, 2024

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 13,662 2,777 Updated Jul 23, 2024

Datasets and code for the paper "RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling" (EMNLP 2020).

58 9 Updated Aug 10, 2022

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.

Python 144 36 Updated Sep 27, 2022

PyTorch-based Chinese spelling correction, using the Bert and SoftMaskedBert models.

Python 30 3 Updated Oct 19, 2021

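Industry talks and introductions on the applications, architectures, and algorithms of intelligent customer service and chatbots.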

1,057 223 Updated Apr 28, 2022

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Python 875 214 Updated May 21, 2024

Paper list for grammatical error correction (GEC).

35 3 Updated May 23, 2024

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.

Python 5,367 1,080 Updated Jul 19, 2024

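gaiic2021-track3: semantic matching of short dialogue texts for the Xiaobu Assistant (小布助手); rank 3 in the semi-final round, rank 4 in the final.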

Python 144 42 Updated Jun 19, 2021

Top-2 solution for the 2021 Sohu Campus Text Matching Algorithm Competition.

Python 39 8 Updated Mar 25, 2024

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems

Python 446 129 Updated Jun 17, 2024

📚 Essential fundamentals for technical interviews: Leetcode, operating systems, computer networks, and system design.

173,866 50,781 Updated Jul 5, 2024

This repository mainly collects interview questions for NLP algorithm engineers.

2,377 510 Updated Oct 10, 2023

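Knowledge-graph-based character relationship visualization and question-answering system for Dream of the Red Chamber (《红楼梦》).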

HTML 1,103 296 Updated Apr 23, 2019

Champion solution for Track 3 of the Global AI Technology Innovation Competition.

Python 235 59 Updated Jul 12, 2021