[ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score ru…

Python 279 18 Updated Nov 11, 2023

AI21Labs / factor

Code and data for the FACTOR paper

Python 36 2 Updated Nov 15, 2023

SupritYoung / Zhongjing

A Chinese medical ChatGPT based on LLaMa, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset.

Python 270 25 Updated Dec 12, 2023

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 3,246 343 Updated Jul 11, 2024

X-PLUG / CValues

Research on evaluating and aligning the values of Chinese large language models

Python 437 18 Updated Jul 20, 2023

atfortes / Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

1,277 68 Updated Jun 30, 2024

FranxYao / chain-of-thought-hub

Benchmarking large language models' complex reasoning ability with chain-of-thought prompting

Jupyter Notebook 2,445 120 Updated Apr 22, 2024

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 9,585 741 Updated May 19, 2024

MLGroupJLU / LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

1,324 85 Updated Jun 3, 2024

declare-lab / instruct-eval

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

Python 491 38 Updated Mar 10, 2024

425776024 / nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

Python 1,718 167 Updated Apr 15, 2024

Kent0n-Li / ChatDoctor

Python 3,412 400 Updated May 17, 2024

wgwang / awesome-LLMs-In-China

Large language models in China

4,849 413 Updated Jun 7, 2024

langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

Python 88,944 14,005 Updated Jul 11, 2024

AI-secure / DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Python 229 51 Updated Jun 19, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

10,577 704 Updated Jul 11, 2024

gpt-engineer-org / gpt-engineer

Specify what you want it to build, the AI asks for clarification, and then builds it.

Python 51,376 6,683 Updated Jul 7, 2024

openai / openai-cookbook

Examples and guides for using the OpenAI API

MDX 57,582 9,082 Updated Jul 10, 2024

microsoft / CodeXGLUE

CodeXGLUE

C# 1,477 358 Updated Apr 23, 2024

THUDM / CodeGeeX

CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

Python 7,973 573 Updated Jul 10, 2024

FlagAI-Open / FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.

Python 3,808 415 Updated Apr 28, 2024

WeOpenML / PandaLM

Python 865 67 Updated May 22, 2024

dongrixinyu / JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com

Python 3,119 378 Updated Jul 5, 2024

lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 35,679 4,386 Updated Jul 11, 2024

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,593 235 Updated Jun 4, 2024

fiatrete / OpenDAN-Personal-AI-OS

OpenDAN is an open source Personal AI OS, which consolidates various AI modules in one place for your personal use.

Python 1,581 128 Updated May 14, 2024

TransformerOptimus / SuperAGI

<⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.

Python 14,953 1,784 Updated Jun 20, 2024

xuejoy

Stars

microsoft / promptbench

ruanyf / weekly

freshllms / freshqa

prometheus-eval / prometheus