Issues: open-compass/opencompass
Issues list
[Feature] What's the difference between mbpp and deprecated_mbpp?
#1280
opened Jun 29, 2024 by
noforit
1 task
[Bug] qwen1.5-7B base scores only about 2.6 on the math test set, far below the officially reported result
#1274
opened Jun 25, 2024 by
1moye
2 tasks done
[Feature] Improve the Documentation for Subjective Evaluation
#1269
opened Jun 24, 2024 by
tonysy
1 of 3 tasks
[Bug] When I attempted to perform the agent evaluation, the console returned an error: "AttributeError: 'OpenAI' object has no attribute 'chat'".
#1259
opened Jun 20, 2024 by
CaptainJi
2 tasks done
[Bug] Find scikit-learn version conflict in requirements/runtime.txt and requirements/extra.txt
#1256
opened Jun 19, 2024 by
BIGWangYuDong
[Bug] llama3 8b base model evaluated on the ARC-C PPL dataset reaches an accuracy of only 41, which is abnormal
#1253
opened Jun 18, 2024 by
linboyang
2 tasks done
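[Bug] Folks, this function seems to be written incorrectly: it can only parse code between [BEGIN] and [DONE], but the code a base model outputs first does not start with [BEGIN].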
#1251
opened Jun 17, 2024 by
linboyang
2 tasks done
meta-llama/Meta-Llama-3-8B-Instruct evaluation results are not consistent with Hugging Face's official results
#1243
opened Jun 13, 2024 by
hzgdeerHo
2 tasks done
[Feature] Difficulty in Evaluating Custom Models with OpenCompass
#1239
opened Jun 13, 2024 by
jiangjiadi
1 task
[Bug] Passing trust_remote_code=True will be mandatory to load this dataset from the next major release of datasets.
#1233
opened Jun 9, 2024 by
chairmanQi
2 tasks done
[Bug] When testing on gen datasets, even if the output is empty or incorrect, unexpected scores can be obtained
#1232
opened Jun 7, 2024 by
chairmanQi
2 tasks done
Needle-in-a-haystack dataset initialization error (Failed to get opencompass.datasets.needlebench.origin.NeedleBenchOriginDataset)
#1229
opened Jun 6, 2024 by
macheng6
2 tasks done
[Bug] Running PyTorch Qwen-7B-Chat with ARC-c ppl on CPU gives poor results
#1226
opened Jun 5, 2024 by
FlexLaughing
2 tasks done
[Bug] Which version of the dataset should be selected when evaluating the Llama3 model?
#1223
opened Jun 3, 2024 by
bullw
2 tasks done