generated at 2/12/2025, 7:29:03 PM
lm-evaluation-harness
https://github.com/EleutherAI/lm-evaluation-harness (EleutherAI)
https://github.com/Stability-AI/lm-evaluation-harness (master)
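>A framework for few-shot evaluation of autoregressive language models
> This project provides a unified framework for testing generative language models on a large number of different evaluation tasks.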
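As a rough illustration of what "few-shot evaluation" means here, the sketch below builds a prompt from a handful of solved examples, scores each candidate answer, and reports accuracy. It does not use the harness's own API: `Example`, `build_prompt`, `evaluate`, and `toy_score` are hypothetical names made up for this note, and `toy_score` is only a stand-in for a real model's log-likelihood.

```python
from typing import Callable, List, Tuple

# Hypothetical record type: (question, answer choices, index of the correct choice)
Example = Tuple[str, List[str], int]


def build_prompt(shots: List[Example], question: str) -> str:
    """Concatenate the solved few-shot examples, then the unanswered question."""
    parts = [f"Question: {q}\nAnswer: {choices[gold]}" for q, choices, gold in shots]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def evaluate(
    score: Callable[[str, str], float],
    shots: List[Example],
    test_set: List[Example],
) -> float:
    """Pick the highest-scoring choice for each test question and return accuracy."""
    correct = 0
    for question, choices, gold in test_set:
        prompt = build_prompt(shots, question)
        scores = [score(prompt, " " + choice) for choice in choices]
        predicted = scores.index(max(scores))
        correct += int(predicted == gold)
    return correct / len(test_set)


def toy_score(prompt: str, continuation: str) -> float:
    """Assumption: stand-in for a model log-likelihood; simply prefers shorter text."""
    return -float(len(continuation))


shots = [("2 + 2 = ?", ["4", "five"], 0)]
test_set = [
    ("1 + 1 = ?", ["2", "eleven"], 0),
    ("3 + 3 = ?", ["thirty", "6"], 1),
]
accuracy = evaluate(toy_score, shots, test_set)
print(accuracy)
```

In a real run, `toy_score` would be replaced by a function that asks the language model for the log-probability of the continuation given the prompt; the surrounding loop stays the same across tasks, which is the point of a unified framework.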

https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable?s=09 (Japanese version)
JCommonsenseQA
JNLI
MARC-ja
JSQuAD
japanese-gpt-neox-3.6b-instruction-sft ranks 1st

LLM benchmarks