New Evaluation Run

Upload Dataset

Upload a JSON or CSV file containing questions and ground-truth answers, or choose a dataset from the benchmark library.

Drop your file here or click to browse

Supported file types: JSON, CSV · Benchmarks: TruthfulQA, HaluEval, SimpleQA, FELM, and custom formats