Data Resource Details
- Data modality
- Text
- Resource category
- LLM evaluation benchmark
- Data scale
- Health AI evaluation benchmark
- License
- See OpenAI HealthBench release
- Access
- Open access
- Applicable tasks
- Medical question answering, safety evaluation, clinical reasoning
- Source
- OpenAI / arXiv
A benchmark for evaluating health AI models on safety, helpfulness, and clinical-relevance judgments, using physician-reviewed rubrics.