MedHELM Medical LLM Evaluation Benchmark
Medical LLM benchmark and leaderboard intended to broaden coverage beyond single medical QA datasets.
Tags: trustworthiness, safety, fairness, and privacy. Scope: data resources.
Benchmark for evaluating health AI model safety, helpfulness, and clinical-relevance judgments with physician-reviewed rubrics.