MedHELM Medical LLM Benchmark (Text, open access). Medical LLM benchmark and leaderboard intended to broaden coverage beyond single medical QA datasets. Tags: medical QA, clinical reasoning, benchmarking.
HealthBench Health AI Benchmark (Text, open access). Benchmark for evaluating health AI model safety, helpfulness, and clinical-relevance judgments using physician-reviewed rubrics. Tags: medical QA, safety evaluation, clinical reasoning.