AI4Meder Site Search

Search medical AI papers and resources

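Search community content by papers, data resources, technical competitions, submission deadlines, and course resources, and jump straight to the matching detail page.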

3 results

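Enter a keyword or click a tag to narrow results by papers, data resources, competition deadlines, calls for papers, and courses. Tag: LLM evaluation. Scope: Data resources.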

Data resource | medical exam question-answer text | medical exam QA benchmark | USMLE, Mainland China, and Taiwan exam-style QA splits; see repository | Open access

MedQA: a medical exam question-answering dataset with US, Mainland China, and Taiwan splits

MedQA is a medical examination question answering benchmark with English and Chinese medical licensing-style question sets, including mainland China and Taiwan variants. It is widely used for medical QA and medical reasoning evaluation.

Clinical language intelligence dataset | medical QA | Chinese exam QA | USMLE | LLM evaluation | View data resource

Data resource | Chinese medical exam and QA text | Chinese medical LLM evaluation benchmark | Multiple Chinese medical exam and benchmark splits; see Hugging Face card | Open access

CMB: Chinese Medical Benchmark

CMB is a comprehensive Chinese medical benchmark for evaluating medical large language models on medical exams, reasoning, and clinical knowledge questions. It is suited for Chinese medical QA, LLM evaluation, and instruction-following assessment.

Clinical language intelligence dataset | Chinese medical benchmark | medical LLM | exam QA | medical_llm_agent | View data resource

Data resource | Text | LLM evaluation benchmark | Health AI evaluation benchmark | Open access

HealthBench: health AI evaluation benchmark

Benchmark for evaluating health AI model safety, helpfulness, and clinical-relevance judgments with physician-reviewed rubrics.

benchmark | health AI | safety | LLM evaluation | View data resource