Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | LiveClin: A Leakage-Free Live Clinical Benchmark
Accepted as a poster at ICLR 2026. The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed to approximate real-world clinical practice. Built from contemporary, peer-reviewed case reports and updated biannually, LiveClin ensures clinical currency and resists data contamination. Using a verified AI–human workflow involving 239 physicians, we transform authentic patient cases into complex, multimodal evaluation scenarios that span the entire clinical pathway. Code/project link: https://github.com/AQ-MedAI/LiveClin
Data resource | medical exam question-answer text | medical exam QA benchmark | USMLE, Mainland China, and Taiwan exam-style QA splits; see repository | Open access | MedQA: A Medical Exam QA Dataset with US, Mainland China, and Taiwan Splits
MedQA is a medical examination question answering benchmark with English and Chinese medical licensing-style question sets, including mainland China and Taiwan variants. It is widely used for medical QA and medical reasoning evaluation.
Data resource | Chinese medical exam and QA text | Chinese medical LLM evaluation benchmark | Multiple Chinese medical exam and benchmark splits; see Hugging Face card | Open access | CMB: Chinese Medical Benchmark
CMB is a comprehensive Chinese medical benchmark for evaluating medical large language models on medical exams, reasoning, and clinical knowledge questions. It is suited for Chinese medical QA, LLM evaluation, and instruction-following assessment.
Calls and Collaboration | EMNLP 2026 | Deadline (Beijing time): 2026-05-25 | Conference call for papers | EMNLP 2026 Call for Papers
CCF-Deadlines lists EMNLP 2026 with papers due 2026-05-25 UTC-12 and conference dates 2026-10-24 to 2026-10-29 in Budapest. EMNLP is relevant to clinical NLP, biomedical language models, medical text mining, EHR note understanding, and safe medical LLM evaluation.