Paper · ICLR 2026 Poster · 2026 · medical LLM agent

Can Large Language Models Match the Conclusions of Systematic Reviews?

Accepted as a poster at ICLR 2026. Systematic reviews (SRs), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone of evidence-based clinical decision-making, research, and policy. Given the exponential growth of the scientific literature, there is growing interest in using large language models (LLMs) to automate SR generation. However, it remains poorly understood whether LLMs can critically assess evidence and reason across multiple documents to provide recommendations with the same proficiency as domain experts. We therefore ask: **Can LLMs match the conclusions of systematic reviews written by clinical experts when given access to the same studies?** To explore this question, we present MedEvidence, a benchmark that pairs findings from 100 medical SRs with the studies they are based on.

Tags: Medical Image Computing, Clinical Language Intelligence, Paper, Benchmarks, Multi-document Reasoning, Medical AI, ICLR 2026, ICLR 2026 Poster, medical_llm_agent, clinical_translation

Paper Details

English title: Can Large Language Models Match the Conclusions of Systematic Reviews?
Authors: Christopher Polzak, Alejandro Lozano, Min Woo Sun, James Burgess, Yuhui Zhang, Kevin Wu, Chia-Chun Chiang, Jeffrey J Nirschl, Serena Yeung-Levy
Journal/Conference: ICLR 2026 Poster
Year: 2026
Research area: medical LLM agent