Paper · ICLR 2026 Poster · 2026 · medical LLM agent

Can Large Language Models Match the Conclusions of Systematic Reviews?

Accepted as a poster at ICLR 2026. Systematic reviews (SRs), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone of evidence-based clinical decision-making, research, and policy. Given the exponential growth of the scientific literature, there is increasing interest in using large language models (LLMs) to automate SR generation. However, whether LLMs can critically assess evidence and reason across multiple documents to reach recommendations with the same proficiency as domain experts remains poorly characterized. We therefore ask: **Can LLMs match the conclusions of systematic reviews written by clinical experts when given access to the same studies?** To explore this question, we present MedEvidence, a benchmark pairing findings from 100 medical SRs with the studies they are based on.
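To make the setup concrete, here is a minimal sketch of how a benchmark of this shape (an expert SR conclusion paired with its source studies) could be represented and scored. Everything here is an assumption for illustration, not the paper's actual format or metric: the `BenchmarkItem` fields, the four-way conclusion label set, and the use of exact-match accuracy are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical conclusion labels; MedEvidence's real answer format may differ.
LABELS = {"higher", "lower", "no difference", "insufficient evidence"}


@dataclass
class BenchmarkItem:
    """Hypothetical entry: an SR question, the experts' conclusion,
    and the texts of the individual studies the review was based on."""
    question: str
    expert_conclusion: str
    source_studies: list[str]

    def __post_init__(self) -> None:
        # Reject expert conclusions outside the assumed label set.
        if self.expert_conclusion not in LABELS:
            raise ValueError(f"unknown label: {self.expert_conclusion!r}")


def accuracy(items: list[BenchmarkItem], predictions: list[str]) -> float:
    """Fraction of items whose predicted conclusion exactly matches the
    expert conclusion (case- and whitespace-insensitive)."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    hits = sum(
        pred.strip().lower() == item.expert_conclusion
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)
```

For example, with two items whose expert conclusions are `"lower"` and `"no difference"`, the predictions `["Lower", "higher"]` score 0.5. Exact match over a closed label set is only the simplest possible scoring choice; a real evaluation might also parse free-text model answers into labels first.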

Medical Image Computing · Clinical Language Intelligence · Paper · Benchmarks · Multi-document Reasoning · Medical AI