Paper · ICLR 2026 Poster · 2026 · clinical prediction

How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images

Accepted as a poster at ICLR 2026. Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks remains suboptimal, particularly in zero-shot settings where generalization is critical. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. **In this work**, we present a pioneering systematic investigation into the visual grounding capabilities of state-of-the-art medical MLLMs. To disentangle *visual grounding* from *semantic grounding*, we design VGMED, a novel evaluation dataset developed with expert clinical guidance that explicitly assesses the visual grounding capability of medical MLLMs. Project page: https://guimeng-leo-liu.github.io/Medical-MLLMs-Fail/

Tags: Medical image computing · Medical multimodal AI · Clinical language intelligence · Papers · Medical MLLM · Visual Grounding · ICLR 2026 · ICLR 2026 Poster

Paper details

Title: How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Authors: Guimeng Liu, Tianze Yu, Somayeh Ebrahimkhani, Lin Zhi Zheng Shawn, Kok Pin Ng, Ngai-Man Cheung
Venue: ICLR 2026 Poster
Year: 2026
Research area: clinical prediction