AI4Meder
Paper | ICLR 2026 Poster | 2026 | clinical prediction

How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images

Accepted as a poster at ICLR 2026. Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks remains suboptimal, particularly in zero-shot settings where generalization is critical. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. **In this work**, we present a systematic investigation into the visual grounding capabilities of state-of-the-art medical MLLMs. To disentangle *visual grounding* from *semantic grounding*, we design VGMED, a novel evaluation dataset developed with expert clinical guidance that explicitly assesses the visual grounding capability of medical MLLMs. Code/project link: https://guimeng-leo-liu.github.io/Medical-MLLMs-Fail/

Paper details

English title
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Authors
Guimeng Liu, Tianze Yu, Somayeh Ebrahimkhani, Lin Zhi Zheng Shawn, Kok Pin Ng, Ngai-Man Cheung
Journal/Conference
ICLR 2026 Poster
Publication year
2026
Research area
clinical prediction