Paper Details
- English Title
- How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
- Authors
- Guimeng Liu, Tianze Yu, Somayeh Ebrahimkhani, Lin Zhi Zheng Shawn, Kok Pin Ng, Ngai-Man Cheung
- Journal/Conference
- ICLR 2026 Poster
- Publication Year
- 2026
- Research Area
- clinical prediction
Accepted as a poster at ICLR 2026. Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks, particularly in zero-shot settings where generalization is critical, remains suboptimal. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. **In this work**, we present a pioneering systematic investigation into the visual grounding capabilities of state-of-the-art medical MLLMs. To disentangle *visual grounding* from *semantic grounding*, we design VGMED, a novel evaluation dataset developed with expert clinical guidance that explicitly assesses the visual grounding capability of medical MLLMs.
Code/project link: https://guimeng-leo-liu.github.io/Medical-MLLMs-Fail/
