Paper | ICLR 2026 Poster | 2026 | clinical NLP

Towards Text-Mask Consistency in Medical Image Segmentation

Accepted as a poster at ICLR 2026. Vision-language models for medical image segmentation often produce masks that conflict with the accompanying text, especially when the description covers multiple sites or multiple lesions. We trace this failure to two factors. (i) Clinical language is highly templated and repetitive, so one-to-one hard contrastive learning yields numerous false negatives (near-identical reports belonging to other samples are treated as negatives), which weakens cross-modal alignment. (ii) Cross-attention is predominantly vision-driven and one-way; it lacks a language-dominant, spatially aware pathway, which hinders the effective injection of textual semantics into the spatial visual domain. To address this, we propose Consistency-enhanced Two-stage Segmentation (C2Seg). In the pretraining stage, Cluster-aware Contrastive Learning uses a frozen strong baseline to construct an intra-batch text similarity matrix that serves as soft labels, alleviating false-negative conflicts and producing more discriminative visual representations.
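The abstract does not give the exact loss, so the following is only a minimal Python sketch of the soft-label idea under stated assumptions. It assumes the frozen baseline's text embeddings are available as plain vectors, that the soft labels are a temperature-scaled row softmax of their cosine-similarity matrix, and that training uses a symmetric, CLIP-style cross-entropy against those soft labels. All function names and the temperatures `tau_label` and `tau` are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(a: List[Vector], b: List[Vector]) -> Matrix:
    """Pairwise cosine similarity: entry [i][j] compares a[i] with b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def row_log_softmax(m: Matrix, tau: float) -> Matrix:
    """Temperature-scaled log-softmax of each row (max-shifted for stability)."""
    out = []
    for row in m:
        scaled = [x / tau for x in row]
        mx = max(scaled)
        log_z = mx + math.log(sum(math.exp(x - mx) for x in scaled))
        out.append([x - log_z for x in scaled])
    return out


def soft_labels_from_text(frozen_text_emb: List[Vector], tau_label: float = 0.1) -> Matrix:
    """Hypothetical soft targets from intra-batch text similarity.

    Row i spreads probability mass over every report in the batch that the
    frozen encoder considers similar to report i, so near-duplicate templated
    reports are no longer forced to act as negatives.
    """
    sim = cosine_matrix(frozen_text_emb, frozen_text_emb)
    return [[math.exp(lp) for lp in row] for row in row_log_softmax(sim, tau_label)]


def soft_contrastive_loss(img_emb: List[Vector], txt_emb: List[Vector],
                          targets: Matrix, tau: float = 0.07) -> float:
    """Symmetric cross-entropy between similarity logits and soft targets.

    Both directions reuse row i of `targets` for sample i, because the targets
    come from text-text similarity. Identity targets recover standard
    one-to-one (hard) InfoNCE.
    """
    i2t = row_log_softmax(cosine_matrix(img_emb, txt_emb), tau)
    t2i = row_log_softmax(cosine_matrix(txt_emb, img_emb), tau)
    n = len(targets)
    ce_i2t = -sum(t * lp for i in range(n) for t, lp in zip(targets[i], i2t[i])) / n
    ce_t2i = -sum(t * lp for i in range(n) for t, lp in zip(targets[i], t2i[i])) / n
    return 0.5 * (ce_i2t + ce_t2i)


if __name__ == "__main__":
    # Toy batch: reports 0 and 1 are identical templates; report 2 differs.
    text = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    image = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.1, 1.0, 0.0]]
    soft = soft_labels_from_text(text)
    hard = [[float(i == j) for j in range(3)] for i in range(3)]
    print("soft targets:", [[round(p, 3) for p in row] for row in soft])
    print("hard-label loss:", soft_contrastive_loss(image, text, hard))
    print("soft-label loss:", soft_contrastive_loss(image, text, soft))
```

In the toy batch, the soft targets split sample 0's mass roughly evenly between reports 0 and 1, whereas hard identity targets would push image 0 away from report 1 even though the two reports are identical. This is the false-negative conflict the abstract describes. A real implementation would operate on batched tensors in a deep-learning framework; the pure-Python form here only makes the arithmetic explicit.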

Tags: Medical image computing, Medical multimodality, Clinical language intelligence, Paper, Medical image segmentation, Vision language models, Multimodal learning, Kolmogorov–Arnold Networks, ICLR 2026, ICLR 2026 Poster

Paper details

Title: Towards Text-Mask Consistency in Medical Image Segmentation
Authors: Jie Gui, Hang Tu, Wen Sha, Xiuquan Du
Venue: ICLR 2026 Poster
Year: 2026
Research area: clinical NLP