论文ICLR 2026 Poster2026 年trustworthy medical AI MedVR:通过 Agent 强化学习实现无标注医学视觉推理
ICLR 2026 Poster accepted paper at ICLR 2026. Medical Vision-Language Models (VLMs) hold immense promise for complex clinical tasks, but their reasoning capabilities are often constrained by text-only paradigms that fail to ground inferences in visual evidence. This limitation not only curtails performance on tasks requiring fine-grained visual analysis but also introduces risks of visual hallucination in safety-critical applications. Thus, we introduce MedVR, a novel reinforcement learning framework that enables annotation-free visual reasoning for medical VLMs. Its core innovation lies in two synergistic mechanisms: Entropy-guided Visual Regrounding (EVR) uses model uncertainty to direct exploration, while Consensus-based Credit Assignment (CCA) distills pseudo-supervision from rollout agreement.
论文ICLR 2026 Poster2026 年clinical NLP 重新思考放射报告生成:从叙事流到主题引导 findings
ICLR 2026 Poster accepted paper at ICLR 2026. Vision-Language Models (VLMs) for radiology report generation are typically trained to mimic the narrative flow of human experts. However, we identify a potential limitation in this conventional paradigm. We hypothesize that optimizing for narrative coherence encourages models to rely on linguistic priors and inter-sentence correlations, which can weaken their grounding in direct visual evidence and lead to factual inaccuracies. To investigate this, we design a controlled experiment demonstrating that as textual context increases, a model's reliance on the input image systematically decays. We propose LLaVA-TA (Topic-guided and Anatomy-aware), a new fine-tuning framework that directly addresses this challenge by re-engineering the generation process.
论文ICLR 2026 Poster2026 年trustworthy medical AI Photon:用高效多模态大语言模型加速体数据理解
ICLR 2026 Poster accepted paper at ICLR 2026. Multimodal large language models are promising for clinical visual question answering tasks, but scaling to 3D imaging is hindered by high computational costs. Prior methods often rely on 2D slices or fixed-length token compression, disrupting volumetric continuity and obscuring subtle findings. We present Photon, a framework that represents 3D medical volumes with token sequences of variable length. Photon introduces instruction-conditioned token scheduling and surrogate gradient propagation to adaptively reduce tokens during both training and inference, which lowers computational cost while mitigating the attention dilution caused by redundant tokens.
数据资源Text and medical imagesModelMedGemma / MedSigLIP model family开放访问 MedGemma / MedSigLIP 医学 AI 模型
Google Health AI Developer Foundations open model resources for medical text and medical image understanding, including MedGemma 1.5 resources.