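AI4Meder Site Search

Search medical AI papers and resources

Search community content by paper, data resource, technical competition, submission deadline, and course resource, and jump straight to the corresponding detail page.

57 results

Enter a keyword or click a tag to narrow results by paper, data resource, competition deadline, call for papers, and course. Tag: Medical multimodal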

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Bridging Explainability and Embeddings: Letting BEE Identify Spurious Correlations

ICLR 2026 Poster accepted paper at ICLR 2026. Current methods for detecting spurious correlations rely on data splits or error patterns, leaving many harmful shortcuts invisible when counterexamples are absent. We introduce BEE (Bridging Explainability and Embeddings), a framework that shifts the focus from model predictions to the weight space and embedding geometry underlying decisions. By analyzing how fine-tuning perturbs pretrained representations, BEE uncovers spurious correlations that remain hidden from conventional evaluation pipelines. We use linear probing as a transparent diagnostic lens, revealing spurious features that not only persist after full fine-tuning but also transfer across diverse state-of-the-art models. Code/project link: https://github.com/bit-ml/bee

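Tags: Medical multimodal; Clinical language intelligence; Trustworthiness, safety, fairness, and privacy; Paper; spurious correlation; interpretability

Paper | ICLR 2026 Poster | 2026 | clinical NLP

VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?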

ICLR 2026 Poster accepted paper at ICLR 2026. The ability to distinguish subtle differences between visually similar images is essential for diverse domains such as industrial anomaly detection, medical imaging, and aerial surveillance. While comparative reasoning benchmarks for vision-language models (VLMs) have recently emerged, they primarily focus on images with large, salient differences and fail to capture the nuanced reasoning required for real-world applications. In this work, we introduce VLM-SubtleBench, a benchmark designed to evaluate VLMs on subtle comparative reasoning. Our benchmark covers ten difference types (Attribute, State, Emotion, Temporal, Spatial, Existence, Quantity, Quality, Viewpoint, and Action) and curates paired question–image sets reflecting these fine-grained variations.

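Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Vision-language Models; Multimodal Large Language Models

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP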

ICLR 2026 Poster accepted paper at ICLR 2026. Typographic attacks exploit multi-modal systems by injecting text into images, leading to targeted misclassifications, malicious content generation, and even Vision-Language Model jailbreaks. In this work, we analyze how CLIP vision encoders behave under typographic attacks, locating specialized attention heads in the latter half of the model's layers that causally extract and transmit typographic information to the CLS token. Building on these insights, we introduce Dyslexify, a method to defend CLIP models against typographic attacks by selectively ablating a typographic circuit consisting of attention heads. Without requiring finetuning, Dyslexify improves performance by up to 22.06% on a typographic variant of ImageNet-100 while reducing standard ImageNet-100 accuracy by less than 1%. We also demonstrate its utility in a medical foundation model for skin lesion diagnosis.

Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Multimodality; Circuit analysis

Paper | ICLR 2026 Poster | 2026 | clinical NLP

LaVCa: LLM-Assisted Visual Cortex Captioning

ICLR 2026 Poster accepted paper at ICLR 2026. Understanding the properties of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing capabilities and contribute to developing brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. As a solution, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that leverages large language models (LLMs) to generate natural-language captions for images to which voxels are selective.

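Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Neuroscience; Computer vision

Paper | ICLR 2026 Oral | 2026 | medical multimodal

A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration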

ICLR 2026 Oral accepted paper at ICLR 2026. In this work, we propose FFDP, a set of IO-aware non-GEMM fused kernels supplemented with a distributed framework for image registration at unprecedented scales. Image registration is an inverse problem fundamental to biomedical and life sciences, but algorithms have not scaled in tandem with image acquisition capabilities. Our framework complements existing model parallelism techniques proposed for large-scale transformer training by optimizing non-GEMM bottlenecks and enabling convolution-aware tensor sharding. We demonstrate unprecedented capabilities by performing multimodal registration of a 100 μm ex-vivo human brain MRI volume at native resolution (an inverse problem more than 570× larger than a standard clinical datum) in about a minute using only 8 A6000 GPUs.

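Tags: Medical image computing; Medical multimodal; Paper; image registration; distributed optimization; CUDA kernels

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Fusing Pixels and Genes: Spatially Aware Learning in Computational Pathology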

ICLR 2026 Poster accepted paper at ICLR 2026. Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Code/project link: https://github.com/Hanminghao/STAMP

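Tags: Medical image computing; Medical multimodal; Trustworthiness, safety, fairness, and privacy; Paper; Computational pathology; Multimodal Learning

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multimodal Brain Tumor Segmentation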

ICLR 2026 Poster accepted paper at ICLR 2026. Brain tumor segmentation in multi-modal MRIs poses significant challenges when one or more modalities are missing. Recent approaches commonly employ parallel fusion strategies; however, these methods often risk losing crucial shared information across modalities, which can degrade segmentation performance. In this paper, we advocate leveraging sequential information bottleneck fusion to effectively preserve shared information across modalities. From an information-theoretic perspective, sequential fusion not only produces more robust fused representations in missing-data scenarios but also achieves a tighter generalization upper bound compared to parallel fusion approaches.

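Tags: Medical image computing; Medical multimodal; Trustworthiness, safety, fairness, and privacy; Paper; Brain Tumor Segmentation; Missing Modality

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

A Johnson-Lindenstrauss Lemma-Guided Efficient 3D Medical Segmentation Network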

ICLR 2026 Poster accepted paper at ICLR 2026. Lightweight 3D medical image segmentation remains constrained by a fundamental "efficiency / robustness conflict", particularly when processing complex anatomical structures and heterogeneous modalities. In this paper, we study how to redesign the framework based on the characteristics of high-dimensional 3D images, and explore data synergy to overcome the fragile representation of lightweight methods. Our approach, VeloxSeg, begins with a deployable and extensible dual-stream CNN-Transformer architecture composed of Paired Window Attention (PWA) and Johnson-Lindenstrauss lemma-guided convolution (JLC). For each 3D image, we invoke a "glance-and-focus" principle, where PWA rapidly retrieves multi-scale information, and JLC ensures robust local feature extraction with minimal parameters, significantly enhancing the model's ability to operate with low computational budget. Code/project link: https://github.com/JinPLu/VeloxSeg

Tags: Medical image computing; Medical multimodal; EHR and clinical prediction; Paper; Efficient; Medical segmentation; multimodal learning
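
As background for the entry above: the Johnson-Lindenstrauss lemma says that a random linear map into a much lower dimension approximately preserves pairwise distances among a finite set of points. The abstract does not describe how JLC applies the lemma, so the sketch below illustrates only the lemma itself with a Gaussian random projection. It is not VeloxSeg's convolution, and every name and dimension in it is a hypothetical choice made for illustration.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d Gaussian matrix scaled so squared norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distortion_ratios(points, matrix):
    """Ratio of projected to original distance for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = distance(points[i], points[j])
            ratios.append(distance(projected[i], projected[j]) / original)
    return ratios


# Hypothetical sizes: 5 points in 512 dimensions, projected down to 256.
data_rng = random.Random(1)
d, k = 512, 256
points = [[data_rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(5)]
matrix = random_projection_matrix(d, k)
ratios = distortion_ratios(points, matrix)
print(min(ratios), max(ratios))  # both values should lie close to 1.0
```

Halving the dimension here leaves every pairwise distance within a modest factor of its original value, which is the property that makes such projections attractive when parameters or compute are scarce.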

Paper | ICLR 2026 Poster | 2026 | clinical prediction

How Do Medical MLLMs Fail? A Study of Visual Grounding in Medical Images

ICLR 2026 Poster accepted paper at ICLR 2026. Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks, particularly in zero-shot settings where generalization is critical, remains suboptimal. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. In this work, we present a pioneering systematic investigation into the visual grounding capabilities of state-of-the-art medical MLLMs. To disentangle visual grounding from semantic grounding, we design VGMED, a novel evaluation dataset developed with expert clinical guidance, explicitly assessing the visual grounding capability of medical MLLMs. Code/project link: https://guimeng-leo-liu.github.io/Medical-MLLMs-Fail/

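Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Medical MLLM; Visual Grounding

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Physiological Signals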

ICLR 2026 Poster accepted paper at ICLR 2026. Tasks ranging from sleep staging to clinical diagnosis traditionally rely on standard polysomnography (PSG) devices, bedside monitors and wearable devices, which capture diverse nocturnal biosignals (e.g., EEG, EOG, ECG, SpO₂). However, heterogeneity across devices and frequent sensor dropout pose significant challenges for unified modelling of these multimodal signals. We present sleep2vec, a foundation model for diverse and incomplete nocturnal biosignals that learns a shared representation via cross-modal alignment. sleep2vec is contrastively pre-trained on 42,249 overnight recordings spanning nine modalities using a Demography, Age, Site & History-aware InfoNCE objective that incorporates physiological and acquisition metadata (e.g., age, gender, recording site) to dynamically weight negatives and mitigate cohort-specific shortcuts.

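Tags: Medical image computing; Medical multimodal; EHR and clinical prediction; Paper; Contrastive Learning; Physiological Signal

Paper | ICLR 2026 Poster | 2026 | clinical NLP

Towards Text-Mask Consistency in Medical Image Segmentation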

ICLR 2026 Poster accepted paper at ICLR 2026. Vision-language models for medical image segmentation often produce masks that conflict with the accompanying text, especially under multi-site/multi-lesion descriptions. We trace this failure to two factors: (i) highly templated and repetitive clinical language causes one-to-one hard contrastive learning to yield numerous false negatives, weakening cross-modal alignment; and (ii) predominantly vision-driven, one-way cross-attention lacks a language-dominant, spatially aware pathway, hindering effective injection of textual semantics into the spatial visual domain. To this end, we propose Consistency-enhanced Two-stage Segmentation (C2Seg). In the pretraining stage, Cluster-aware Contrastive Learning uses a frozen strong baseline to construct an intra-batch text similarity matrix as soft labels, thereby alleviating false negative conflicts and producing more discriminative visual representations.

Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Medical image segmentation; Vision language models
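
The abstract above attributes false negatives to one-to-one hard contrastive labels on templated clinical text, and describes replacing them with soft labels built from an intra-batch text similarity matrix. The sketch below shows one generic way such soft targets could be formed and used in a cross-entropy loss. It is an illustration under assumptions, not the C2Seg implementation: the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp((v - peak) / temperature) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def soft_targets(frozen_text_emb, temperature=0.1):
    """Row-wise softmax over intra-batch text-to-text cosine similarity."""
    texts = [normalize(v) for v in frozen_text_emb]
    return [softmax([dot(a, b) for b in texts], temperature) for a in texts]


def soft_contrastive_loss(image_emb, text_emb, targets, temperature=0.1):
    """Mean cross-entropy between soft targets and the image-to-text softmax."""
    images = [normalize(v) for v in image_emb]
    texts = [normalize(v) for v in text_emb]
    loss = 0.0
    for i, image in enumerate(images):
        probs = softmax([dot(image, t) for t in texts], temperature)
        loss -= sum(q * math.log(p) for q, p in zip(targets[i], probs))
    return loss / len(images)


# Toy batch (hypothetical): reports 0 and 1 share the same templated text.
frozen_text = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
image_emb = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
targets = soft_targets(frozen_text)
loss = soft_contrastive_loss(image_emb, frozen_text, targets)
print(targets[0], loss)
```

In the toy batch, the two identical reports split the target mass evenly, so an image matched to either one is no longer pushed away from the other as it would be under one-hot labels.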

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Joint Adaptation of Uni-Modal Foundation Models for Multimodal Alzheimer's Disease Diagnosis

ICLR 2026 Poster accepted paper at ICLR 2026. Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder and a leading cause of dementia worldwide. Accurate diagnosis requires integrating diverse patient data modalities. With the rapid advancement of foundation models in neurobiology and medicine, integrating foundation models from various modalities has emerged as a promising yet underexplored direction for multi-modal AD diagnosis. A central challenge is enabling effective interaction among these models without disrupting the robust, modality-specific representations learned from large-scale pretraining. To address this, we propose a novel multi-modal framework for AD diagnosis that enables joint interaction among uni-modal foundation models through modality-anchored interaction.

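Tags: Medical image computing; Medical multimodal; Trustworthiness, safety, fairness, and privacy; Paper; Artificial Intelligence for sciences; Alzheimer's disease

Paper | ICLR 2026 Poster | 2026 | clinical NLP

Enhancing Medical Visual Understanding Through Multi-Granular Language Learning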

ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. Code/project link: https://github.com/HUANGLIZI/MGLL

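Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Multi-Granular Language Learning; Medical Image Analysis

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Structured Prognostic Event Modeling for Multimodal Cancer Survival Analysis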

ICLR 2026 Poster accepted paper at ICLR 2026. The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations, are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover.

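Tags: Medical image computing; Medical multimodal; EHR and clinical prediction; Paper; Computational Pathology; Multimodal Learning

Paper | ICLR 2026 Poster | 2026 | clinical prediction

CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation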

ICLR 2026 Poster accepted paper at ICLR 2026. Interpreting clinical electroencephalography (EEG) is a laborious, subjective process, and existing computational models are limited to narrow classification tasks rather than holistic interpretation. A key bottleneck for applying powerful Large Vision-Language Models (LVLMs) to this domain is the scarcity of datasets pairing EEG visualizations with fine-grained, expert-level annotations. We address this by introducing CerebraGloss, an instruction-tuned LVLM for nuanced EEG interpretation. We first introduce a novel, automated data generation pipeline, featuring a bespoke YOLO-based waveform detector, to programmatically create a large-scale corpus of EEG-text instruction data. Code/project link: https://github.com/iewug/CerebraGloss

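Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; large vision-language model; instruction-tuning

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Resp-Agent: An Agent System for Multimodal Respiratory Sound Generation and Disease Diagnosis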

ICLR 2026 Poster accepted paper at ICLR 2026. Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present Resp-Agent, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA). Unlike static pipelines, Thinker-A²CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a modality-weaving Diagnoser that weaves clinical text with audio tokens via strategic global attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. Code/project link: https://github.com/zpforlove/Resp-Agent

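Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Respiratory sounds; Multimodal learning

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning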

ICLR 2026 Poster accepted paper at ICLR 2026. Medical Vision-Language Models (VLMs) hold immense promise for complex clinical tasks, but their reasoning capabilities are often constrained by text-only paradigms that fail to ground inferences in visual evidence. This limitation not only curtails performance on tasks requiring fine-grained visual analysis but also introduces risks of visual hallucination in safety-critical applications. Thus, we introduce MedVR, a novel reinforcement learning framework that enables annotation-free visual reasoning for medical VLMs. Its core innovation lies in two synergistic mechanisms: Entropy-guided Visual Regrounding (EVR) uses model uncertainty to direct exploration, while Consensus-based Credit Assignment (CCA) distills pseudo-supervision from rollout agreement.

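Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Think with images; Medical visual reasoning

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

LiveClin: A Leakage-Free Live Clinical Benchmark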

ICLR 2026 Poster accepted paper at ICLR 2026. The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed to approximate real-world clinical practice. Built from contemporary, peer-reviewed case reports and updated biannually, LiveClin ensures clinical currency and resists data contamination. Using a verified AI–human workflow involving 239 physicians, we transform authentic patient cases into complex, multimodal evaluation scenarios that span the entire clinical pathway. Code/project link: https://github.com/AQ-MedAI/LiveClin

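Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; MultiModal Medical Benchmark; ICLR 2026

Paper | ICLR 2026 Poster | 2026 | clinical NLP

A Structured, Tagged, and Localized VQA Dataset for Chest X-Ray Images with Full-Sentence Answers and Scene Graphs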

ICLR 2026 Poster accepted paper at ICLR 2026. Visual Question Answering (VQA) enables targeted and context-dependent analysis of medical images, such as chest X-rays (CXRs). However, existing VQA datasets for CXRs are typically constrained by simplistic and brief answer formats, lacking localization annotations (e.g., bounding boxes) and structured tags (e.g., region or radiological finding/disease tags). To address these limitations, we introduce MIMIC-Ext-CXR-QBA (abbr. CXR-QBA), a large-scale CXR VQA dataset derived from MIMIC-CXR, comprising 42 million QA-pairs with multi-granular, multi-part answers, detailed bounding boxes, and structured tags. Code/project link: https://github.com/philip-mueller/mimic-ext-cxr-qba/

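Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; VQA; Localization

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs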

ICLR 2026 Poster accepted paper at ICLR 2026. Large Multimodal Models (LMMs) are increasingly capable of answering medical questions that require joint reasoning over images and text, yet training general medical VQA systems is impeded by the lack of large, openly usable, high-quality corpora. We present MedVLSynther, a rubric-guided generator-verifier framework that synthesizes high-quality multiple-choice VQA items directly from open biomedical literature by conditioning on figures, captions, and in-text references. The generator produces self-contained stems and parallel, mutually exclusive options under a machine-checkable JSON schema; a multi-stage verifier enforces essential gates (self-containment, single correct answer, clinical validity, image-text consistency), awards fine-grained positive points, and penalizes common failure modes before acceptance. Applying this pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over 14,803 images spanning 13 imaging modalities and 28 anatomical regions.

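Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Medical VQA; Large Multimodal Models

Paper | ICLR 2026 Poster | 2026 | clinical NLP

Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings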

ICLR 2026 Poster accepted paper at ICLR 2026. Vision-Language Models (VLMs) for radiology report generation are typically trained to mimic the narrative flow of human experts. However, we identify a potential limitation in this conventional paradigm. We hypothesize that optimizing for narrative coherence encourages models to rely on linguistic priors and inter-sentence correlations, which can weaken their grounding in direct visual evidence and lead to factual inaccuracies. To investigate this, we design a controlled experiment demonstrating that as textual context increases, a model's reliance on the input image systematically decays. We propose LLaVA-TA (Topic-guided and Anatomy-aware), a new fine-tuning framework that directly addresses this challenge by re-engineering the generation process.

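Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Radiology report generation; large-language models

Paper | ICLR 2026 Poster | 2026 | clinical prediction

M3CoTBench: A Benchmark for MLLM Chain-of-Thought in Medical Image Understanding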

ICLR 2026 Poster accepted paper at ICLR 2026. Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning, and recent advances have extended this paradigm to Multimodal Large Language Models (MLLMs). In the medical domain, where diagnostic decisions depend on nuanced visual cues and sequential reasoning, CoT aligns naturally with clinical thinking processes. However, current benchmarks for medical image understanding generally focus on the final answer while ignoring the reasoning path. An opaque process lacks reliable bases for judgment, making it difficult to assist doctors in diagnosis.

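Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Chain-of-Thought; Multimodal Large Language Models

Paper | ICLR 2026 Poster | 2026 | clinical prediction

Pathology-Genomics Multimodal Structural Representation Learning for Data-Efficient Precision Oncology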

ICLR 2026 Poster accepted paper at ICLR 2026. Fusing histopathology images and genomics data with deep learning has significantly advanced precision oncology. However, genomics data is often missing in real-world clinical scenarios due to its high acquisition cost and complexity. Existing solutions aim to reconstruct genomics data from histopathology images. Nevertheless, these methods typically rely only on individual cases and overlook the potential relationships among cases. Additionally, they fail to take advantage of the authentic genomics data of diagnostically related cases that are accessible from training for inference. In this work, we propose a novel Multi-modal Structural Representation Learning (MSRL) framework for data-efficient precision oncology. Code/project link: https://github.com/WkEEn/MSRL

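Tags: Medical image computing; Medical multimodal; EHR and clinical prediction; Paper; multi-modal learning; histopathology image; representation learning

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

CARE: An Evidence-Grounded Agentic Framework for Clinical Accountability in Multimodal Medical Reasoning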

ICLR 2026 Poster accepted paper at ICLR 2026. Large visual language models (VLMs) have shown strong multi-modal medical reasoning ability, but most operate as end-to-end black boxes, diverging from clinicians’ evidence-based, staged workflows and hindering clinical accountability. Complementarily, expert visual grounding models can accurately localize regions of interest (ROIs), providing explicit, reliable evidence that improves both reasoning accuracy and trust. In this paper, we introduce CARE, advancing Clinical Accountability in multi-modal medical Reasoning with an Evidence-grounded agentic framework. Unlike existing approaches that couple grounding and reasoning within a single generalist model, CARE decomposes the task into coordinated sub-modules to reduce shortcut learning and hallucination: a compact VLM proposes relevant medical entities; an expert entity-referring segmentation model produces pixel-level ROI evidence; and a grounded VLM reasons over the full image augmented by ROI hints.

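Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Multi-modal Large Language Agent; Medical Visual Question Answering

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Photon: Accelerating Volumetric Data Understanding with Efficient Multimodal Large Language Models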

ICLR 2026 Poster accepted paper at ICLR 2026. Multimodal large language models are promising for clinical visual question answering tasks, but scaling to 3D imaging is hindered by high computational costs. Prior methods often rely on 2D slices or fixed-length token compression, disrupting volumetric continuity and obscuring subtle findings. We present Photon, a framework that represents 3D medical volumes with token sequences of variable length. Photon introduces instruction-conditioned token scheduling and surrogate gradient propagation to adaptively reduce tokens during both training and inference, which lowers computational cost while mitigating the attention dilution caused by redundant tokens.

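Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; 3D Medical Image Analysis; Medical VQA

Paper | ICLR 2026 Poster | 2026 | clinical prediction

FETAL-GAUGE: A Benchmark for Evaluating Vision-Language Models on Fetal Ultrasound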

ICLR 2026 Poster accepted paper at ICLR 2026. The growing demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers, creating barriers to essential fetal health monitoring. Deep learning has the potential to enhance sonographers' efficiency and support the training of new practitioners. Vision-Language Models (VLMs) are particularly promising for ultrasound interpretation, as they can jointly process images and text to perform multiple clinical tasks within a single framework. However, despite the expansion of VLMs, no standardized benchmark exists to evaluate their performance in fetal ultrasound imaging. Code/project link: https://github.com/BioMedIA-MBZUAI/FETAL-GAUGE

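Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Vision-Language Models; Fetal Ultrasound

Paper | ICLR 2026 Poster | 2026 | clinical NLP

Medical Thinking with Multiple Images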

ICLR 2026 Poster accepted paper at ICLR 2026. Large language models perform well on many medical QA benchmarks, but real clinical reasoning is harder because diagnosis often requires integrating evidence across multiple images rather than interpreting a single view. We introduce MedThinkVQA, an expert-annotated benchmark for thinking with multiple images, in which models must interpret each image, combine cross-view evidence, and solve diagnostic questions under intermediate supervision and step-level evaluation. The dataset contains 10,067 cases, including 720 test cases, with an average of 6.68 images per case, substantially denser than prior work (earlier maxima ≤ 1.43). On the test set, the best closed-source models, Claude-4.6-opus, Gemini-3-pro, and GPT-5.2-xhigh, achieve only 54.9% to 57.2% accuracy, while smaller proprietary variants, GPT-5-mini/nano, drop to 39.7% and 30.8%.

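Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Multimodal diagnostic reasoning; Vision language models (VLMs)

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

AttTok: Combining Attribute Tokens with Generative Pre-Trained Vision-Language Models for Medical Image Understanding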

ICLR 2026 Poster accepted paper at ICLR 2026. Recent generative pre-trained vision–language (GPTv) models have achieved remarkable success in multi-modal understanding, inspiring their adaptation to medical imaging tasks such as disease diagnosis and visual question answering (VQA). However, current instruction-tuned GPTv models suffer from two key challenges: (1) medical attributes (e.g., disease names, severity grades) are encoded as plain text tokens, collapsing semantically distinct concepts into nearly identical textual sequences; and (2) inadequate textual supervision weakens visual representation learning, leading to severe inter-attribute confusion and misaligned vision–language embeddings. To address these limitations, we introduce attribute tokens (AttTok), a set of pre‑defined special tokens that uniquely encode clinical attributes (e.g., imaging modality, diagnosis, severity) within a structured token space. Complemented by attribute‑centric embedding books, AttTok serves as anchor points for aligning both visual and textual modalities into a shared, discriminative representation space.

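Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Medical generative pre-trained models; medical Multi-Modal alignment

Paper | ICLR 2026 Poster | 2026 | clinical prediction

Can LLMs Generate Transferable Representations for Clinical Time-Series Data?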

ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology. Neural-MedBench integrates multi-sequence MRI scans, structured electronic health records, and clinical notes, and encompasses three core task families: differential diagnosis, lesion recognition, and rationale generation. Code/project link: https://neuromedbench.github.io/

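Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; vision-language models; benchmark dataset

Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

MedAgent-Pro: Towards Evidence-Based Multimodal Medical Diagnosis via a Reasoning Agentic Workflow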

ICLR 2026 Poster accepted paper at ICLR 2026. Modern clinical diagnosis relies on the comprehensive analysis of multi-modal patient data, drawing on medical expertise to ensure systematic and rigorous reasoning. Recent advances in Vision–Language Models (VLMs) and agent-based methods are reshaping medical diagnosis by effectively integrating multi-modal information. However, they often output direct answers and empirically driven conclusions without clinical evidence supported by quantitative analysis, which compromises their reliability and hinders clinical usability. Here we propose MedAgent-Pro, an agentic reasoning paradigm that mirrors modern diagnosis principles via a hierarchical diagnostic workflow, consisting of disease-level standardized plan generation and patient-level personalized step-by-step reasoning.

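Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; Medical AI; Agentic AI

Paper | ICLR 2026 Poster | 2026 | clinical prediction

Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation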

ICLR 2026 Poster accepted paper at ICLR 2026. Pretrained medical foundation models (FMs) have shown strong generalization across diverse imaging tasks, such as disease classification in radiology and tumor grading in histopathology. While recent advances in parameter-efficient finetuning have enabled effective adaptation of FMs to downstream tasks, these approaches are typically designed for a single modality. In contrast, many clinical workflows rely on joint diagnosis from heterogeneous domains, such as radiology and pathology, where fully leveraging the representation capacity of multiple FMs remains an open challenge. To address this gap, we propose Concept Tuning and Fusing (CTF), a parameter-efficient framework that uses clinically grounded concepts as a shared semantic interface to enable cross-modal co-adaptation before fusion. Code/project link: https://github.com/HKU-MedAI/CTF; https://github.com/neuronflow/BraTS-Toolkit

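Tags: Medical image computing; Medical multimodal; EHR and clinical prediction; Paper; multimodal learning; concept-based learning

Paper | ICLR 2026 Poster | 2026 | Medical multimodal AI

AttTok: Combining Attribute Tokens with Generative Pre-Trained Vision-Language Models for Medical Image Understanding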

ICLR 2026 poster introducing AttTok, a medical vision-language method that uses predefined attribute tokens and attribute-centric mechanisms to improve medical image understanding, including classification and visual question answering.

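Tags: Medical image computing; Medical multimodal; Clinical language intelligence; Paper; ICLR 2026; medical generative pre-trained models

Paper | ICLR 2026 Poster | 2026 | Medical multimodal

How Do Medical MLLMs Fail? A Study of Visual Grounding in Medical Images

A systematic study of how medical MLLMs fail at visual grounding in medical images. It proposes the VGMED evaluation dataset and the VGRefine inference-time method, targeting medical visual question answering and medical image interpretation.

Tags: Medical multimodal; Medical AI paper; Conference paper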

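Paper | npj Digital Surgery | 2026 | Surgical and interventional intelligence

Surgical RARP Copilot: A Vision-Language Model for Robot-Assisted Radical Prostatectomy

Surgical RARP Copilot is a surgical vision-language model study targeting robot-assisted radical prostatectomy. The paper combines visual and textual information for intraoperative open-ended question answering, surgical phase recognition, and instrument detection, and reports tests in real-time surgical scenarios. It is a useful reference for surgical scene understanding, intraoperative AI assistance, and the application of medical multimodal models in surgical settings.

Tags: Surgical and interventional intelligence; Medical AI paper; Journal paper

Paper | arXiv | 2023 | Clinical multimodal AI

Towards Generalist Biomedical AI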

Med-PaLM Multimodal evaluates a generalist biomedical AI system across medical question answering, imaging, report generation, and multimodal tasks.

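Tags: Generalist medical AI; Multimodal clinical reasoning

Paper | Nature Medicine | 2024 | Foundation models

A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks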

Nature Medicine article describing a generalist biomedical vision-language foundation model evaluated across multiple biomedical tasks.

Vision-language foundation model · Biomedical AI · View paper details

Data resource · MRI, DXA, ultrasound, retinal imaging, genetics, and health records · population-scale multimodal imaging cohort · Population-scale UK Biobank imaging cohort; application required · Access by application

UK Biobank Imaging Data

UK Biobank Imaging provides large-scale imaging phenotypes linked to genetic, lifestyle, and health outcome data. It is used for population-scale medical imaging AI, disease risk prediction, representation learning, multimodal biomedical modeling, and epidemiological AI studies.

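Medical image computing · EHR and clinical prediction · Medical multimodal · Dataset · population cohort · imaging · View data resource

Data resource · genomics, transcriptomics, clinical metadata, and pathology-related data · cancer genomics and clinical dataset · Large multi-cancer TCGA program dataset · Open access

TCGA Cancer Genomics Dataset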

The Cancer Genome Atlas is a large cancer genomics resource with molecular, clinical, and pathology-related data across many cancer types. It is a foundation dataset for oncology AI, survival prediction, subtype discovery, multimodal cancer modeling, and translational biomarker research.

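EHR and clinical prediction · Medical multimodal · Dataset · cancer genomics · Oncology · multi-omics · View data resource

Data resource · MRI, PET, biomarkers, clinical and cognitive assessments · longitudinal neuroimaging and clinical dataset · Longitudinal ADNI cohort data; access through ADNI/LONI · Access by application

ADNI: Alzheimer's Disease Neuroimaging Initiative Dataset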

ADNI provides longitudinal neuroimaging, biomarker, clinical, and cognitive data for Alzheimer disease research. It supports disease progression modeling, dementia diagnosis, multimodal prediction, biomarker discovery, and clinical translation studies.

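Medical image computing · EHR and clinical prediction · Medical multimodal · Dataset · Alzheimer disease · neuroimaging · View data resource

Data resource · chest radiographs with radiology reports · chest X-ray image-report dataset · Large-scale CXR image-report dataset; version 2.1.0 · Access by application

MIMIC-CXR v2.1.0 Chest X-ray Dataset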

MIMIC-CXR is a large deidentified chest radiograph dataset with associated free-text radiology reports. It is widely used for chest X-ray classification, report generation, image-text representation learning, radiology retrieval, and medical multimodal foundation model evaluation.

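Medical image computing · Medical multimodal · Clinical language intelligence · Dataset · CXR · radiology reports · View data resource

Data resource · medical images with bilingual visual questions and answers · medical visual question answering dataset · Bilingual medical VQA dataset; see official project page · Open access

SLAKE: A Semantically Labeled, Knowledge-Enhanced Medical VQA Dataset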

SLAKE is a semantically labeled medical visual question answering dataset with bilingual English-Chinese questions, medical images, and knowledge-enhanced annotations. It is useful for medical multimodal learning, image-grounded QA, and radiology VQA evaluation.

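Medical image computing · Medical multimodal · Dataset · medical VQA · bilingual dataset · medical multimodal · View data resource

Data resource · Chest X-ray · Radiology imaging · PhysioNet v2.1.0 · Restricted access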

MIMIC-CXR-JPG v2.1.0

JPG-formatted chest radiographs with labels derived from free-text reports, hosted by PhysioNet.

Radiology imaging · Chest X-ray · PhysioNet · View data resource

Data resource · Text and medical images · Model · MedGemma / MedSigLIP model family · Open access

MedGemma / MedSigLIP Medical AI Models

Google Health AI Developer Foundations open model resources for medical text and medical image understanding, including MedGemma 1.5 resources.

medical LLM · medical VLM · open model · View data resource

Data resource · Multimodal clinical data · Benchmark · ICML 2025 benchmark · Open access

CLIMB Clinical Foundation Model Benchmark

Multimodal clinical data foundation and benchmark introduced at ICML 2025 for clinical foundation model research.

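benchmark · Multimodal · clinical foundation model · View data resource

Technical competition · Training release scheduled 2026-05-11 17:00 Beijing · Report generation · Pathology images and text · Deadline (Beijing time): 2026-07-20 17:00

REG 2026: Pathologist Reasoning-Guided Report Generation Challenge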

MICCAI 2026 challenge for pathologist reasoning-guided pathology report generation, hosted on Grand Challenge.

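pathology report generation · MICCAI challenge · View competition details

Calls for papers and collaboration · Scientific Reports · Deadline (Beijing time): 2026-06-23 · Journal special issue

Scientific Reports Collection: AI for Clinical Decision-Making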

This Nature Portfolio / Scientific Reports collection is open for submissions until 2026-06-23. It focuses on AI for clinical decision-making, including diagnostic, prognostic, and therapeutic decision support, EHRs, medical imaging, genomics, real-time patient data, clinical notes, multimodal learning, privacy-preserving AI, interpretability, and validation.

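EHR and clinical prediction · Medical multimodal · Clinical language intelligence · Call for papers · Nature Portfolio · Scientific Reports · View call details

Calls for papers and collaboration · npj Genomic Medicine · Deadline (Beijing time): 2026-06-23 · Journal special issue

npj Genomic Medicine Collection: Artificial Intelligence in Genomic Medicine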

This Nature Portfolio / npj Genomic Medicine collection is open for submissions until 2026-06-23. It covers AI-powered genomic medicine, including variant prioritization, pathway inference, AI prediction from clinical assays such as histology, radiology and EHRs, multi-omics, precision oncology, rare diseases, population health, explainability, bias, and clinical implementation.

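Medical multimodal · Trustworthiness, safety, fairness, and privacy · Call for papers · Nature Portfolio · npj Genomic Medicine · genomic medicine · View call details

Calls for papers and collaboration · npj Digital Medicine · Deadline (Beijing time): 2026-05-06 · Journal special issue

npj Digital Medicine Collection: Physics-Informed Machine Learning for Personalized Disease Prediction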

This Nature Portfolio / npj Digital Medicine collection is open for submissions until 2026-05-06. It calls for physics-informed machine learning for personalized disease prediction, prevention, and management, including digital twins, physics-informed generative AI, biomedical time-series, signals, images, interpretability, and clinical decision support.

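EHR and clinical prediction · Medical multimodal · Trustworthiness, safety, fairness, and privacy · Call for papers · Nature Portfolio · npj Digital Medicine · View call details

Calls for papers and collaboration · npj Digital Medicine · Deadline (Beijing time): 2027-04-30 · Journal special issue

npj Digital Medicine Collection: Computational Drug Repurposing in the Era of Multimodal Data and AI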

This Nature Portfolio / npj Digital Medicine collection is open for submissions until 2027-04-30. It invites work at the intersection of computational drug repurposing, multimodal biomedical data, and AI, including omics, EHRs, real-world evidence, imaging, digital phenotyping, LLMs, graph neural networks, multimodal transformers, knowledge graphs, generative AI, causal inference, explainability, and clinical translation.

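Medical multimodal · Call for papers · Nature Portfolio · npj Digital Medicine · drug repurposing · multimodal data · View call details

Calls for papers and collaboration · npj Digital Medicine · Deadline (Beijing time): 2026-07-21 · Journal special issue

npj Digital Medicine Collection: Artificial Intelligence in Sports Medicine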

This Nature Portfolio / npj Digital Medicine collection is open for submissions until 2026-07-21. It invites research on AI in sports medicine, including multimodal injury and medical-condition prediction, individualized diagnosis, treatment and rehabilitation, transparent and diverse datasets, open-source explainable AI, and safe AI systems for athlete and exercise health.

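Medical multimodal · Trustworthiness, safety, fairness, and privacy · Call for papers · Nature Portfolio · npj Digital Medicine · sports medicine · View call details

Calls for papers and collaboration · Technologies · Deadline (Beijing time): 2026-08-30 · Journal special issue

MDPI Technologies Special Issue: AI-Enabled Smart Healthcare Systems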

This Technologies special issue calls for work on AI-enabled smart healthcare systems. It is relevant to medical AI submissions on intelligent monitoring, anomaly detection, assistive technologies, smart sensing, clinical decision support, and AI-assisted healthcare workflows. The page lists a manuscript submission deadline of 2026-08-30.

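EHR and clinical prediction · Medical multimodal · Call for papers · MDPI · Technologies · smart healthcare · assistive technologies · View call details

Calls for papers and collaboration · AI · Deadline (Beijing time): 2026-10-27 · Journal special issue

MDPI AI Special Issue: Adversarial Learning and Its Applications in Healthcare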

This MDPI AI special issue calls for work on adversarial learning and its applications in healthcare, including robustness, privacy attacks and defenses, federated learning, generative AI for healthcare, and medical image analysis. The page lists a manuscript submission deadline of 2026-10-27.

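Trustworthiness, safety, fairness, and privacy · Medical image computing · Medical multimodal · Call for papers · MDPI · AI · adversarial learning · View call details

Calls for papers and collaboration · Frontiers in Artificial Intelligence / Frontiers Research Topic · Deadline (Beijing time): 2026-09-14 · Journal special issue

Frontiers Research Topic: Multi-Omics Integration in Clinical Decision-Making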

This Frontiers Research Topic calls for work on integrating multi-omics data with clinical information to improve diagnosis, prognosis, and personalized treatment. The page lists a manuscript deadline of 2026-09-14 and is currently accepting articles, making it a relevant journal CFP for clinical translation, multimodal medical AI, and precision medicine.

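Medical multimodal · EHR and clinical prediction · Call for papers · Frontiers · multi-omics · clinical decision-making · View call details

Calls for papers and collaboration · ICONIP 2026 · Deadline (Beijing time): 2026-05-10 · Conference call for papers

ICONIP 2026 Call for Papers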

CCF-Deadlines lists ICONIP 2026 with papers due 2026-05-10 UTC-12 and conference dates 2026-11-23 to 2026-11-27 in Melbourne. The neural information processing scope is relevant to medical AI work on deep learning, biomedical signals, medical imaging, and clinical prediction.

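Medical image computing · EHR and clinical prediction · Medical multimodal · Call for papers · CCF C · ICONIP · View call details

Calls for papers and collaboration · IEEE BigData 2026 · Deadline (Beijing time): 2026-08-21 · Conference call for papers

IEEE BigData 2026 Call for Papers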

CCF-Deadlines lists IEEE BigData 2026 with papers due 2026-08-21 AoE and conference dates 2026-12-14 to 2026-12-17 in Phoenix. IEEE BigData is relevant to healthcare big data, clinical data integration, EHR-scale prediction, biomedical multimodal analytics, and privacy-aware health data mining.

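EHR and clinical prediction · Medical multimodal · Trustworthiness, safety, fairness, and privacy · Call for papers · CCF C · IEEE BigData · View call details

Calls for papers and collaboration · APBC 2026 · Deadline (Beijing time): 2026-06-15 · Conference call for papers

APBC 2026 Call for Papers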

CCF-Deadlines lists APBC 2026 with papers due 2026-06-15 UTC+0 and conference dates 2026-10-21 to 2026-10-24 in Hsinchu. APBC is directly relevant to medical AI through bioinformatics, computational biology, multi-omics modeling, precision medicine, and AI-assisted biomedical discovery.

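Medical multimodal · EHR and clinical prediction · Call for papers · CCF C · APBC · bioinformatics · View call details

Course resource · Academic lecture · intermediate

Broad Institute ML4H Clinical AI Seminar Series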

The Broad Institute ML4H Clinical AI Seminar Series features talks from leading experts at the intersection of AI and medicine. Topics include generative and foundation models, ethical and responsible AI, self-supervised learning, medical imaging, digital twins, and real-world clinical applications.

Medical multimodal · Trustworthiness, safety, fairness, and privacy · Lecture · Broad Institute · ML4H · clinical AI seminar · View course resource