Paper | ICLR 2026 Poster | 2026 | clinical NLP | VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
ICLR 2026 Poster accepted paper at ICLR 2026. The ability to distinguish subtle differences between visually similar images is essential for diverse domains such as industrial anomaly detection, medical imaging, and aerial surveillance. While comparative reasoning benchmarks for vision-language models (VLMs) have recently emerged, they primarily focus on images with large, salient differences and fail to capture the nuanced reasoning required for real-world applications. In this work, we introduce **VLM-SubtleBench**, a benchmark designed to evaluate VLMs on *subtle comparative reasoning*. Our benchmark covers ten difference types (Attribute, State, Emotion, Temporal, Spatial, Existence, Quantity, Quality, Viewpoint, and Action) and curates paired question–image sets reflecting these fine-grained variations.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
ICLR 2026 Poster accepted paper at ICLR 2026. Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Code/project link: https://github.com/IMSY-DKFZ/CARL
Paper | ICLR 2026 Poster | 2026 | Medical imaging | A New Paradigm for Whole-Genome DNA Methylation Prediction Without Methylation Input
ICLR 2026 Poster accepted paper at ICLR 2026. DNA methylation (DNAm) is a key epigenetic modification that regulates gene expression and is pivotal in development and disease. However, profiling DNAm at genome scale is challenging: of ~28 million CpG sites in the human genome, only about 1–3% are typically assayed in common datasets due to technological limitations and cost. Recent deep learning approaches, including masking-based generative Transformer models, have shown promise in capturing DNAm–gene expression relationships, but they rely on partially observed DNAm values for unmeasured CpGs and cannot be applied to completely unmeasured samples. To overcome this barrier, we introduce MethylProphet, a gene-guided, context-aware Transformer model for whole-genome DNAm inference without any measured DNAm input.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Reconstruct Anything Model: A Lightweight General Model for Computational Imaging
ICLR 2026 Poster accepted paper at ICLR 2026. Most existing learning-based methods for solving imaging inverse problems can be roughly divided into two classes: iterative algorithms, such as plug-and-play and diffusion methods leveraging pretrained denoisers, and unrolled architectures that are trained end-to-end for specific imaging problems. Iterative methods in the first class are computationally costly and often yield suboptimal reconstruction performance, whereas unrolled architectures are generally problem-specific and require expensive training. In this work, we propose a novel non-iterative, lightweight architecture that incorporates knowledge about the forward operator (acquisition physics and noise parameters) without relying on unrolling. Our model is trained to solve a wide range of inverse problems, such as deblurring, magnetic resonance imaging, computed tomography, inpainting, and super-resolution, and handles arbitrary image sizes and channels, such as grayscale, complex, and color data. Code/project link: https://github.com/matthieutrs/ram
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
ICLR 2026 Poster accepted paper at ICLR 2026. Recovering true signals from noisy measurements is a central challenge in inverse problems spanning medical imaging, geophysics, and signal processing. Current solutions nearly always balance prior assumptions regarding the true signal (regularization) with agreement to noisy measured data (data-fidelity). Conventional data-fidelity loss functions, such as mean-squared error (MSE) or negative log-likelihood, seek pointwise agreement with noisy measurements, often leading to overfitting to noise. In this work, we instead evaluate data-fidelity collectively by testing whether the observed measurements are statistically consistent with the noise distributions implied by the current estimate.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | MnemoDyn: Learning Resting-State Dynamics from 40K fMRI Sequences
ICLR 2026 Poster accepted paper at ICLR 2026. We present a dynamical-systems based model for resting-state functional magnetic resonance imaging (rs-fMRI), trained on a dataset of roughly 40K rs-fMRI sequences covering a wide variety of public and available-by-permission datasets. While most existing proposals use transformer backbones, we utilize multi-resolution temporal modeling of the dynamics across parcellated brain regions. We show that MnemoDyn is compute efficient and generalizes very well across diverse populations and scanning protocols. When benchmarked against current state-of-the-art transformer-based approaches, MnemoDyn consistently delivers superior reconstruction quality.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | A Subject-Agnostic Brain Visual Decoding Architecture Inspired by Cognitive Processes
ICLR 2026 Poster accepted paper at ICLR 2026. Subject-agnostic brain decoding, which aims to reconstruct continuous visual experiences from fMRI without subject-specific training, holds great potential for clinical applications. However, this direction remains underexplored due to challenges in cross-subject generalization and the complex nature of brain signals. In this work, we propose Visual Cortex Flow Architecture (VCFlow), a novel hierarchical decoding framework that explicitly models the ventral-dorsal architecture of the human visual system to learn multi-dimensional representations. By disentangling and leveraging features from early visual cortex, ventral, and dorsal streams, VCFlow captures diverse and complementary cognitive information essential for visual reconstruction.
Paper | ICLR 2026 Poster | 2026 | clinical prediction | Dual Distillation for Few-Shot Anomaly Detection
ICLR 2026 Poster accepted paper at ICLR 2026. Anomaly detection is a critical task in computer vision with profound implications for medical imaging, where identifying pathologies early can directly impact patient outcomes. While recent unsupervised anomaly detection approaches show promise, they require substantial normal training data and struggle to generalize across anatomical contexts. We introduce D²4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. Our approach leverages a pre-trained encoder as a teacher network to extract multi-scale features from both support and query images, while a student decoder learns to distill knowledge from the teacher on query images and self-distill on support images. Code/project link: https://github.com/ttttqz/D24FAD
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Disco: Densely Overlapping Cell Instance Segmentation via Adjacency-Aware Collaborative Coloring
ICLR 2026 Poster accepted paper at ICLR 2026. Accurate cell instance segmentation is foundational for digital pathology analysis. Existing methods based on contour detection and distance mapping still face significant challenges in processing complex and dense cellular regions. Graph coloring-based methods provide a new paradigm for this task, yet the effectiveness of this paradigm in real-world scenarios with dense overlaps and complex topologies has not been verified. Addressing this issue, we release a large-scale dataset GBC-FS 2025, which contains highly complex and dense sub-cellular nuclear arrangements. We conduct the first systematic analysis of the chromatic properties of cell adjacency graphs across four diverse datasets and reveal an important discovery: most real-world cell graphs are non-bipartite, with a high prevalence of odd-length cycles (predominantly triangles).
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Brain Graph Foundation Model: Pre-Training and Prompt Tuning Across Multiple Atlases and Diseases
ICLR 2026 Poster accepted paper at ICLR 2026. As large language models (LLMs) continue to revolutionize AI research, there is a growing interest in building large-scale brain foundation models to advance neuroscience. While most existing brain foundation models are pre-trained on time-series signals or connectome features, we propose a novel graph-based pre-training paradigm for constructing a brain graph foundation model. In this paper, we introduce the Brain Graph Foundation Model, termed BrainGFM, a unified framework that leverages graph contrastive learning and graph masked autoencoders for large-scale fMRI-based pre-training. BrainGFM is pre-trained on a diverse mixture of brain atlases with varying parcellations, significantly expanding the pre-training corpus and enhancing the model’s ability to generalize across heterogeneous fMRI-derived brain representations. Code/project link: https://github.com/weixinxu666/BrainGFM
Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | Dual-Kernel Adapter: Expanding the Spatial Horizon for Data-Constrained Medical Image Analysis
ICLR 2026 Poster accepted paper at ICLR 2026. Adapters have become a widely adopted strategy for efficient fine-tuning of foundation models, particularly in resource-constrained settings. However, their performance under extreme data scarcity (common in medical imaging due to high annotation costs, privacy regulations, and fragmented datasets) remains underexplored. In this work, we present the first comprehensive study of adapter-based fine-tuning for vision foundation models in low-data medical imaging scenarios. We find that, contrary to their promise, conventional Adapters can degrade performance under severe data constraints, performing even worse than simple linear probing when trained on less than 1% of the corresponding training data.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Unified Brain Surface and Volume Registration
ICLR 2026 Poster accepted paper at ICLR 2026. Accurate registration of brain MRI scans is fundamental for cross-subject analysis in neuroscientific studies. This involves aligning both the cortical surface of the brain and the interior volume. Traditional methods treat volumetric and surface-based registration separately, which often leads to inconsistencies that limit downstream analyses. We propose a deep learning framework, UCS, that registers 3D brain MRI images by jointly aligning both cortical and subcortical regions, through a unified volume-and-surface-based representation. Our approach leverages an intermediate spherical coordinate space to bridge anatomical surface topology with volumetric anatomy, enabling consistent and anatomically accurate alignment.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
ICLR 2026 Poster accepted paper at ICLR 2026. Neural encoding models aim to predict fMRI-measured brain responses to natural images. fMRI data is acquired as a 3D volume of voxels, where each voxel has a defined spatial location in the brain. However, conventional encoding models often flatten this volume into a 1D vector and treat voxel responses as independent outputs. This removes spatial context, discards anatomical information, and ties each model to a subject-specific voxel grid. We introduce the Neural Response Function (NRF), a framework that models fMRI activity as a continuous function over anatomical space rather than a flat vector of voxels. NRF represents brain activity as a continuous implicit function: given an image and a spatial coordinate (x, y, z) in standardized MNI space, the model predicts the response at that location.
Paper | ICLR 2026 Poster | 2026 | clinical prediction | Reliable Evaluation of MRI Motion Correction: Dataset and Insights
ICLR 2026 Poster accepted paper at ICLR 2026. Correcting motion artifacts in scientific and medical imaging is important, as they significantly impact image quality. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference-free evaluation, each with its merits and shortcomings. To enable evaluation with real-world motion artifacts, we release PMoC3D, a dataset consisting of unprocessed Paired Motion-Corrupted 3D brain MRI data.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
ICLR 2026 Poster accepted paper at ICLR 2026. Predicting spatial gene expression from H&E histology offers a scalable and clinically accessible alternative to sequencing, but realizing clinical impact requires models that generalize across cancer types and capture biologically coherent signals. Prior work is often limited to per-cancer settings and variance-based evaluation, leaving functional relevance underexplored. We introduce HistoPrism, an efficient transformer-based architecture for pan-cancer prediction of gene expression from histology. To evaluate biological meaning, we introduce a pathway-level benchmark, shifting assessment from isolated gene-level variance to coherent functional pathways.
Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | Rethinking Model Calibration in Medical Image Segmentation with Spectral Entropy Regularization
ICLR 2026 Poster accepted paper at ICLR 2026. Deep neural networks for medical image segmentation often produce overconfident predictions, posing clinical risks due to miscalibrated uncertainty estimates. In this work, we rethink model calibration from a frequency-domain perspective and identify two critical factors causing miscalibration: spectral bias, where models overemphasize low-frequency components, and confidence saturation, which suppresses overall power spectral density in confidence maps. To address these challenges, we propose a novel frequency-aware calibration framework integrating spectral entropy regularization and power spectral smoothing. The spectral entropy term promotes a balanced frequency spectrum and enhances overall spectral power, enabling better modeling of high-frequency boundary and low-frequency structural uncertainty.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | CardioComposer: Compositional Control of Anatomical Diffusion Models via Differentiable Geometry
ICLR 2026 Poster accepted paper at ICLR 2026. Generative models of 3D cardiovascular anatomy can synthesize informative structures for clinical research and medical device evaluation, but face a trade-off between geometric controllability and realism. We propose CardioComposer: a programmable, inference time framework for generating multi-class anatomical label maps from interpretable ellipsoidal primitives. These primitives represent geometric attributes such as the size, shape, and position of discrete substructures. We specifically develop differentiable measurement functions based on voxel-wise geometric moments, enabling loss-based gradient guidance during diffusion model sampling. Code/project link: https://github.com/kkadry/CardioComposer
Paper | ICLR 2026 Poster | 2026 | clinical NLP | Toward Text-Mask Consistency in Medical Image Segmentation
ICLR 2026 Poster accepted paper at ICLR 2026. Vision-language models for medical image segmentation often produce masks that conflict with the accompanying text, especially under multi-site/multi-lesion descriptions. We trace this failure to two factors: (i) highly templated and repetitive clinical language causes one-to-one hard contrastive learning to yield numerous false negatives, weakening cross-modal alignment; and (ii) predominantly vision-driven, one-way cross-attention lacks a language-dominant, spatially aware pathway, hindering effective injection of textual semantics into the spatial visual domain. To this end, we propose Consistency-enhanced Two-stage Segmentation (C2Seg). In the pretraining stage, Cluster-aware Contrastive Learning uses a frozen strong baseline to construct an intra-batch text similarity matrix as soft labels, thereby alleviating false negative conflicts and producing more discriminative visual representations.
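Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | Random Anchors and Low-Rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification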
ICLR 2026 Poster accepted paper at ICLR 2026. Class-incremental learning (CIL) in medical image-guided diagnosis requires models to preserve knowledge of historical disease classes while adapting to emerging categories. Pre-trained models (PTMs) with well-generalized features provide a strong foundation, yet most PTM-based CIL strategies, such as prompt tuning, task-specific adapters and model mixtures, rely on increasingly complex designs. While effective in general-domain benchmarks, these methods falter in medical imaging, where low intra-class variability and high inter-domain shifts (from scanners, protocols and institutions) make CIL particularly prone to representation collapse and domain misalignment. Under such conditions, we find that lightweight representation calibration strategies, often dismissed in general-domain CIL for their modest gains, can be remarkably effective for adapting PTMs in medical settings.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning
ICLR 2026 Poster accepted paper at ICLR 2026. Multiple Instance Learning (MIL) is the predominant framework for classifying gigapixel whole-slide images in computational pathology. MIL follows a sequence of 1) extracting patch features, 2) applying a linear layer to obtain task-specific patch features, and 3) aggregating the patches into a slide feature for classification. While substantial efforts have been devoted to optimizing patch feature extraction and aggregation, none have yet addressed the second point, the critical layer which transforms general-purpose features into task-specific features. We hypothesize that this layer constitutes an overlooked performance bottleneck and that stronger representations can be achieved with a low-rank transformation tailored to each patch's phenotype, yielding synergistic effects with any of the existing MIL approaches.
Paper | ICLR 2026 Poster | 2026 | clinical prediction | Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity
ICLR 2026 Poster accepted paper at ICLR 2026. 3D medical imaging is in high demand and essential for clinical diagnosis and scientific research. Currently, diffusion models have become an effective tool for medical imaging reconstruction thanks to their ability to learn rich, high‑quality data priors. However, learning the 3D data distribution with diffusion models in medical imaging is challenging, not only due to the difficulties in data collection but also because of the significant computational burden during model training. A common compromise is to train the diffusion model on 2D data priors and reconstruct stacked 2D slices to address 3D medical inverse problems. Code/project link: https://github.com/duchenhe/ISCS
Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | Identity-Agnostic Deferral for Unseen Experts
ICLR 2026 Poster accepted paper at ICLR 2026. Learning to Defer (L2D) improves AI reliability in decision-critical environments by training AI to either make its own prediction or defer the decision to a human expert. A key challenge is adapting to unseen experts at test time, whose competence can differ from the training population. Current methods for this task, however, can falter when unseen experts are out-of-distribution (OOD) relative to the training population. We identify a core architectural flaw as the cause: they learn identity-conditioned policies by processing class-indexed signals in fixed coordinates, creating shortcuts that violate the problem's inherent permutation symmetry.
Paper | ICLR 2026 Poster | 2026 | clinical NLP | Enhancing Medical Visual Understanding Through Multi-Granular Language Learning
ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. Code/project link: https://github.com/HUANGLIZI/MGLL
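Paper | ICLR 2026 Poster | 2026 | Medical imaging | You Point, I Learn: Online Adaptation of Interactive Segmentation Models to Distribution Shifts in Medical Imaging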
ICLR 2026 Poster accepted paper at ICLR 2026. Interactive segmentation uses real-time user inputs, such as mouse clicks, to iteratively refine model predictions. Although not originally designed to address distribution shifts, this paradigm naturally lends itself to such challenges. In medical imaging, where distribution shifts are common, interactive methods can use user inputs to guide models towards improved predictions. Moreover, once a model is deployed, user corrections can be used to adapt the network parameters to the new data distribution, mitigating distribution shift. Based on these insights, we aim to develop a practical, effective method for improving the adaptive capabilities of interactive segmentation models to new data distributions in medical imaging. Code/project link: https://github.com/WenTXuL/OAIMS
Paper | ICLR 2026 Poster | 2026 | clinical prediction | CRONOS: Continuous-Time Reconstruction for 4D Medical Longitudinal Series
ICLR 2026 Poster accepted paper at ICLR 2026. Forecasting how 3D medical scans evolve over time is important for disease progression, treatment planning, and developmental assessment. Yet existing models rely on a single prior scan or fixed grid times, or target global labels, which limits voxel-level forecasting under irregular sampling. We present CRONOS, a unified framework for many-to-one prediction from multiple past scans that supports both discrete (grid-based) and continuous (real-valued) timestamps in one model; to the best of our knowledge, it is the first to achieve continuous sequence-to-image forecasting for 3D medical data. CRONOS learns a spatio-temporal velocity field that transports context volumes toward a target volume at an arbitrary time, while operating directly in 3D voxel space.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
ICLR 2026 Poster accepted paper at ICLR 2026. Self-supervised pre-training has emerged as a critical paradigm for learning transferable representations from unlabeled medical volumetric data. Masked autoencoder based methods have garnered significant attention, yet their application to volumetric medical images faces fundamental limitations from the discrete voxel-level reconstruction objective, which neglects comprehensive anatomical structure continuity. To address this challenge, we propose MedGMAE, a novel framework that replaces traditional voxel reconstruction with reconstruction of 3D Gaussian primitives, offering a new perspective on representation learning. Our approach learns to predict complete sets of 3D Gaussian parameters as semantic abstractions to represent the entire 3D volume, from sparse visible image patches. Code/project link: https://github.com/windrise/MedGMAE; https://anonymous.4open.science/r/MedGMAE-EC8F/
Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
ICLR 2026 Poster accepted paper at ICLR 2026. We propose a frequency-oriented perspective on retinal representation learning by analyzing masked autoencoders (MAE) through the lens of spatial frequency. Our analysis shows that MAE favors low-frequency content while under-encoding diagnostically critical high-frequency structures in retinal images. Because retinal pathology often manifests in high-frequency detail, this bias limits diagnostic performance and motivates frequency-balanced representations. Within a mutual-information (MI) formulation of MAE, we introduce the Frequency-Balanced Retinal Masked Autoencoder (RetMAE), which augments the reconstruction objective with a MI regularizer that suppresses low-frequency redundancy and accentuates clinically salient high-frequency information.
Paper | ICLR 2026 Poster | 2026 | clinical NLP | A Structured, Tagged, and Localized VQA Dataset for Chest X-Ray Images with Full-Sentence Answers and Scene Graphs
ICLR 2026 Poster accepted paper at ICLR 2026. Visual Question Answering (VQA) enables targeted and context-dependent analysis of medical images, such as chest X-rays (CXRs). However, existing VQA datasets for CXRs are typically constrained by simplistic and brief answer formats, lacking localization annotations (e.g., bounding boxes) and structured tags (e.g., region or radiological finding/disease tags). To address these limitations, we introduce MIMIC-Ext-CXR-QBA (abbr. CXR-QBA), a large-scale CXR VQA dataset derived from MIMIC-CXR, comprising 42 million QA-pairs with multi-granular, multi-part answers, detailed bounding boxes, and structured tags. Code/project link: https://github.com/philip-mueller/mimic-ext-cxr-qba/
Paper | ICLR 2026 Poster | 2026 | clinical prediction | M3CoTBench: A Chain-of-Thought Benchmark for MLLMs in Medical Image Understanding
ICLR 2026 Poster accepted paper at ICLR 2026. Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning, and recent advances have extended this paradigm to Multimodal Large Language Models (MLLMs). In the medical domain, where diagnostic decisions depend on nuanced visual cues and sequential reasoning, CoT aligns naturally with clinical thinking processes. However, current benchmarks for medical image understanding generally focus on the final answer while ignoring the reasoning path. An opaque process lacks reliable bases for judgment, making it difficult to assist doctors in diagnosis.
Paper | ICLR 2026 Poster | 2026 | clinical prediction | Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
ICLR 2026 Poster accepted paper at ICLR 2026. Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at the voxel level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck.
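Paper | ICLR 2026 Poster | 2026 | Medical imaging | Modeling the Density of Pixel-Level Self-Supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT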
ICLR 2026 Poster accepted paper at ICLR 2026. Accurate detection of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology detection as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to healthy ones. We enhance the existing density-based UVAS framework with two key innovations: (1) dense self-supervised learning for feature extraction, eliminating the need for supervised pretraining, and (2) learned, masking-invariant dense features as conditioning variables, replacing hand-crafted positional encodings. Trained on over 30,000 unlabeled 3D CT volumes, our fully self-supervised model, Screener, outperforms existing UVAS methods on four large-scale test datasets comprising 1,820 scans with diverse pathologies. Code/project link: https://github.com/mishgon/screener; https://anonymous.4open.science/r/screener-35EE/
Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI | AttTok: Combining Attribute Tokens with Generative Pre-Trained Vision-Language Models for Medical Image Understanding
ICLR 2026 Poster accepted paper at ICLR 2026. Recent generative pre-trained vision–language (GPTv) models have achieved remarkable success in multi-modal understanding, inspiring their adaptation to medical imaging tasks such as disease diagnosis and visual question answering (VQA). However, current instruction-tuned GPTv models suffer from two key challenges: (1) medical attributes (e.g., disease names, severity grades) are encoded as plain text tokens, collapsing semantically distinct concepts into nearly identical textual sequences; and (2) inadequate textual supervision weakens visual representation learning, leading to severe inter-attribute confusion and misaligned vision–language embeddings. To address these limitations, we introduce attribute tokens (AttTok), a set of pre‑defined special tokens that uniquely encode clinical attributes (e.g., imaging modality, diagnosis, severity) within a structured token space. Complemented by attribute‑centric embedding books, AttTok serves as anchor points for aligning both visual and textual modalities into a shared, discriminative representation space.
Paper | ICLR 2026 Poster | 2026 | Medical imaging | Anatomy-Aware Representation Learning for Medical Ultrasound
ICLR 2026 Poster accepted paper at ICLR 2026. Diagnostic accuracy of ultrasound imaging is limited by qualitative variability and its reliance on the expertise of medical professionals. Such challenges increase demand for computer-aided diagnostic systems that enhance diagnostic accuracy and efficiency. However, the unique texture and structural attributes of ultrasound images, and the scarcity of large-scale ultrasound datasets, hinder the effective application of conventional machine learning methodologies. To address these challenges, we propose Anatomy-aware Representation Learning (ARL), a novel self-supervised representation learning framework specifically designed for medical ultrasound imaging.
Paper | ICLR 2026 Poster | 2026 | clinical prediction | Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
ICLR 2026 Poster accepted paper at ICLR 2026. Pretrained medical foundation models (FMs) have shown strong generalization across diverse imaging tasks, such as disease classification in radiology and tumor grading in histopathology. While recent advances in parameter-efficient finetuning have enabled effective adaptation of FMs to downstream tasks, these approaches are typically designed for a single modality. In contrast, many clinical workflows rely on joint diagnosis from heterogeneous domains, such as radiology and pathology, where fully leveraging the representation capacity of multiple FMs remains an open challenge. To address this gap, we propose Concept Tuning and Fusing (CTF), a parameter-efficient framework that uses clinically grounded concepts as a shared semantic interface to enable cross-modal co-adaptation before fusion. Code/project link: https://github.com/HKU-MedAI/CTF; https://github.com/neuronflow/BraTS-Toolkit
Paper | ICLR 2026 Poster | 2026 | Medical multimodal AI | AttTok: Combining Attribute Tokens with Generative Pre-Trained Vision-Language Models for Medical Image Understanding
ICLR 2026 poster introducing AttTok, a medical vision-language method that uses predefined attribute tokens and attribute-centric mechanisms to improve medical image understanding, including classification and visual question answering.
Paper | Nature Communications | 2024 | Medical imaging | Segment Anything in Medical Images
MedSAM adapts the Segment Anything paradigm to medical image segmentation and reports broad evaluation across imaging modalities.
Data resource | MRI, DXA, ultrasound, retinal imaging, genetics, and health records | population-scale multimodal imaging cohort | Population-scale UK Biobank imaging cohort; application required | Access by application | UK Biobank Imaging Data
UK Biobank Imaging provides large-scale imaging phenotypes linked to genetic, lifestyle, and health outcome data. It is used for population-scale medical imaging AI, disease risk prediction, representation learning, multimodal biomedical modeling, and epidemiological AI studies.
Data resource | 2D and 3D biomedical images | standardized biomedical image benchmark | 12 2D datasets and 6 3D datasets in MedMNIST v2 | Open access | MedMNIST v2 Biomedical Image Benchmark
MedMNIST v2 is a standardized collection of lightweight biomedical image classification datasets, including 2D and 3D tasks. It is useful for quick benchmarking, AutoML, foundation model sanity checks, and reproducible evaluation across multiple medical imaging domains.
Data resource | Medical image segmentation benchmark | IMed-361M / IMIS-Bench | Open access | IMed-361M / IMIS-Bench Interactive Medical Image Segmentation Benchmark
Interactive medical image segmentation benchmark and baseline from CVPR 2025, covering multiple modalities, organs, and target structures.
Competition | Open soon | aneurysm image analysis | vascular/neurovascular medical imaging | Starts 2026-08-14 (Beijing time) | TopAneu 2026
The Grand Challenge official API lists this medical AI challenge with status OPEN_SOON. Full title: Multimodal Vessel-Specific Intracranial Aneurysm Classification and Segmentation Challenge. Start date: 2026-08-14.
Competition | Open soon | medical image analysis challenge | Medical imaging | Deadline 2026-09-08 (Beijing time) | RARE26
The Grand Challenge official API lists this medical AI challenge with status OPEN_SOON. Full title: Recognition of Anomalies in low-pREvalence cancer. Start date: 2026-05-01. End/deadline date: 2026-09-08.
Calls and collaboration | Scientific Reports | Deadline 2026-06-23 (Beijing time) | Journal special issue | Scientific Reports Collection: AI for Clinical Decision-Making
This Nature Portfolio / Scientific Reports collection is open for submissions until 2026-06-23. It focuses on AI for clinical decision-making, including diagnostic, prognostic, and therapeutic decision support, EHRs, medical imaging, genomics, real-time patient data, clinical notes, multimodal learning, privacy-preserving AI, interpretability, and validation.
Calls and collaboration | npj Digital Medicine | Deadline 2026-07-12 (Beijing time) | Journal special issue | npj Digital Medicine Collection: The Impact of Agentic AI on Care Delivery
This Nature Portfolio / npj Digital Medicine collection is open for submissions until 2026-07-12. It calls for work on agentic AI in care delivery, including real-time evidence-based decision support, virtual and remote patient care, multimodal and longitudinal clinical data, EHRs, medical imaging, genomics, resource-limited deployment, ethics, regulation, quality, and patient safety.
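Calls and collaboration | Diagnostics | Deadline 2026-12-31 (Beijing time) | Journal special issue | MDPI Diagnostics Special Issue: Artificial Intelligence for Health and Medicine (Second Edition)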
This Diagnostics special issue calls for work on artificial intelligence for health and medicine, in a journal section focused on machine learning and AI in diagnostics. It is relevant to medical imaging, diagnostic pathology and radiology, digital health, rehabilitation, cybersecurity, patient safety, and clinical AI quality. The page lists a manuscript submission deadline of 2026-12-31.
Calls and collaboration | ICONIP 2026 | Deadline 2026-05-10 (Beijing time) | Conference call for papers | ICONIP 2026 Call for Papers
CCF-Deadlines lists ICONIP 2026 with papers due 2026-05-10 UTC-12 and conference dates 2026-11-23 to 2026-11-27 in Melbourne. The neural information processing scope is relevant to medical AI work on deep learning, biomedical signals, medical imaging, and clinical prediction.
Calls and collaboration | MICAD 2026 | Deadline 2026-07-21 19:59 (Beijing time) | Conference call for papers | MICAD 2026 Call for Papers
Medical Imaging and Computer-Aided Diagnosis 2026 call for full papers, posters, and oral presentations.
MIT OpenCourseWare: Machine Learning for Healthcare
MIT OCW 6.S897 Machine Learning for Healthcare introduces clinical data and machine learning for risk stratification, disease progression modeling, precision medicine, diagnosis, subtype discovery, medical imaging, public health, and clinical workflow improvement.
Broad Institute ML4H Clinical AI Seminar Series
The Broad Institute ML4H Clinical AI Seminar Series features talks from leading experts at the intersection of AI and medicine. Topics include generative and foundation models, ethical and responsible AI, self-supervised learning, medical imaging, digital twins, and real-world clinical applications.
Stanford AIMI Grand Rounds
Stanford AIMI Grand Rounds seminar series on artificial intelligence in medicine and imaging.
RSNA Imaging AI Certificate
RSNA certificate program for radiology professionals applying AI in medical imaging practice.
Stanford AIMI Courses
Stanford Center for Artificial Intelligence in Medicine and Imaging course page covering AIMI short courses and programs.