AI4Meder

AI4Meder 站内搜索

搜索医学 AI 论文与资源

按论文、数据资源、技术竞赛、投稿截止日期和课程资源检索社区内容,快速进入对应详情页。

77 条结果

输入关键词或点击标签,按论文、数据资源、竞赛截止日期、征稿与课程缩小范围。 标签:dataset 范围:论文

清空筛选
论文ICLR 2026 Poster2026 年trustworthy medical AI

可解释性与嵌入的桥接:让 BEE 识别伪相关

ICLR 2026 Poster accepted paper at ICLR 2026. Current methods for detecting spurious correlations rely on data splits or error patterns, leaving many harmful shortcuts invisible when counterexamples are absent. We introduce BEE (Bridging Explainability and Embeddings), a framework that shifts the focus from model predictions to the weight space and embedding geometry underlying decisions. By analyzing how fine-tuning perturbs pretrained representations, BEE uncovers spurious correlations that remain hidden from conventional evaluation pipelines. We use linear probing as a transparent diagnostic lens, revealing spurious features that not only persist after full fine-tuning but also transfer across diverse state-of-the-art models. Code/project link: https://github.com/bit-ml/bee

论文ICLR 2026 Poster2026 年clinical NLP

VLM-SubtleBench:VLM 距离人类级细微比较推理还有多远?

ICLR 2026 Poster accepted paper at ICLR 2026. The ability to distinguish subtle differences between visually similar images is essential for diverse domains such as industrial anomaly detection, medical imaging, and aerial surveillance. While comparative reasoning benchmarks for vision-language models (VLMs) have recently emerged, they primarily focus on images with large, salient differences and fail to capture the nuanced reasoning required for real-world applications. In this work, we introduce **VLM-SubtleBench**, a benchmark designed to evaluate VLMs on *subtle comparative reasoning*. Our benchmark covers ten difference types—Attribute, State, Emotion, Temporal, Spatial, Existence, Quantity, Quality, Viewpoint, and Action—and curate paired question–image sets reflecting these fine-grained variations.

论文ICLR 2026 Poster2026 年医学影像

CARL:面向光谱图像分析的相机无关表征学习

ICLR 2026 Poster accepted paper at ICLR 2026. Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impede the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Code/project link: https://github.com/IMSY-DKFZ/CARL

论文ICLR 2026 Poster2026 年医学影像

无需甲基化输入的全基因组 DNA 甲基化预测新范式

ICLR 2026 Poster accepted paper at ICLR 2026. DNA methylation (DNAm) is a key epigenetic modification that regulates gene expression and is pivotal in development and disease. However, profiling DNAm at genome scale is challenging: of $\textasciitilde$28 million CpG sites in the human genome, only about 1–3\% are typically assayed in common datasets due to technological limitations and cost. Recent deep learning approaches, including masking-based generative Transformer models, have shown promise in capturing DNAm–gene expression relationships, but they rely on partially observed DNAm values for unmeasured CpGs and cannot be applied to completely unmeasured samples. To overcome this barrier, we introduce MethylProphet, a gene-guided, context-aware Transformer model for whole-genome DNAm inference without any measured DNAm input.

论文ICLR 2026 Poster2026 年trustworthy medical AI

面向垂直联邦学习的隐私保障标签遗忘:无需披露的少样本遗忘

ICLR 2026 Poster accepted paper at ICLR 2026. This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to *label unlearning* in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism to generate synthetic embeddings for both unlearned and retained samples. This is to provide richer signals for the subsequent gradient-based label forgetting and recovery steps. These augmented embeddings are then subjected to gradient-based label forgetting, effectively removing the associated label information from the model. Code/project link: https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning

论文ICLR 2026 Poster2026 年医学影像

Reconstruct Anything Model:面向计算成像的轻量级通用模型

ICLR 2026 Poster accepted paper at ICLR 2026. Most existing learning-based methods for solving imaging inverse problems can be roughly divided into two classes: iterative algorithms, such as plug-and-play and diffusion methods leveraging pretrained denoisers, and unrolled architectures that are trained end-to-end for specific imaging problems. Iterative methods in the first class are computationally costly and often yield suboptimal reconstruction performance, whereas unrolled architectures are generally problem-specific and require expensive training. In this work, we propose a novel non-iterative, lightweight architecture that incorporates knowledge about the forward operator (acquisition physics and noise parameters) without relying on unrolling. Our model is trained to solve a wide range of inverse problems, such as deblurring, magnetic resonance imaging, computed tomography, inpainting, and super-resolution, and handles arbitrary image sizes and channels, such as grayscale, complex, and color data. Code/project link: https://github.com/matthieutrs/ram

论文ICLR 2026 Oral2026 年medical multimodal

面向多模态 GigaVoxel 图像配准的可扩展分布式框架

ICLR 2026 Oral accepted paper at ICLR 2026. In this work, we propose FFDP, a set of IO-aware non-GEMM fused kernels supplemented with a distributed framework for image registration at unprecedented scales. Image registration is an inverse problem fundamental to biomedical and life sciences, but algorithms have not scaled in tandem with image acquisition capabilities. Our framework complements existing model parallelism techniques proposed for large-scale transformer training by optimizing non-GEMM bottlenecks and enabling convolution-aware tensor sharding. We demonstrate unprecedented capabilities by performing multimodal registration of a 100μm ex-vivo human brain MRI volume at native resolution – an inverse problem more than 570× larger than a standard clinical datum in about a minute using only 8 A6000 GPUs.

论文ICLR 2026 Poster2026 年医学影像

分布一致性损失:超越反问题中的逐点数据项

ICLR 2026 Poster accepted paper at ICLR 2026. Recovering true signals from noisy measurements is a central challenge in inverse problems spanning medical imaging, geophysics, and signal processing. Current solutions nearly always balance prior assumptions regarding the true signal (regularization) with agreement to noisy measured data (data-fidelity). Conventional data-fidelity loss functions, such as mean-squared error (MSE) or negative log-likelihood, seek pointwise agreement with noisy measurements, often leading to overfitting to noise. In this work, we instead evaluate data-fidelity collectively by testing whether the observed measurements are statistically consistent with the noise distributions implied by the current estimate.

论文ICLR 2026 Poster2026 年医学影像

MnemoDyn:从 4 万条 fMRI 序列学习静息态动力学

ICLR 2026 Poster accepted paper at ICLR 2026. We present a dynamical-systems based model for resting-state functional magnetic resonance imaging (rs-fMRI), trained on a dataset of roughly $40$K rs-fMRI sequences covering a wide variety of public and available-by-permission datasets. While most existing proposals use transformer backbones, we utilize multi-resolution temporal modeling of the dynamics across parcellated brain regions. We show that MnemoDyn is compute efficient and generalizes very well across diverse populations and scanning protocols. When benchmarked against current state-of-the-art transformer-based approaches, MnemoDyn consistently delivers superior reconstruction quality.

论文ICLR 2026 Poster2026 年trustworthy medical AI

先验感知与上下文引导的主动概率子采样分组

ICLR 2026 Poster accepted paper at ICLR 2026. Subsampling significantly reduces the number of measurements, thereby streamlining data processing and transfer overhead, and shortening acquisition time across diverse real-world applications. The recently introduced Active Deep Probabilistic Subsampling (A-DPS) approach jointly optimizes both the subsampling pattern and the downstream task model, enabling instance- and subject-specific sampling trajectories and effective adaptation to new data at inference time. However, this approach does not fully leverage valuable dataset priors and relies on top-1 sampling, which can impede the optimization process. Herein, we enhance A-DPS by integrating a deterministic (fixed) prior-informed sampling pattern derived from the training dataset, along with group-based sampling via top-k sampling, to achieve more robust optimization—method we call Prior-aware and context-guided Group-based Active DPS (PGA-DPS).

论文ICLR 2026 Poster2026 年trustworthy medical AI

融合像素与基因:计算病理中的空间感知学习

ICLR 2026 Poster accepted paper at ICLR 2026. Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Code/project link: https://github.com/Hanminghao/STAMP

论文ICLR 2026 Poster2026 年clinical prediction

面向少样本异常检测的双重蒸馏

ICLR 2026 Poster accepted paper at ICLR 2026. Anomaly detection is a critical task in computer vision with profound implications for medical imaging, where identifying pathologies early can directly impact patient outcomes. While recent unsupervised anomaly detection approaches show promise, they require substantial normal training data and struggle to generalize across anatomical contexts. We introduce D$^2$4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. Our approach leverages a pre-trained encoder as a teacher network to extract multi-scale features from both support and query images, while a student decoder learns to distill knowledge from the teacher on query images and self-distill on support images. Code/project link: https://github.com/ttttqz/D24FAD

论文ICLR 2026 Poster2026 年医学影像

Disco:通过邻接感知协同着色实现密集重叠细胞实例分割

ICLR 2026 Poster accepted paper at ICLR 2026. Accurate cell instance segmentation is foundational for digital pathology analysis. Existing methods based on contour detection and distance mapping still face significant challenges in processing complex and dense cellular regions. Graph coloring-based methods provide a new paradigm for this task, yet the effectiveness of this paradigm in real-world scenarios with dense overlaps and complex topologies has not been verified. Addressing this issue, we release a large-scale dataset GBC-FS 2025, which contains highly complex and dense sub-cellular nuclear arrangements. We conduct the first systematic analysis of the chromatic properties of cell adjacency graphs across four diverse datasets and reveal an important discovery: most real-world cell graphs are non-bipartite, with a high prevalence of odd-length cycles (predominantly triangles).

论文ICLR 2026 Poster2026 年clinical prediction

基于多变量并行注意力生成神经元活动的基础模型

ICLR 2026 Poster accepted paper at ICLR 2026. Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks, particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. Code/project link: https://github.com/IBM/multi-variate-parallel-transformer; https://huggingface.co/datasets/NeuroTec/SWEC_iEEG_Dataset

论文ICLR 2026 Poster2026 年clinical NLP

LLM 推理中类人谬误模式的理论扎根评测

ICLR 2026 Poster accepted paper at ICLR 2026. We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open‑source implementation, PyETR, we programmatically generate 383 formally specified reasoning problems and evaluate 38 models. For each response, we judge logical correctness and, when incorrect, whether it matches an ETR‑predicted fallacy. Two results stand out: (i) as a capability proxy (Chatbot Arena Elo) increases, a larger share of a model’s incorrect answers are ETR‑predicted fallacies ($\rho=0.360, p=0.0265$), while overall correctness on this dataset shows no correlation with capability; (ii) reversing premise order significantly reduces fallacy production for many models, mirroring human order effects.

论文ICLR 2026 Poster2026 年trustworthy medical AI

序贯信息瓶颈融合:迈向鲁棒且可泛化的多模态脑肿瘤分割

ICLR 2026 Poster accepted paper at ICLR 2026. Brain tumor segmentation in multi-modal MRIs poses significant challenges when one or more modalities are missing. Recent approaches commonly employ parallel fusion strategies; however, these methods often risk losing crucial shared information across modalities, which can degrade segmentation performance. In this paper, we advocate leveraging sequential information bottleneck fusion to effectively preserve shared information across modalities. From an information-theoretic perspective, sequential fusion not only produces more robust fused representations in missing-data scenarios but also achieves a tighter generalization upper bound compared to parallel fusion approaches.

论文ICLR 2026 Poster2026 年医学影像

脑图基础模型:跨多图谱与疾病的预训练和提示微调

ICLR 2026 Poster accepted paper at ICLR 2026. As large language models (LLMs) continue to revolutionize AI research, there is a growing interest in building large-scale brain foundation models to advance neuroscience. While most existing brain foundation models are pre-trained on time-series signals or connectome features, we propose a novel graph-based pre-training paradigm for constructing a brain graph foundation model. In this paper, we introduce the Brain Graph Foundation Model, termed BrainGFM, a unified framework that leverages graph contrastive learning and graph masked autoencoders for large-scale fMRI-based pre-training. BrainGFM is pre-trained on a diverse mixture of brain atlases with varying parcellations, significantly expanding the pre-training corpus and enhancing the model’s ability to generalize across heterogeneous fMRI-derived brain representations. Code/project link: https://github.com/weixinxu666/BrainGFM

论文ICLR 2026 Poster2026 年trustworthy medical AI

UltraGauss:3D 超声体数据的超快速 Gaussian 重建

ICLR 2026 Poster accepted paper at ICLR 2026. Ultrasound imaging is widely used due to its safety, affordability, and real-time capabilities, but its 2D interpretation is highly operator-dependent, leading to variability and increased cognitive demand. We present $\textbf{UltraGauss}$: an ultrasound-specific Gaussian Splatting framework that serves as an efficient approximation to acoustic image formation. Unlike projection-based splatting, UltraGauss renders by $\textit{probe-plane intersection}$ with in-plane aggregation, aligning with plane-based echo sampling while remaining fast and memory-efficient. A stable parameterisation and compute-aware GPU rasterisation make this method practical at scale. Code/project link: https://www.robots.ox.ac.uk/~vgg/research/UltraGauss/

论文ICLR 2026 Poster2026 年clinical prediction

基于脉冲的数字大脑:脑活动分析的新型基础模型

ICLR 2026 Poster accepted paper at ICLR 2026. Modeling the temporal dynamics of the human brain remains a core challenge in computational neuroscience and artificial intelligence. Traditional methods often ignore the biological spike characteristics of brain activity and find it difficult to reveal the dynamic dependencies and causal interactions between brain regions, limiting their effectiveness in brain function research and clinical applications. To address this issue, we propose a Spike-based Digital Brain (Spike-DB), a novel fundamental model that introduces the spike computing paradigm into brain time series modeling. Spike-DB encodes fMRI signals as spike trains and learns the temporal driving relationships between anchor and target regions to achieve high-precision prediction of brain activity and reveal underlying causal dependencies and dynamic relationship characteristics. Code/project link: https://github.com/UAIBC-Brain/Spike-DB

论文ICLR 2026 Poster2026 年trustworthy medical AI

Dual-Kernel Adapter:拓展数据受限医学图像分析的空间视野

ICLR 2026 Poster accepted paper at ICLR 2026. Adapters have become a widely adopted strategy for efficient fine-tuning of foundation models, particularly in resource-constrained settings. However, their performance under extreme data scarcity—common in medical imaging due to high annotation costs, privacy regulations, and fragmented datasets—remains underexplored. In this work, we present the first comprehensive study of adapter-based fine-tuning for vision foundation models in low-data medical imaging scenarios. We find that, contrary to their promise, conventional Adapters can degrade performance under severe data constraints, performing even worse than simple linear probing when trained on less than 1\% of the corresponding training data.

论文ICLR 2026 Poster2026 年医学影像

统一脑表面与脑体积配准

ICLR 2026 Poster accepted paper at ICLR 2026. Accurate registration of brain MRI scans is fundamental for cross-subject analysis in neuroscientific studies. This involves aligning both the cortical surface of the brain and the interior volume. Traditional methods treat volumetric and surface-based registration separately, which often leads to inconsistencies that limit downstream analyses. We propose a deep learning framework, UCS, that registers 3D brain MRI images by jointly aligning both cortical and subcortical regions, through a unified volume-and-surface-based representation. Our approach leverages an intermediate spherical coordinate space to bridge anatomical surface topology with volumetric anatomy, enabling consistent and anatomically accurate alignment.

论文ICLR 2026 Poster2026 年clinical prediction

SurvHTE-Bench:生存分析中异质治疗效应估计基准

ICLR 2026 Poster accepted paper at ICLR 2026. Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from causal survival forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragmented and inconsistent. We introduce SurvHTE‐Bench, the first comprehensive benchmark for HTE estimation with censored outcomes. The benchmark spans (i) a modular suite of synthetic datasets with known ground truth, systematically varying causal assumptions and survival dynamics, (ii) semi-synthetic datasets that pair real-world covariates with simulated treatments and outcomes, and (iii) real-world datasets from a twin study (with known ground truth) and from an HIV clinical trial.

论文ICLR 2026 Poster2026 年trustworthy medical AI

SE-Diff:面向综合 ECG 生成的模拟器与经验增强扩散模型

ICLR 2026 Poster accepted paper at ICLR 2026. Cardiovascular disease (CVD) is a leading cause of mortality worldwide. Electrocardiograms (ECGs) are the most widely used non-invasive tool for cardiac assessment, yet large, well-annotated ECG corpora are scarce due to cost, privacy, and workflow constraints. Generating ECGs can aid mechanistic understanding of cardiac electrical activity, enable the construction of large, heterogeneous, and unbiased datasets, and facilitate privacy-preserving data sharing. Generating realistic ECG signals from clinical context is important yet underexplored. Recent work has leveraged diffusion models for text-to-ECG generation, but two challenges remain: (i) existing methods often overlook physiological simulator knowledge of cardiac activity; and (ii) they ignore broader, experience-based clinical knowledge grounded in real-world practice.

论文ICLR 2026 Poster2026 年clinical prediction

MRI 运动校正的可靠评测:数据集与洞见

ICLR 2026 Poster accepted paper at ICLR 2026. Correcting motion artifacts in scientific and medical imaging is important, as they significantly impact image quality. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference-free evaluation, each with its merits and shortcomings. To enable evaluation with real-world motion artifacts, we release PMoC3D, a dataset consisting of unprocessed $\textbf{P}$aired $\textbf{Mo}$tion-$\textbf{C}$orrupted $\textbf{3D}$ brain MRI data.

论文ICLR 2026 Poster2026 年trustworthy medical AI

ProstaTD:将手术 triplet 从分类桥接到全监督检测

ICLR 2026 Poster accepted paper at ICLR 2026. Surgical triplet detection is a critical task in surgical video analysis, with significant implications for performance assessment and training novice surgeons. However, existing datasets like CholecT50 lack precise spatial bounding box annotations, rendering triplet classification at the image level insufficient for practical applications. The inclusion of bounding box annotations is essential to make this task meaningful, as they provide the spatial context necessary for accurate analysis and improved model generalizability. To address these shortcomings, we introduce ProstaTD, a large-scale, multi-institutional dataset for surgical triplet detection, developed from the technically demanding domain of robot-assisted prostatectomy.

论文ICLR 2026 Poster2026 年trustworthy medical AI

NAB:稀疏视角 CT 重建的神经自适应分箱

ICLR 2026 Poster accepted paper at ICLR 2026. Computed Tomography (CT) plays a vital role in inspecting the internal structures of industrial objects. Furthermore, achieving high-quality CT reconstruction from sparse views is essential for reducing production costs. While classic implicit neural networks have shown promising results for sparse reconstruction, they are unable to leverage shape priors of objects. Motivated by the observation that numerous industrial objects exhibit rectangular structures, we propose a novel \textbf{N}eural \textbf{A}daptive \textbf{B}inning (\textbf{NAB}) method that effectively integrates rectangular priors into the reconstruction process. Code/project link: https://github.com/Wangduo-Xie/NAB_CT_reconstruction

论文ICLR 2026 Poster2026 年clinical prediction

MedAraBench:大规模阿拉伯语医学问答数据集与基准

ICLR 2026 Poster accepted paper at ICLR 2026. Arabic remains one of the most underrepresented languages in natural language processing research, particularly in medical applications, due to the limited availability of open-source data and benchmarks. The lack of resources hinders efforts to evaluate and advance the multilingual capabilities of Large Language Models (LLMs). In this paper, we introduce MedAraBench, a large-scale dataset consisting of Arabic multiple-choice question-answer pairs across various medical specialties. We constructed the dataset by manually digitizing a large repository of academic materials created by medical professionals in the Arabic-speaking region.

论文ICLR 2026 Poster2026 年trustworthy medical AI

基于持续 Fiedler 向量图模型的医疗保险欺诈检测

ICLR 2026 Poster accepted paper at ICLR 2026. Healthcare insurance fraud detection presents unique machine learning challenges: labeled data are scarce due to delayed verification processes, and fraudulent behaviors evolve rapidly, often manifesting in complex, graph-structured interactions. Existing methods struggle in such settings. Pretraining routines typically overlook structural anomalies under limited supervision, while online models often fail to adapt to changing fraud patterns without labeled updates. To address these issues, we propose the Continual Fiedler Vector Graph model (ConFVG), a fraud detection framework designed for label-scarce and non-stationary environments.

论文ICLR 2026 Poster2026 年trustworthy medical AI

通过上下文-细节交互自适应门增强医疗时间序列稀疏事件检测

ICLR 2026 Poster accepted paper at ICLR 2026. Accurate detection of clinically meaningful events in healthcare time-series data is crucial for reliable downstream analysis and decision support. However, most existing methods struggle to jointly localize event boundaries and classify event types; even detection transformer (DETR)-based approaches show limited performance when confronted with extremely sparse events typical of clinical recordings. To address these challenges, we propose a coarse-to-fine detection framework combining a global context explorer, a local detail inspector, and an adaptive gating module (AGM) that fuses multiple label perspectives. The AGM uses transformed labels—encoding event presence and temporal position—to improve learning on sparse events.

论文ICLR 2026 Poster2026 年clinical prediction

DM4CT:计算机断层重建扩散模型基准

ICLR 2026 Poster accepted paper at ICLR 2026. Diffusion models have recently emerged as powerful priors for solving inverse problems. While Computed Tomography (CT) is theoretically a linear inverse problem, it poses many practical challenges. These include correlated noise, artifact structures, reliance on system geometry, and misaligned value ranges, which make the direct application of diffusion models more difficult than in domains like natural image generation. To systematically evaluate how diffusion models perform in this context and compare them with established reconstruction methods, we introduce DM4CT, a comprehensive benchmark for CT reconstruction. Code/project link: https://github.com/DM4CT/DM4CT

论文ICLR 2026 Poster2026 年clinical prediction

拼合心智马赛克:迈向 EEG 语义意图解码

ICLR 2026 Poster accepted paper at ICLR 2026. Enabling natural communication through brain–computer interfaces (BCIs) remains one of the most profound challenges in neuroscience and neurotechnology. While existing frameworks offer partial solutions, they are constrained by oversimplified semantic representations and a lack of interpretability. To overcome these limitations, we introduce **Semantic Intent Decoding(SID)**, a novel framework that translates neural activity into natural language by modeling meaning as a flexible set of compositional semantic units. SID is built on three core principles: semantic compositionality, continuity and expandability of semantic space, and fidelity in reconstruction.

论文ICLR 2026 Poster2026 年trustworthy medical AI

用谱熵正则重新思考医学图像分割中的模型校准

ICLR 2026 Poster accepted paper at ICLR 2026. Deep neural networks for medical image segmentation often produce overconfident predictions, posing clinical risks due to miscalibrated uncertainty estimates. In this work, we rethink model calibration from a frequency-domain perspective and identify two critical factors causing miscalibration: spectral bias, where models overemphasize low-frequency components, and confidence saturation, which suppresses overall power spectral density in confidence maps. To address these challenges, we propose a novel frequency-aware calibration framework integrating spectral entropy regularization and power spectral smoothing. The spectral entropy term promotes a balanced frequency spectrum and enhances overall spectral power, enabling better modeling of high-frequency boundary and low-frequency structural uncertainty.

论文ICLR 2026 Poster2026 年clinical prediction

医学 MLLM 如何失效?医学图像视觉定位研究

ICLR 2026 Poster accepted paper at ICLR 2026. Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks—particularly in zero-shot settings where generalization is critical—remains suboptimal. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. **In this work**, we present a pioneering systematic investigation into the visual grounding capabilities of state-of-the-art medical MLLMs. To disentangle *visual grounding* from *semantic grounding*, we design VGMED, a novel evaluation dataset developed with expert clinical guidance, explicitly assessing the visual grounding capability of medical MLLMs. Code/project link: https://guimeng-leo-liu.github.io/Medical-MLLMs-Fail/

论文ICLR 2026 Poster2026 年surgical/interventional AI

WavePolyp:基于层级小波特征聚合与帧间差异感知的视频息肉分割

ICLR 2026 Poster accepted paper at ICLR 2026. Automatic polyp segmentation from colonoscopy videos is a crucial technique that assists clinicians in improving the accuracy and efficiency of diagnosis, preventing polyps from developing into cancer. However, video polyp segmentation (VPS) is a challenging task due to (1) the significant inter-frame divergence in videos, (2) the high camouflage of polyps in normal colon structures and (3) the clinical requirement of real-time performance. In this paper, we propose a novel segmentation network, WavePolyp, which consists of two innovative components: a hierarchical wavelet-based feature aggregation (HWFA) module and inter-frame divergence perception (IDP) blocks. Specifically, HWFA excavates and amplifies discriminative information from high-frequency and low-frequency features decomposed by wavelet transform, hierarchically aggregating them into refined spatial representations within each frame. Code/project link: https://github.com/FishballZhang/WavePolyp

论文ICLR 2026 Poster2026 年clinical NLP

迈向医学图像分割中的文本-掩膜一致性

ICLR 2026 Poster accepted paper at ICLR 2026. Vision-language models for medical image segmentation often produce masks that conflict with the accompanying text, especially under multi-site/multi-lesion descriptions. We trace this failure to two factors: (i) highly templated and repetitive clinical language causes one-to-one hard contrastive learning to yield numerous false negatives, weakening cross-modal alignment; and (ii) predominantly vision-driven, one-way cross-attention lacks a language-dominant, spatially aware pathway, hindering effective injection of textual semantics into the spatial visual domain. To this end, we propose Consistency-enhanced Two-stage Segmentation (C2Seg). In the pretraining stage, Cluster-aware Contrastive Learning uses a frozen strong baseline to construct an intra-batch text similarity matrix as soft labels, thereby alleviating false negative conflicts and producing more discriminative visual representations.

论文ICLR 2026 Poster2026 年trustworthy medical AI

用时频 motif 学习对单通道 EEG 进行 token 化

ICLR 2026 Poster accepted paper at ICLR 2026. Foundation models are reshaping EEG analysis, yet an important problem of EEG tokenization remains a challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from *single-channel* EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time–frequency masking to capture robust motif representations, and it is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits: *Accuracy:* Experiments on four diverse EEG benchmarks demonstrate consistent performance gains across both single- and multi-dataset pretraining settings, achieving up to $11\%$ improvement in Cohen’s Kappa over strong baselines. Code/project link: https://github.com/Jathurshan0330/TFM-Tokenizer

论文ICLR 2026 Poster2026 年trustworthy medical AI

随机锚点与低秩去相关学习:类增量医学图像分类的极简流程

ICLR 2026 Poster accepted paper at ICLR 2026. Class-incremental learning (CIL) in medical image-guided diagnosis requires models to preserve knowledge of historical disease classes while adapting to emerging categories. Pre-trained models (PTMs) with well-generalized features provide a strong foundation, yet most PTM-based CIL strategies, such as prompt tuning, task-specific adapters and model mixtures, rely on increasingly complex designs. While effective in general-domain benchmarks, these methods falter in medical imaging, where low intra-class variability and high inter-domain shifts (from scanners, protocols and institutions) make CIL particularly prone to representation collapse and domain misalignment. Under such conditions, we find that lightweight representation calibration strategies, often dismissed in general-domain CIL for their modest gains, can be remarkably effective for adapting PTMs in medical settings.

论文ICLR 2026 Poster2026 年trustworthy medical AI

PathChat-SegR1:通过 SO-GRPO 实现病理推理分割

ICLR 2026 Poster accepted paper at ICLR 2026. Segmentation in pathology image requires handling out-of-domain tissue morphologies and new pathologies beyond training distributions, where traditional closed-set segmentation approaches fail to generalize. Reasoning segmentation enables zero-shot generalization via prompting with text queries. However, existing reasoning segmentation models face three barriers when applied to pathology: (1) the vision encoder lack pathology-specific knowledge and robustness to staining variations, (2) the large language model (LLM) backbone for reasoning fails to identify whether it has gathered sufficient semantic context to trigger the segmentation output, and (3) no reasoning segmentation benchmarks and datasets exist for pathology analysis. Consequently, we introduce PathChat-SegR1, a reasoning segmentation model built upon pathology-specific vision encoders trained with a novel stain-invariant self-distillation for robust pathology image representations.

论文ICLR 2026 Poster2026 年trustworthy medical AI

基于强化学习的假设驱动临床决策语言 Agent

ICLR 2026 Poster accepted paper at ICLR 2026. Clinical decision-making is a dynamic, interactive, and cyclic process where doctors have to repeatedly decide on which clinical action to perform and consider newly uncovered information for diagnosis and treatment. Large Language Models (LLMs) have the potential to support clinicians in this process, however, most applications of LLMs in clinical decision support suffer from one of two limitations: Either they assume the unrealistic scenario of immediate availability of all patient information and do not model the interactive and iterative investigation process, or they restrict themselves to the limited "out-of-the-box" capabilities of large pre-trained models without performing task-specific training. In contrast to this, we propose to model clinical decision-making for diagnosis with a hypothesis-driven uncertainty-aware language agent, LA-CDM, that converges towards a diagnosis via repeatedly requesting and interpreting relevant tests. Using a hybrid training paradigm combining supervised and reinforcement learning, we train LA-CDM with three objectives targeting critical aspects of clinical decision-making: accurate hypothesis generation, hypothesis uncertainty estimation, and efficient decision-making. Code/project link: https://github.com/dharouni/LA-CDM

论文ICLR 2026 Poster2026 年trustworthy medical AI

单模态基础模型的联合适配用于多模态阿尔茨海默病诊断

ICLR 2026 Poster accepted paper at ICLR 2026. Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder and a leading cause of dementia worldwide. Accurate diagnosis requires integrating diverse patient data modalities. With the rapid advancement of foundation models in neurobiology and medicine, integrating foundation models from various modalities has emerged as a promising yet underexplored direction for multi-modal AD diagnosis. A central challenge is enabling effective interaction among these models without disrupting the robust, modality-specific representations learned from large-scale pretraining. To address this, we propose a novel multi-modal framework for AD diagnosis that enables joint interaction among uni-modal foundation models through modality-anchored interaction.

论文ICLR 2026 Poster2026 年trustworthy medical AI

GARLIC:ICU 多变量时间序列的图注意力关系学习

ICLR 2026 Poster accepted paper at ICLR 2026. Healthcare data, such as Intensive Care Unit (ICU) records, comprise heterogeneous multivariate time series sampled at irregular intervals with pervasive missingness. However, clinical applications demand predictive models that are both accurate and interpretable. We present our Graph Attention-based Relational Learning for Intensive Care (GARLIC) model, a novel neural network architecture that imputes missing data through a learnable exponential-decay encoder, captures inter-sensor dependencies via time-lagged summary graphs, and fuses global patterns with cross-dimensional sequential attention. All attention weights and graph edges are learned end-to-end to serve as built-in observation-, signal-, and edge-level explanations.

论文ICLR 2026 Poster2026 年trustworthy medical AI

Critic-Adviser-Reviser 循环精炼:迈向高质量 EMR 语料生成

ICLR 2026 Poster accepted paper at ICLR 2026. Electronic medical records (EMRs) are vital for healthcare research, but their use is limited by privacy concerns. Synthetic EMR generation offers a promising alternative, yet most existing methods merely imitate real records without adhering to rigorous clinical quality principles. To address this, we introduce LLM-CARe, a stage-wise cyclic refinement framework that progressively improves EMR quality through three stages, each targeting a specific granularity: corpus, section and document. At each stage, a Critic, an Adviser, and a Reviser collaborate iteratively to evaluate, provide feedback, and refine the drafts.

论文ICLR 2026 Poster2026 年surgical/interventional AI

生物与临床轨迹的可控序列编辑

ICLR 2026 Poster accepted paper at ICLR 2026. Conditional generation models for longitudinal sequences can produce new or modified trajectories given a conditioning input. However, they often lack control over when the condition should take effect (timing) and which variables it should influence (scope). Most methods either operate only on univariate sequences or assume that the condition alters all variables and time steps. In scientific and clinical settings, interventions instead begin at a specific moment, such as the time of drug administration or surgery, and influence only a subset of measurements while the rest of the trajectory remains unchanged.

论文ICLR 2026 Poster2026 年clinical NLP

通过多粒度语言学习增强医学视觉理解

ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. Code/project link: https://github.com/HUANGLIZI/MGLL

论文ICLR 2026 Oral2026 年clinical prediction

BioX-Bridge:跨生物信号的无监督跨模态知识迁移模型桥接

ICLR 2026 Oral accepted paper at ICLR 2026. Biosignals offer valuable insights into the physiological states of the human body. Although biosignal modalities differ in functionality, signal fidelity, sensor comfort, and cost, they are often intercorrelated, reflecting the holistic and interconnected nature of human physiology. This opens up the possibility of performing the same tasks using alternative biosignal modalities, thereby improving the accessibility, usability, and adaptability of health monitoring systems. However, the limited availability of large labeled datasets presents challenges for training models tailored to specific tasks and modalities of interest.

论文ICLR 2026 Poster2026 年clinical prediction

重用基础模型实现可泛化医学时间序列分类

ICLR 2026 Poster accepted paper at ICLR 2026. Medical time series (MedTS) classification suffers from poor generalizability in real-world deployment due to inter- and intra-dataset heterogeneity, such as varying numbers of channels, signal lengths, task definitions, and patient characteristics. % implicit patient characteristics, variable channel configurations, time series lengths, and diagnostic tasks. To address this, we propose FORMED, a novel framework for repurposing a backbone foundation model, pre-trained on generic time series, to enable highly generalizable MedTS classification on unseen datasets. FORMED combines the backbone with a novel classifier comprising two components: (1) task-specific channel embeddings and label queries, dynamically sized to match any number of channels and target classes, and (2) a shared decoding attention layer, jointly trained across datasets to capture medical domain knowledge through task-agnostic feature-query interactions.

论文ICLR 2026 Poster2026 年trustworthy medical AI

Cross-Timestep:用于医学分割的跨时序记忆 LSTM 与自适应先验解码 3D 扩散模型

ICLR 2026 Poster accepted paper at ICLR 2026. Diffusion models have recently demonstrated significant robustness in medical image segmentation, effectively accommodating variations across different imaging styles. However, their applications remain limited due to: (i) current successes being primarily confined to 2D segmentation tasks—we observe that diffusion models tend to collapse at the early stage when applied to 3D medical tasks; and (ii) the inherently isolated iteration along timesteps during training and inference. To tackle these limitations, we propose a novel framework named Cross-Timestep, which incorporates two key innovations: an Adaptive Priori Decoding Strategy (APDS) and a trans-temporal memory LSTM (tLSTM) mechanism. (i) The APDS provides prior guidance during the diffusion process by employing a Priori Decoder(PD) that focuses solely on the conditional branch, successfully stabilizing the reverse diffusion process.

论文ICLR 2026 Oral2026 年clinical prediction

CounselBench:心理健康问答中大语言模型的大规模专家评测与对抗基准

ICLR 2026 Oral accepted paper at ICLR 2026. Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and online human therapists on patient questions from the public forum CounselChat.

论文ICLR 2026 Poster2026 年clinical prediction

CerebraGloss:面向细粒度临床 EEG 解读的大型视觉语言模型指令微调

ICLR 2026 Poster accepted paper at ICLR 2026. Interpreting clinical electroencephalography (EEG) is a laborious, subjective process, and existing computational models are limited to narrow classification tasks rather than holistic interpretation. A key bottleneck for applying powerful Large Vision-Language Models (LVLMs) to this domain is the scarcity of datasets pairing EEG visualizations with fine-grained, expert-level annotations. We address this by introducing CerebraGloss, an instruction-tuned LVLM for nuanced EEG interpretation. We first introduce a novel, automated data generation pipeline, featuring a bespoke YOLO-based waveform detector, to programmatically create a large-scale corpus of EEG-text instruction data. Code/project link: https://github.com/iewug/CerebraGloss

论文ICLR 2026 Poster2026 年clinical prediction

CRONOS:4D 医学纵向序列的连续时间重建

ICLR 2026 Poster accepted paper at ICLR 2026. Forecasting how 3D medical scans evolve along time is important for disease progression, treatment planning, and developmental assessment. Yet existing models either rely on a single prior scan, fixed grid times, or target global labels, which limits voxel-level forecasting under irregular sampling. We present CRONOS, a unified framework for many-to-one prediction from multiple past scans that supports both discrete (grid-based) and continuous (real-valued) timestamps in one model, to the best of our knowledge the first to achieve continuous sequence-to-image forecasting for 3D medical data. CRONOS learns a spatio-temporal velocity field that transports context volumes toward a target volume at an arbitrary time, while operating directly in 3D voxel space.

论文ICLR 2026 Poster2026 年medical LLM agent

AnesSuite:面向 LLM 麻醉学推理的综合基准与数据集套件

ICLR 2026 Poster accepted paper at ICLR 2026. The application of large language models (LLMs) in the medical field has garnered significant attention, yet their reasoning capabilities in more specialized domains like anesthesiology remain underexplored. To bridge this gap, we introduce AnesSuite, the first comprehensive dataset suite specifically designed for anesthesiology reasoning in LLMs. The suite features AnesBench, an evaluation benchmark tailored to assess anesthesiology-related reasoning across three levels: factual retrieval (System 1), hybrid reasoning (System 1.x), and complex decision-making (System 2). Alongside this benchmark, the suite includes three training datasets that provide an infrastructure for continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning with verifiable rewards (RLVR). Code/project link: https://github.com/MiliLab/AnesSuite

论文ICLR 2026 Poster2026 年医学影像

MedGMAE:面向医学体数据表征学习的 Gaussian 掩码自编码器

ICLR 2026 Poster accepted paper at ICLR 2026. Self-supervised pre-training has emerged as a critical paradigm for learning transferable representations from unlabeled medical volumetric data. Masked autoencoder based methods have garnered significant attention, yet their application to volumetric medical image faces fundamental limitations from the discrete voxel-level reconstruction objective, which neglects comprehensive anatomical structure continuity. To address this challenge, We propose MedGMAE, a novel framework that replaces traditional voxel reconstruction with 3D Gaussian primitives reconstruction as new perspectives on representation learning. Our approach learns to predict complete sets of 3D Gaussian parameters as semantic abstractions to represent the entire 3D volume, from sparse visible image patches. Code/project link: https://github.com/windrise/MedGMAE; https://anonymous.4open.science/r/MedGMAE-EC8F/

论文ICLR 2026 Poster2026 年surgical/interventional AI

HFSTI-Net:视频息肉分割的层级频率-空间-时间交互

ICLR 2026 Poster accepted paper at ICLR 2026. Automatic video polyp segmentation (VPS) is crucial for preventing and treating colorectal cancer by ensuring accurate identification of polyps in colonoscopy examinations. However, its clinical application is hampered by two key challenges: shape collapse, which compromises structural integrity, and episodic amnesia, which causes instability in challenging video sequences. To address these challenges, we present a novel video segmentation network, \emph{HFSTI-Net}, which integrates global perception with spatiotemporal consistency in spatial, temporal, and frequency domains. Specifically, to address shape collapse under low contrast or visual ambiguity, we design a Hierarchical Frequency-spatial Interaction (HFSI) module that fuses spatial and frequency cues for fine-grained boundary localization. Code/project link: https://github.com/Yuanqin-He/HFSTI-Net

论文ICLR 2026 Poster2026 年clinical prediction

泛癌筛查中的扫视-聚焦强化机制

ICLR 2026 Poster accepted paper at ICLR 2026. Pan-cancer screening in large-scale CT scans remains challenging for existing AI methods, primarily due to the difficulty of localizing diverse types of tiny lesions in large CT volumes. The extreme foreground-background imbalance significantly hinders models from focusing on diseased regions, while redundant focus on healthy regions not only decreases the efficiency but also increases false positives. Inspired by radiologists' glance and focus diagnostic strategy, we introduce GF-Screen, a Glance and Focus reinforcement learning framework for pan-cancer screening. GF-Screen employs a Glance model to localize the diseased regions and a Focus model to precisely segment the lesions, where segmentation results of the Focus model are leveraged to reward the Glance model via Reinforcement Learning (RL). Code/project link: https://github.com/Luffy03/GF-Screen

论文ICLR 2026 Poster2026 年trustworthy medical AI

超越聚合:在异质联邦学习中引导客户端

ICLR 2026 Poster accepted paper at ICLR 2026. Federated learning (FL) is increasingly adopted in domains like healthcare, where data privacy is paramount. A fundamental challenge in these systems is statistical heterogeneity—the fact that data distributions vary significantly across clients (e.g., different hospitals may treat distinct patient demographics). While current FL algorithms focus on aggregating model updates from these heterogeneous clients, the potential of the central server remains under-explored. This paper is motivated by a healthcare scenario: could a central server not only coordinate model training but also guide a new patient to the hospital best equipped for their specific condition?

论文ICLR 2026 Poster2026 年trustworthy medical AI

ECG 基础模型基准:跨临床任务的现实检验

ICLR 2026 Poster accepted paper at ICLR 2026. The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. FMs promise broader adaptability, but fundamental questions remain: Which architectures generalize best? How do models scale with limited labels? What explains performance differences across model families? We benchmarked eight ECG FMs on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes.

论文ICLR 2026 Poster2026 年clinical NLP

用于胸部 X 光图像的结构化、标注式、定位化 VQA 数据集:含完整句答案与场景图

ICLR 2026 Poster accepted paper at ICLR 2026. Visual Question Answering (VQA) enables targeted and context-dependent analysis of medical images, such as chest X-rays (CXRs). However, existing VQA datasets for CXRs are typically constrained by simplistic and brief answer formats, lacking localization annotations (e.g., bounding boxes) and structured tags (e.g., region or radiological finding/disease tags). To address these limitations, we introduce MIMIC-Ext-CXR-QBA (abbr. CXR-QBA), a large-scale CXR VQA dataset derived from MIMIC-CXR, comprising 42 million QA-pairs with multi-granular, multi-part answers, detailed bounding boxes, and structured tags. Code/project link: https://github.com/philip-mueller/mimic-ext-cxr-qba/

论文ICLR 2026 Poster2026 年clinical NLP

重新思考放射报告生成:从叙事流到主题引导 findings

ICLR 2026 Poster accepted paper at ICLR 2026. Vision-Language Models (VLMs) for radiology report generation are typically trained to mimic the narrative flow of human experts. However, we identify a potential limitation in this conventional paradigm. We hypothesize that optimizing for narrative coherence encourages models to rely on linguistic priors and inter-sentence correlations, which can weaken their grounding in direct visual evidence and lead to factual inaccuracies. To investigate this, we design a controlled experiment demonstrating that as textual context increases, a model's reliance on the input image systematically decays. We propose LLaVA-TA (Topic-guided and Anatomy-aware), a new fine-tuning framework that directly addresses this challenge by re-engineering the generation process.

论文ICLR 2026 Poster2026 年clinical prediction

M3CoTBench:医学图像理解中 MLLM 思维链基准

ICLR 2026 Poster accepted paper at ICLR 2026. Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning, and recent advances have extended this paradigm to Multimodal Large Language Models (MLLMs). In the medical domain, where diagnostic decisions depend on nuanced visual cues and sequential reasoning, CoT aligns naturally with clinical thinking processes. However, current benchmarks for medical image understanding generally focus on the final answer while ignoring the reasoning path. An opaque process lacks reliable bases for judgment, making it difficult to assist doctors in diagnosis.

论文ICLR 2026 Poster2026 年medical LLM agent

K-Prism:知识引导与提示融合的通用医学图像分割模型

ICLR 2026 Poster accepted paper at ICLR 2026. Medical image segmentation is fundamental to clinical decision-making, yet existing models remain fragmented. They are usually trained on single knowledge sources and specific to individual tasks, modalities, or organs. This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present $\textbf{K-Prism}$, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) $\textit{semantic priors}$ learned from annotated datasets, (ii) $\textit{in-context knowledge}$ from few-shot reference examples, and (iii) $\textit{interactive feedback}$ from user inputs like clicks or scribbles. Code/project link: https://github.com/bangwayne/K-Prism

论文ICLR 2026 Poster2026 年clinical prediction

面向数据高效精准肿瘤学的病理组学多模态结构表征学习

ICLR 2026 Poster accepted paper at ICLR 2026. Fusing histopathology images and genomics data with deep learning has significantly advanced precision oncology. However, genomics data is often missing due to its high acquisition cost and complexity in real-world clinical scenarios. Existing solutions aim to reconstruct genomics data from histopathology images. Nevertheless, these methods typically relied only on individual case and overlooked the potential relationships among cases. Additionally, they failed to take advantage of the authentic genomics data of diagnostically related cases that are accessible from training for inference. In this work, we propose a novel Multi-modal Structural Representation Learning (MSRL) framework for data-efficient precision oncology. Code/project link: https://github.com/WkEEn/MSRL

论文ICLR 2026 Poster2026 年clinical prediction

Pixel-Level Residual Diffusion Transformer:可扩展 3D CT 体数据生成

ICLR 2026 Poster accepted paper at ICLR 2026. Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck.

论文ICLR 2026 Poster2026 年医学影像

建模像素级自监督嵌入密度用于医学 CT 无监督病理分割

ICLR 2026 Poster accepted paper at ICLR 2026. Accurate detection of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology detection as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to healthy ones. We enhance the existing density-based UVAS framework with two key innovations: (1) dense self-supervised learning for feature extraction, eliminating the need for supervised pretraining, and (2) learned, masking-invariant dense features as conditioning variables, replacing hand-crafted positional encodings. Trained on over 30,000 unlabeled 3D CT volumes, our fully self-supervised model, Screener, outperforms existing UVAS methods on four large-scale test datasets comprising 1,820 scans with diverse pathologies. Code/project link: https://github.com/mishgon/screener; https://anonymous.4open.science/r/screener-35EE/

论文ICLR 2026 Poster2026 年trustworthy medical AI

NurValues:临床情境中大语言模型的真实护理价值观评测

ICLR 2026 Poster accepted paper at ICLR 2026. While LLMs have demonstrated medical knowledge and conversational ability, their deployment in clinical practice raises new risks: patients may place greater trust in LLM-generated responses than in nurses' professional judgments, potentially intensifying nurse–patient conflicts. Such risks highlight the urgent need of evaluating whether LLMs align with the core nursing values upheld by human nurses. This work introduces the first benchmark for nursing value alignment, consisting of five core value dimensions distilled from international nursing codes: _Altruism_, _Human Dignity_, _Integrity_, _Justice_, and _Professionalism_. We define two-level tasks on the benchmark, considering the two characteristics of emerging nurse–patient conflicts.

论文ICLR 2026 Poster2026 年clinical prediction

FETAL-GAUGE:评估胎儿超声视觉语言模型的基准

ICLR 2026 Poster accepted paper at ICLR 2026. The growing demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers, creating barriers to essential fetal health monitoring. Deep learning has the potential to enhance sonographers' efficiency and support the training of new practitioners. Vision-Language Models (VLMs) are particularly promising for ultrasound interpretation, as they can jointly process images and text to perform multiple clinical tasks within a single framework. However, despite the expansion of VLMs, no standardized benchmark exists to evaluate their performance in fetal ultrasound imaging. Code/project link: https://github.com/BioMedIA-MBZUAI/FETAL-GAUGE

论文ICLR 2026 Poster2026 年clinical NLP

多图像医学思维

ICLR 2026 Poster accepted paper at ICLR 2026. Large language models perform well on many medical QA benchmarks, but real clinical reasoning is harder because diagnosis often requires integrating evidence across multiple images rather than interpreting a single view. We introduce MedThinkVQA, an expert-annotated benchmark for thinking with multiple images, in which models must interpret each image, combine cross-view evidence, and solve diagnostic questions under intermediate supervision and step-level evaluation. The dataset contains 10,067 cases, including 720 test cases, with an average of 6.68 images per case, substantially denser than prior work (earlier maxima $\leq$ 1.43). On the test set, the best closed-source models, Claude-4.6-opus, Gemini-3-pro, and GPT-5.2-xhigh, achieve only 54.9%--57.2% accuracy, while smaller proprietary variants, GPT-5-mini/nano, drop to 39.7% and 30.8%.

论文ICLR 2026 Poster2026 年trustworthy medical AI

AttTok:将属性 token 与生成式预训练视觉语言模型结合用于医学图像理解

ICLR 2026 Poster accepted paper at ICLR 2026. Recent generative pre-trained vision–language (GPTv) models have achieved remarkable success in multi-modal understanding, inspiring their adaptation to medical imaging tasks such as disease diagnosis and visual question answering (VQA). However, current instruction-tuned GPTv models suffer from two key challenges: (1) medical attributes (e.g., disease names, severity grades) are encoded as plain text tokens, collapsing semantically distinct concepts into nearly identical textual sequences; and (2) inadequate textual supervision weakens visual representation learning, leading to severe inter-attribute confusion and misaligned vision–language embeddings. To address these limitations, we introduce attribute tokens (AttTok), a set of pre‑defined special tokens that uniquely encode clinical attributes (e.g., imaging modality, diagnosis, severity) within a structured token space. Complemented by attribute‑centric embedding books, AttTok serves as anchor points for aligning both visual and textual modalities into a shared, discriminative representation space.

论文ICLR 2026 Oral2026 年clinical prediction

去中心化注意力错失中心信号:重新思考医学时间序列 Transformer

ICLR 2026 Oral accepted paper at ICLR 2026. Accurate analysis of Medical time series (MedTS) data, such as Electroencephalography (EEG) and Electrocardiography (ECG), plays a pivotal role in healthcare applications, including the diagnosis of brain and heart diseases. MedTS data typically exhibits two critical patterns: **temporal dependencies** within individual channels and **channel dependencies** across multiple channels. While recent advances in deep learning have leveraged Transformer-based models to effectively capture temporal dependencies, they often struggle to model channel dependencies. This limitation stems from a structural mismatch: ***MedTS signals are inherently centralized, whereas the Transformer's attention is decentralized***, making it less effective at capturing global synchronization and unified waveform patterns. Code/project link: https://github.com/Levi-Ackman/TeCh

论文ICLR 2026 Poster2026 年trustworthy medical AI

Cancer-Myth:评估大语言模型回答含错误预设的患者问题

ICLR 2026 Poster accepted paper at ICLR 2026. Cancer patients are increasingly turning to large language models (LLMs) for medical information, making it critical to assess how well these models handle complex, personalized questions. However, current medical benchmarks focus on medical exams or consumer-searched questions and do not evaluate LLMs on real patient questions with patient details. In this paper, we first have three hematology-oncology physicians evaluate cancer-related questions drawn from real patients. While LLM responses are generally accurate, the models frequently fail to recognize or address false presuppositions} in the questions, posing risks to safe medical decision-making.

论文ICLR 2026 Poster2026 年clinical prediction

能否用 LLM 为临床时间序列数据生成可迁移表征?

ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology. Neural-MedBench integrates multi-sequence MRI scans, structured electronic health records, and clinical notes, and encompasses three core task families: differential diagnosis, lesion recognition, and rationale generation. Code/project link: https://neuromedbench.github.io/

论文ICLR 2026 Poster2026 年trustworthy medical AI

超越分类准确率:Neural-MedBench 与深层推理基准的必要性

ICLR 2026 Poster accepted paper at ICLR 2026. Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. Code/project link: https://omni-ieeg.github.io/omni-ieeg/; https://github.com/Omni-iEEG/Omni-iEEG

论文ICLR 2026 Poster2026 年医学影像

面向医学超声的解剖感知表征学习

ICLR 2026 Poster accepted paper at ICLR 2026. Diagnostic accuracy of ultrasound imaging is limited by qualitative variability and its reliance on the expertise of medical professionals. Such challenges increase demand for computer-aided diagnostic systems that enhance diagnostic accuracy and efficiency. However, the unique texture and structural attributes of ultrasound images, and the scarcity of large-scale ultrasound datasets hinder the effective application of conventional machine learning methodologies. To address the challenges, we propose Anatomy-aware Representation Learning (ARL), a novel self-supervised representation learning framework specifically designed for medical ultrasound imaging.

论文ICLR 2026 Poster2026 年trustworthy medical AI

超越医学考试:面向心理健康真实任务与模糊性的临床医生标注公平性数据集

ICLR 2026 Poster accepted paper at ICLR 2026. Current medical language model (LM) benchmarks often over-simplify the complexities of day-to-day clinical practice tasks and instead rely on evaluating LMs on multiple-choice board exam questions. In psychiatry especially, these challenges are worsened by fairness and bias issues, since models can be swayed by patient demographics even when those factors should not influence clinical decisions. Thus, we present an expert-created and annotated dataset spanning five critical domains of decision-making in mental healthcare: treatment, diagnosis, documentation, monitoring, and triage. This U.S. centric dataset — created without any LM assistance — is designed to capture the nuanced clinical reasoning and daily ambiguities mental health practitioners encounter, reflecting the inherent complexities of care delivery that are missing from existing datasets.

论文ICLR 2026 Poster2026 年clinical prediction

从病历到诊断对话:面向精神共病的临床扎根方法与数据集

ICLR 2026 Poster accepted paper at ICLR 2026. Psychiatric comorbidity is clinically significant yet challenging due to the complexity of multiple co-occurring disorders. To address this, we develop a novel approach integrating synthetic patient electronic medical record (EMR) construction and multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline that ensures clinical relevance and diversity. Our multi-agent framework transfers the clinical interview protocol into a hierarchical state machine and context tree, supporting over 130 diagnostic states while maintaining clinical standards.

论文ICLR 2026 Poster2026 年trustworthy medical AI

AbdCTBench:从腹部表面几何学习临床生物标志物表征

ICLR 2026 Poster accepted paper at ICLR 2026. Body composition analysis through CT and MRI imaging provides critical insights for cardio-metabolic health assessment but remains limited by accessibility barriers including radiation exposure, high costs, and infrastructure requirements. We present AbdCTBench, a large-scale dataset containing 23,506 CT-derived abdominal surface meshes from 18,719 patients, paired with 87 comorbidity labels, 31 specific diagnosis codes, and 16 CT-derived biomarkers. Our key insight is that external surface geometry is predictive of internal tissue composition, enabling accessible health screening through consumer devices. We establish comprehensive benchmarks across seven computer vision architectures (ResNet-18/34/50, DenseNet-121, EfficientNet-B0, ViT-Small, Swin Transformer-Base), demonstrating that models can learn robust surface-to-biomarker representations directly from 2D mesh projections. Code/project link: https://abdctbenchrepo.github.io/AbdCTBench/