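AI4Meder Site Search

Search Medical AI Papers and Resources

Search community content by paper, data resource, technical competition, submission deadline, and course resource, and jump straight to the corresponding detail page.

25 results

Enter a keyword or click a tag to narrow results by papers, data resources, competition deadlines, calls for papers, and courses. Tag: Representation Learning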
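The narrowing described above can be pictured as a filter over typed, tagged records. The sketch below is only illustrative: the `Result` class, its field names, and the sample records are hypothetical and do not reflect the site's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """One search hit; field names are hypothetical, not the site's schema."""
    title: str
    kind: str  # e.g. "paper" or "data resource"
    tags: set = field(default_factory=set)


def filter_results(results, kind=None, tag=None):
    """Keep hits matching kind and tag (case-insensitive); None disables a filter."""
    kept = []
    for r in results:
        if kind is not None and r.kind.lower() != kind.lower():
            continue
        if tag is not None and tag.lower() not in {t.lower() for t in r.tags}:
            continue
        kept.append(r)
    return kept


# Made-up sample records, loosely modeled on the entries listed on this page.
results = [
    Result("CARL", "paper", {"Representation Learning", "Spectral Imaging"}),
    Result("CheXpert", "data resource", {"CXR", "Stanford AIMI"}),
    Result("MedGMAE", "paper", {"Volumetric Representation Learning"}),
]
hits = filter_results(results, kind="paper", tag="representation learning")
print([r.title for r in hits])
```

Tag matching here is exact (after lowercasing), so "Volumetric Representation Learning" does not match the tag "Representation Learning"; a real search backend may well match more loosely.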

Paper | ICLR 2026 Poster | 2026 | Medical Imaging

CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis

Accepted as a poster at ICLR 2026. Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Code/project link: https://github.com/IMSY-DKFZ/CARL

Tags: Medical Image Computing; Paper; Representation Learning; Self-Supervised Learning; Spectral Imaging; ICLR 2026
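To illustrate why cross-attention can turn an image with any number of channels into a fixed-size representation, here is a minimal pure-Python sketch. It is not the CARL implementation: it omits the self-attention step and all learned projection matrices, and the sizes `D` and `K`, the random "learned" queries, and the random channel tokens are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def cross_attention(queries, tokens):
    """Each query attends over all tokens; the output has one row per query."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        w = softmax(scores)
        out.append([sum(wj * t[i] for wj, t in zip(w, tokens)) for i in range(d)])
    return out


random.seed(0)
D, K = 4, 2  # embedding width and number of queries (illustrative values)
# Stand-in for learned query vectors; a trained model would optimize these.
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]


def encode(num_channels):
    # Hypothetical per-channel tokens for one pixel; a real encoder would
    # embed each channel's intensity together with its wavelength.
    tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(num_channels)]
    return cross_attention(queries, tokens)


rgb = encode(3)    # e.g. an RGB camera
hsi = encode(100)  # e.g. a hyperspectral camera
print(len(rgb), len(rgb[0]), len(hsi), len(hsi[0]))  # 2 4 2 4
```

The output shape depends only on the number of queries and the embedding width, not on how many spectral channels the camera captured.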

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology

Accepted as a poster at ICLR 2026. Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Code/project link: https://github.com/Hanminghao/STAMP

Tags: Medical Image Computing; Medical Multimodal; Trustworthiness, Safety, Fairness, and Privacy; Paper; Computational pathology; Multimodal Learning

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model

Accepted as a poster at ICLR 2026. The development of foundation models for functional magnetic resonance imaging (fMRI) time series holds significant promise for predicting phenotypes related to disease and cognition. Current models, however, are often trained using a mask-and-reconstruct objective on small brain regions. This focus on low-level information leads to representations that are sensitive to noise and temporal fluctuations, necessitating extensive fine-tuning for downstream tasks. We introduce Brain-Semantoks, a self-supervised framework designed specifically to learn abstract representations of brain dynamics. Its architecture is built on two core innovations: a semantic tokenizer that aggregates noisy regional signals into robust tokens representing functional networks, and a self-distillation objective that enforces representational stability across time.

Tags: Medical Image Computing; EHR and Clinical Prediction; Trustworthiness, Safety, Fairness, and Privacy; Paper; neuroscience; neuroimaging

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

Accepted as a poster at ICLR 2026. Foundation models are reshaping EEG analysis, yet EEG tokenization remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time–frequency masking to capture robust motif representations; the tokenizer is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits: Accuracy: Experiments on four diverse EEG benchmarks demonstrate consistent performance gains across both single- and multi-dataset pretraining settings, achieving up to 11% improvement in Cohen's Kappa over strong baselines. Code/project link: https://github.com/Jathurshan0330/TFM-Tokenizer

Tags: Medical Image Computing; EHR and Clinical Prediction; Trustworthiness, Safety, Fairness, and Privacy; Paper; EEG; Tokenization
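For readers unfamiliar with the metric quoted above, Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement computed from the label marginals. The following is a minimal sketch of that standard formula; the label lists are made up and are not results from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items where the two labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both labelings use one identical class
    return (p_o - p_e) / (1 - p_e)


# Made-up labels: 3 of 4 agree (p_o = 0.75), chance agreement p_e = 0.5.
print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0]))  # 0.5
```

Unlike plain accuracy, kappa is 0 for a classifier that only matches the ground truth at the chance rate, which makes it a common choice for imbalanced tasks such as sleep staging or seizure detection.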

Paper | ICLR 2026 Poster | 2026 | Clinical Prediction

Repurposing Foundation Models for Generalizable Medical Time Series Classification

Accepted as a poster at ICLR 2026. Medical time series (MedTS) classification suffers from poor generalizability in real-world deployment due to inter- and intra-dataset heterogeneity, such as varying numbers of channels, signal lengths, task definitions, and patient characteristics. To address this, we propose FORMED, a novel framework for repurposing a backbone foundation model, pre-trained on generic time series, to enable highly generalizable MedTS classification on unseen datasets. FORMED combines the backbone with a novel classifier comprising two components: (1) task-specific channel embeddings and label queries, dynamically sized to match any number of channels and target classes, and (2) a shared decoding attention layer, jointly trained across datasets to capture medical domain knowledge through task-agnostic feature-query interactions.

Tags: Medical Image Computing; EHR and Clinical Prediction; Paper; Medical Time Series Classification; Time Series Foundation Model

Paper | ICLR 2026 Poster | 2026 | Medical Imaging

MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning

Accepted as a poster at ICLR 2026. Self-supervised pre-training has emerged as a critical paradigm for learning transferable representations from unlabeled medical volumetric data. Masked autoencoder based methods have garnered significant attention, yet their application to volumetric medical images faces fundamental limitations from the discrete voxel-level reconstruction objective, which neglects comprehensive anatomical structure continuity. To address this challenge, we propose MedGMAE, a novel framework that replaces traditional voxel reconstruction with 3D Gaussian primitives reconstruction as a new perspective on representation learning. Our approach learns to predict complete sets of 3D Gaussian parameters as semantic abstractions to represent the entire 3D volume, from sparse visible image patches. Code/project link: https://github.com/windrise/MedGMAE; https://anonymous.4open.science/r/MedGMAE-EC8F/

Tags: Medical Image Computing; Paper; 3D Gaussian Representation; Medical Imaging analysis; Volumetric Representation Learning; ICLR 2026
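As a rough picture of how a set of Gaussian parameters can stand in for a whole volume, the sketch below sums a few 3D Gaussians on a voxel grid. It is not the MedGMAE method: it assumes isotropic Gaussians described by a hypothetical `(center, sigma, amplitude)` tuple, whereas the paper's primitives and rendering may differ.

```python
import math


def render_volume(gaussians, shape):
    """Sum isotropic 3D Gaussians on a voxel grid of the given (X, Y, Z) shape.

    Each Gaussian is a (center, sigma, amplitude) tuple; this parameterization
    is an assumption made for illustration.
    """
    X, Y, Z = shape
    vol = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for (cx, cy, cz), sigma, amp in gaussians:
        for x in range(X):
            for y in range(Y):
                for z in range(Z):
                    d2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                    vol[x][y][z] += amp * math.exp(-d2 / (2 * sigma ** 2))
    return vol


# One unit-amplitude Gaussian centered in a 5x5x5 grid.
vol = render_volume([((2, 2, 2), 1.0, 1.0)], (5, 5, 5))
print(vol[2][2][2])                      # 1.0 at the center
print(vol[2][2][3] == math.exp(-0.5))    # one voxel away along z
```

Because the rendered intensity varies smoothly with distance from each center, a reconstruction target of this kind is continuous in space rather than defined voxel by voxel.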

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization

Accepted as a poster at ICLR 2026. We propose a frequency-oriented perspective on retinal representation learning by analyzing masked autoencoders (MAE) through the lens of spatial frequency. Our analysis shows that MAE favors low-frequency content while under-encoding diagnostically critical high-frequency structures in retinal images. Because retinal pathology often manifests in high-frequency detail, this bias limits diagnostic performance and motivates frequency-balanced representations. Within a mutual-information (MI) formulation of MAE, we introduce the Frequency-Balanced Retinal Masked Autoencoder (RetMAE), which augments the reconstruction objective with an MI regularizer that suppresses low-frequency redundancy and accentuates clinically salient high-frequency information.

Tags: Medical Image Computing; Trustworthiness, Safety, Fairness, and Privacy; Paper; Masked Image Modeling; Masked Autoencoders; Representation Learning
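One simple way to quantify "low-frequency" versus "high-frequency" content is the share of spectral energy above a cutoff frequency. The sketch below does this for a 1D signal with a naive discrete Fourier transform; it is only a stand-in for the 2D spatial-frequency analysis described in the abstract, and the function name, cutoff, and test signals are assumptions.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def high_freq_energy_fraction(signal, cutoff):
    """Fraction of non-DC spectral energy at frequency indices above cutoff."""
    spec = dft(signal)
    half = len(signal) // 2
    # One-sided spectrum, skipping the DC term at index 0.
    energy = [abs(spec[k]) ** 2 for k in range(1, half + 1)]
    total = sum(energy)
    if total == 0:
        return 0.0
    return sum(e for k, e in enumerate(energy, start=1) if k > cutoff) / total


n = 32
low = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]   # 1 cycle: coarse structure
high = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # 8 cycles: fine detail
mixed = [a + b for a, b in zip(low, high)]
for name, sig in [("low", low), ("high", high), ("mixed", mixed)]:
    print(name, round(high_freq_energy_fraction(sig, cutoff=4), 3))
```

Comparing this fraction between an input and its reconstruction is one possible way to check whether a model under-represents fine detail.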

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Benchmarking ECG Foundation Models: A Reality Check Across Clinical Tasks

Accepted as a poster at ICLR 2026. The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. Foundation models (FMs) promise broader adaptability, but fundamental questions remain: Which architectures generalize best? How do models scale with limited labels? What explains performance differences across model families? We benchmarked eight ECG FMs on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes.

Tags: Medical Image Computing; EHR and Clinical Prediction; Trustworthiness, Safety, Fairness, and Privacy; Paper; ECG; Foundation Model

Paper | ICLR 2026 Poster | 2026 | Clinical Prediction

Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology

Accepted as a poster at ICLR 2026. Fusing histopathology images and genomics data with deep learning has significantly advanced precision oncology. However, genomics data is often missing in real-world clinical scenarios due to its high acquisition cost and complexity. Existing solutions aim to reconstruct genomics data from histopathology images. Nevertheless, these methods typically rely only on individual cases and overlook the potential relationships among cases. Additionally, they fail to take advantage of the authentic genomics data of diagnostically related cases, which is accessible from the training set at inference time. In this work, we propose a novel Multi-modal Structural Representation Learning (MSRL) framework for data-efficient precision oncology. Code/project link: https://github.com/WkEEn/MSRL

Tags: Medical Image Computing; Medical Multimodal; EHR and Clinical Prediction; Paper; multi-modal learning; histopathology image; representation learning

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

AttTok: Combining Attribute Tokens with Generative Pre-trained Vision-Language Models for Medical Image Understanding

Accepted as a poster at ICLR 2026. Recent generative pre-trained vision–language (GPTv) models have achieved remarkable success in multi-modal understanding, inspiring their adaptation to medical imaging tasks such as disease diagnosis and visual question answering (VQA). However, current instruction-tuned GPTv models suffer from two key challenges: (1) medical attributes (e.g., disease names, severity grades) are encoded as plain text tokens, collapsing semantically distinct concepts into nearly identical textual sequences; and (2) inadequate textual supervision weakens visual representation learning, leading to severe inter-attribute confusion and misaligned vision–language embeddings. To address these limitations, we introduce attribute tokens (AttTok), a set of pre-defined special tokens that uniquely encode clinical attributes (e.g., imaging modality, diagnosis, severity) within a structured token space. Complemented by attribute-centric embedding books, AttTok serves as anchor points for aligning both visual and textual modalities into a shared, discriminative representation space.

Tags: Medical Image Computing; Medical Multimodal; Clinical Language Intelligence; Paper; Medical generative pre-trained models; medical Multi-Modal alignment

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

Can LLMs Generate Transferable Representations for Clinical Time Series Data?

Accepted as a poster at ICLR 2026. Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shifts at the next. In this work, we study a simple question: can large language models (LLMs) create portable patient embeddings, i.e., representations of patients that enable a downstream predictor built at one hospital to be used elsewhere with minimal-to-no retraining and fine-tuning? To do so, we map irregular ICU time series onto concise natural language summaries using a frozen LLM, then embed each summary with a frozen text embedding model to obtain a fixed-length vector capable of serving as input to a variety of downstream predictors.

Tags: Medical Image Computing; Clinical Language Intelligence; EHR and Clinical Prediction; Paper; Machine Learning for Healthcare; ICU Time-series

Paper | ICLR 2026 Poster | 2026 | Medical Imaging

Anatomy-Aware Representation Learning for Medical Ultrasound

Accepted as a poster at ICLR 2026. Diagnostic accuracy of ultrasound imaging is limited by qualitative variability and its reliance on the expertise of medical professionals. Such challenges increase demand for computer-aided diagnostic systems that enhance diagnostic accuracy and efficiency. However, the unique texture and structural attributes of ultrasound images and the scarcity of large-scale ultrasound datasets hinder the effective application of conventional machine learning methodologies. To address these challenges, we propose Anatomy-aware Representation Learning (ARL), a novel self-supervised representation learning framework specifically designed for medical ultrasound imaging.

Tags: Medical Image Computing; Paper; Foundation model; medical ultrasound; representation learning; ICLR 2026

Paper | ICLR 2026 Poster | 2026 | Trustworthy Medical AI

AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry

Accepted as a poster at ICLR 2026. Body composition analysis through CT and MRI imaging provides critical insights for cardio-metabolic health assessment but remains limited by accessibility barriers including radiation exposure, high costs, and infrastructure requirements. We present AbdCTBench, a large-scale dataset containing 23,506 CT-derived abdominal surface meshes from 18,719 patients, paired with 87 comorbidity labels, 31 specific diagnosis codes, and 16 CT-derived biomarkers. Our key insight is that external surface geometry is predictive of internal tissue composition, enabling accessible health screening through consumer devices. We establish comprehensive benchmarks across seven computer vision architectures (ResNet-18/34/50, DenseNet-121, EfficientNet-B0, ViT-Small, Swin Transformer-Base), demonstrating that models can learn robust surface-to-biomarker representations directly from 2D mesh projections. Code/project link: https://abdctbenchrepo.github.io/AbdCTBench/

Tags: Medical Image Computing; Trustworthiness, Safety, Fairness, and Privacy; Paper; computer vision for healthcare; Radiology; Computed Tomography (CT)

Paper | Nature Machine Intelligence | 2025 | Radiology

A Chest X-ray Foundation Model

ARK is a chest radiography foundation model reported in Nature Machine Intelligence for visual representation learning and radiology downstream tasks.

Tags: Chest X-ray; Imaging Foundation Model; Radiology

Data Resource | MRI, DXA, ultrasound, retinal imaging, genetics, and health records | population-scale multimodal imaging cohort | Population-scale UK Biobank imaging cohort; application required | Access by application

UK Biobank Imaging Data

UK Biobank Imaging provides large-scale imaging phenotypes linked to genetic, lifestyle, and health outcome data. It is used for population-scale medical imaging AI, disease risk prediction, representation learning, multimodal biomedical modeling, and epidemiological AI studies.

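Tags: Medical Image Computing; EHR and Clinical Prediction; Medical Multimodal; Dataset; population cohort; imaging

Data Resource | cardiac ultrasound videos with functional annotations | echocardiography video dataset | Large echocardiography video dataset; see official site | Access by application

EchoNet-Dynamic Cardiac Ultrasound Video Dataset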

EchoNet-Dynamic is a cardiac ultrasound video dataset with expert annotations for left ventricular function. It is used for echocardiography video understanding, ejection fraction estimation, cardiac segmentation, and clinical video AI research.

Tags: Medical Image Computing; Dataset; echocardiography; ultrasound video; cardiology
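The ejection fraction estimation task mentioned above targets a simple ratio of ventricular volumes: EF = (EDV - ESV) / EDV × 100, where EDV and ESV are the end-diastolic and end-systolic volumes. A minimal sketch of that standard formula follows; the volumes in the example are made up and are not taken from the dataset.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left ventricular ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0


# Made-up volumes in millilitres.
ef = ejection_fraction(120.0, 50.0)
print(round(ef, 1))  # 58.3
```

In a video-based pipeline, the two volumes would typically come from segmenting the left ventricle at the end-diastolic and end-systolic frames, or the model would regress EF directly from the clip.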

Data Resource | chest radiographs with multi-label findings | chest X-ray classification dataset | Large-scale Stanford chest X-ray dataset | Access by application

CheXpert Chest X-ray Dataset

CheXpert is a large chest radiograph dataset from Stanford with uncertainty-aware labels for common chest X-ray findings. It is widely used for radiology classification, label uncertainty modeling, chest X-ray representation learning, and clinical imaging benchmarks.

Tags: Medical Image Computing; Dataset; CXR; Radiology; Classification; Stanford AIMI
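CheXpert marks each finding as positive (1), negative (0), uncertain (-1), or unmentioned (blank), and the dataset's authors describe simple policies for turning uncertain labels into training targets: map them to 1 ("U-Ones"), map them to 0 ("U-Zeros"), or exclude them from the loss ("U-Ignore"). The sketch below illustrates those three policies; the function name, the use of `None` for blank labels, and the choice to treat unmentioned findings as negative are assumptions made here.

```python
POLICIES = ("ones", "zeros", "ignore")


def apply_uncertainty_policy(labels, policy="ones"):
    """Convert one row of CheXpert-style labels into (targets, mask).

    labels: values in {1, 0, -1, None}; -1 is uncertain, None is unmentioned.
    mask is 0 where the label should be left out of the loss.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    targets, mask = [], []
    for value in labels:
        if value is None:
            # Unmentioned finding: treated as negative here (an assumption).
            targets.append(0.0)
            mask.append(1)
        elif value == -1:
            targets.append(1.0 if policy == "ones" else 0.0)
            mask.append(0 if policy == "ignore" else 1)
        else:
            targets.append(float(value))
            mask.append(1)
    return targets, mask


row = [1, 0, -1, None]  # made-up labels for four findings
for policy in POLICIES:
    print(policy, apply_uncertainty_policy(row, policy))
```

Which policy works best can differ per finding, so it is usually treated as a per-label hyperparameter rather than a global setting.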

Data Resource | EEG and polysomnography biosignals | sleep physiology signal dataset | Expanded Sleep-EDF PhysioNet dataset; version 1.0.0 | Open access

Sleep-EDF Expanded Polysomnography Dataset

Sleep-EDF Expanded contains polysomnographic sleep recordings with EEG and related physiological signals. It is used for sleep stage classification, biosignal time-series modeling, self-supervised learning on physiological signals, and clinical sleep research benchmarks.

Tags: EHR and Clinical Prediction; Dataset; sleep staging; EEG; biosignal; PhysioNet

Data Resource | 12-lead ECG waveforms with diagnostic labels | ECG waveform benchmark | Large public ECG dataset; version 1.0.3 | Open access

PTB-XL: A Large Open 12-Lead ECG Dataset

PTB-XL is a large public 12-lead electrocardiography dataset with diagnostic statements and waveform records. It is a standard benchmark for ECG classification, cardiac abnormality detection, clinical signal representation learning, and robust evaluation of biosignal models.

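Tags: EHR and Clinical Prediction; Dataset; ECG; cardiology; biosignal; Classification

Data Resource | 12-lead ECG waveforms and diagnostic metadata | ECG waveform dataset | Large-scale diagnostic ECG dataset; version 1.0 | Access by application

MIMIC-IV-ECG Diagnostic Electrocardiogram Dataset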

MIMIC-IV-ECG is a large deidentified electrocardiogram dataset linked to the MIMIC-IV clinical data ecosystem. It supports ECG classification, arrhythmia detection, representation learning, and multimodal modeling with structured EHR context.

Tags: EHR and Clinical Prediction; Dataset; ECG; biosignal; clinical prediction; PhysioNet

Data Resource | chest radiographs with radiology reports | chest X-ray image-report dataset | Large-scale CXR image-report dataset; version 2.1.0 | Access by application

MIMIC-CXR v2.1.0 Chest X-ray Dataset

MIMIC-CXR is a large deidentified chest radiograph dataset with associated free-text radiology reports. It is widely used for chest X-ray classification, report generation, image-text representation learning, radiology retrieval, and medical multimodal foundation model evaluation.

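Tags: Medical Image Computing; Medical Multimodal; Clinical Language Intelligence; Dataset; CXR; radiology reports

Data Resource | deidentified clinical free text | clinical notes dataset | Clinical note extension for MIMIC-IV; version 2.2 | Access by application

MIMIC-IV-Note v2.2 Clinical Notes Dataset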

MIMIC-IV-Note provides deidentified clinical notes linked to MIMIC-IV hospital data. It supports clinical NLP tasks such as note representation learning, discharge summary modeling, information extraction, summarization, and multimodal EHR-text modeling.

Tags: Clinical Language Intelligence; EHR and Clinical Prediction; Dataset; clinical NLP; notes; summarization

Data Resource | ECG | Physiological Signals | 21,837 clinical 12-lead ECG records | Open access

PTB-XL ECG Database v1.0.3

Large publicly available 12-lead ECG waveform dataset with diagnostic labels, hosted on PhysioNet.

Tags: ECG; Physiological Signals; PhysioNet

Data Resource | Chest X-ray | Radiology | PhysioNet v2.1.0 | Restricted access

MIMIC-CXR-JPG v2.1.0

JPG-formatted chest radiographs with labels derived from free-text reports, hosted by PhysioNet.

Tags: Radiology; Chest X-ray; PhysioNet

Data Resource | Multimodal clinical data | Benchmark | ICML 2025 benchmark | Open access

CLIMB Clinical Foundation Model Benchmark

Multimodal clinical data foundation and benchmark introduced at ICML 2025 for clinical foundation model research.

Tags: benchmark; Multimodal; clinical foundation model