AI4Meder

AI4Meder 站内搜索

搜索医学 AI 论文与资源

按论文、数据资源、技术竞赛、投稿截止日期和课程资源检索社区内容,快速进入对应详情页。

9 条结果

输入关键词或点击标签,按论文、数据资源、竞赛截止日期、征稿与课程缩小范围。 标签:multimodal learning

清空筛选
论文ICLR 2026 Poster2026 年trustworthy medical AI

融合像素与基因:计算病理中的空间感知学习

ICLR 2026 Poster accepted paper at ICLR 2026. Recent years have witnessed remarkable progress in multimodal learning within computational pathology. Existing models primarily rely on vision and language modalities; however, language alone lacks molecular specificity and offers limited pathological supervision, leading to representational bottlenecks. In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. Code/project link: https://github.com/Hanminghao/STAMP

论文ICLR 2026 Poster2026 年trustworthy medical AI

Johnson-Lindenstrauss 引理引导的高效 3D 医学分割网络

ICLR 2026 Poster accepted paper at ICLR 2026. Lightweight 3D medical image segmentation remains constrained by a fundamental "efficiency / robustness conflict", particularly when processing complex anatomical structures and heterogeneous modalities. In this paper, we study how to redesign the framework based on the characteristics of high-dimensional 3D images, and explore data synergy to overcome the fragile representation of lightweight methods. Our approach, VeloxSeg, begins with a deployable and extensible dual-stream CNN-Transformer architecture composed of Paired Window Attention (PWA) and Johnson-Lindenstrauss lemma-guided convolution (JLC). For each 3D image, we invoke a "glance-and-focus" principle, where PWA rapidly retrieves multi-scale information, and JLC ensures robust local feature extraction with minimal parameters, significantly enhancing the model's ability to operate with low computational budget. Code/project link: https://github.com/JinPLu/VeloxSeg

论文ICLR 2026 Poster2026 年clinical NLP

迈向医学图像分割中的文本-掩膜一致性

ICLR 2026 Poster accepted paper at ICLR 2026. Vision-language models for medical image segmentation often produce masks that conflict with the accompanying text, especially under multi-site/multi-lesion descriptions. We trace this failure to two factors: (i) highly templated and repetitive clinical language causes one-to-one hard contrastive learning to yield numerous false negatives, weakening cross-modal alignment; and (ii) predominantly vision-driven, one-way cross-attention lacks a language-dominant, spatially aware pathway, hindering effective injection of textual semantics into the spatial visual domain. To this end, we propose Consistency-enhanced Two-stage Segmentation (C2Seg). In the pretraining stage, Cluster-aware Contrastive Learning uses a frozen strong baseline to construct an intra-batch text similarity matrix as soft labels, thereby alleviating false negative conflicts and producing more discriminative visual representations.

论文ICLR 2026 Poster2026 年clinical NLP

通过多粒度语言学习增强医学视觉理解

ICLR 2026 Poster accepted paper at ICLR 2026. Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. Code/project link: https://github.com/HUANGLIZI/MGLL

论文ICLR 2026 Poster2026 年trustworthy medical AI

面向多模态癌症生存分析的结构化预后事件建模

ICLR 2026 Poster accepted paper at ICLR 2026. The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events---manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations---are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover.

论文ICLR 2026 Poster2026 年trustworthy medical AI

Resp-Agent:面向多模态呼吸音生成与疾病诊断的 Agent 系统

ICLR 2026 Poster accepted paper at ICLR 2026. Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present **_Resp-Agent_**, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA). Unlike static pipelines, Thinker-A²CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a modality-weaving Diagnoser that weaves clinical text with audio tokens via strategic global attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. Code/project link: https://github.com/zpforlove/Resp-Agent

论文ICLR 2026 Poster2026 年clinical prediction

通过概念型多模态协同适配桥接放射学与病理学基础模型

ICLR 2026 Poster accepted paper at ICLR 2026. Pretrained medical foundation models (FMs) have shown strong generalization across diverse imaging tasks, such as disease classification in radiology and tumor grading in histopathology. While recent advances in parameter-efficient finetuning have enabled effective adaptation of FMs to downstream tasks, these approaches are typically designed for a single modality. In contrast, many clinical workflows rely on joint diagnosis from heterogeneous domains, such as radiology and pathology, where fully leveraging the representation capacity of multiple FMs remains an open challenge. To address this gap, we propose Concept Tuning and Fusing (CTF), a parameter-efficient framework that uses clinically grounded concepts as a shared semantic interface to enable cross-modal co-adaptation before fusion. Code/project link: https://github.com/HKU-MedAI/CTF; https://github.com/neuronflow/BraTS-Toolkit

数据资源medical images with bilingual visual questions and answersmedical visual question answering datasetBilingual medical VQA dataset; see official project page开放访问

SLAKE:语义标注、知识增强医学 VQA 数据集

SLAKE is a semantically labeled medical visual question answering dataset with bilingual English-Chinese questions, medical images, and knowledge-enhanced annotations. It is useful for medical multimodal learning, image-grounded QA, and radiology VQA evaluation.

征稿与合作Scientific Reports截止 北京时间 2026-06-23期刊专刊

Scientific Reports 专辑:临床决策 AI

This Nature Portfolio / Scientific Reports collection is open for submissions until 2026-06-23. It focuses on AI for clinical decision-making, including diagnostic, prognostic, and therapeutic decision support, EHRs, medical imaging, genomics, real-time patient data, clinical notes, multimodal learning, privacy-preserving AI, interpretability, and validation.