Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Medical Interpretability and Knowledge Maps of Large Language Models

Accepted as a poster at ICLR 2026. We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how LLMs represent and process medical knowledge using four interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal, and (4) activation patching. We present knowledge maps of five LLMs which show, at a coarse resolution, where knowledge about patient ages, medical symptoms, diseases, and drugs is stored in the models. In particular, for Llama3.3-70B, we find that most medical knowledge is processed in the first half of the model's layers.
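To make two of the techniques named above concrete, here is a minimal sketch of layer lesioning and activation patching on a toy residual network. It is not the authors' code and does not use a real LLM: the model, the helper names (`make_layer`, `apply_layer`, `forward`, `distance`), the inputs, and the choice to patch a single coordinate of the hidden state are all assumptions made purely for illustration.

```python
import math
import random

def make_layer(dim, rng):
    """Random weights for one toy residual layer (hypothetical stand-in for a transformer block)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

def apply_layer(weights, h):
    """Residual update: h + tanh(W h)."""
    return [h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
            for h_i, row in zip(h, weights)]

def forward(layers, x, skip=None, patch=None):
    """Run the toy model and return (final hidden state, per-layer activations).

    skip:  index of a layer to lesion (its residual update is dropped).
    patch: (layer_index, coordinate, value); after that layer, one coordinate
           of the hidden state is overwritten with the given value.
    """
    h = list(x)
    cache = []
    for i, w in enumerate(layers):
        if i != skip:
            h = apply_layer(w, h)
        if patch is not None and patch[0] == i:
            h[patch[1]] = patch[2]
        cache.append(list(h))
    return h, cache

def distance(a, b):
    """Euclidean distance between two hidden states."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(0)
layers = [make_layer(4, rng) for _ in range(6)]
clean = [1.0, 0.0, -1.0, 0.5]        # illustrative "clean" input
corrupted = [0.0, 1.0, 0.5, -1.0]    # illustrative "corrupted" input

clean_out, clean_cache = forward(layers, clean)

# Layer lesioning: how far does the output move when each layer is removed?
lesion_effect = [distance(clean_out, forward(layers, clean, skip=i)[0])
                 for i in range(len(layers))]

# Activation patching: run the corrupted input, restore coordinate 0 of the
# clean activation after layer i, and measure the remaining gap to the clean output.
patch_gap = [distance(clean_out,
                      forward(layers, corrupted, patch=(i, 0, clean_cache[i][0]))[0])
             for i in range(len(layers))]

for i, (lesion, gap) in enumerate(zip(lesion_effect, patch_gap)):
    print(f"layer {i}: lesion effect = {lesion:.3f}, gap after patching = {gap:.3f}")
```

In a real study the hidden states would come from an LLM's intermediate layers and the distance would be replaced by a task metric such as the change in the logit of the correct answer; the loop structure (one intervention per layer, compared against a clean run) is the part this sketch is meant to convey.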

Keywords: Large Language Models, Interpretability, Explainability, Medicine, Healthcare, Knowledge Maps

Paper details

English title: Medical Interpretability and Knowledge Maps of Large Language Models
Authors: Razvan Marinescu, Victoria-Elisabeth Gruber, Diego Fajardo V.
Journal/Conference: ICLR 2026 Poster
Publication year: 2026
Research area: trustworthy medical AI