Paper | ICLR 2026 Poster | 2026 | clinical NLP

A Structured, Tagged, and Localized VQA Dataset for Chest X-ray Images: Full Sentence Answers and Scene Graphs

Accepted as a poster at ICLR 2026. Visual Question Answering (VQA) enables targeted and context-dependent analysis of medical images, such as chest X-rays (CXRs). However, existing VQA datasets for CXRs are typically constrained by simplistic and brief answer formats, and they lack localization annotations (e.g., bounding boxes) and structured tags (e.g., region or radiological finding/disease tags). To address these limitations, we introduce MIMIC-Ext-CXR-QBA (abbr. CXR-QBA), a large-scale CXR VQA dataset derived from MIMIC-CXR, comprising 42 million QA pairs with multi-granular, multi-part answers, detailed bounding boxes, and structured tags. Code/project link: https://github.com/philip-mueller/mimic-ext-cxr-qba/

Tags: Medical Image Computing, Medical Multimodal, Clinical Language Intelligence, VQA, Localization, Vision-Language Modeling, Medical Imaging, Chest X-Rays, Scene Graphs

Paper details

English title: A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
Authors: Philip Müller, Friederike Jungmann, Georgios Kaissis, Daniel Rueckert
Journal/conference: ICLR 2026 Poster
Publication year: 2026
Research area: clinical NLP