论文ICLR 2026 Poster2026 年clinical NLP 用于胸部 X 光图像的结构化、标注式、定位化 VQA 数据集:含完整句答案与场景图
ICLR 2026 Poster accepted paper at ICLR 2026. Visual Question Answering (VQA) enables targeted and context-dependent analysis of medical images, such as chest X-rays (CXRs). However, existing VQA datasets for CXRs are typically constrained by simplistic and brief answer formats, lacking localization annotations (e.g., bounding boxes) and structured tags (e.g., region or radiological finding/disease tags). To address these limitations, we introduce MIMIC-Ext-CXR-QBA (abbr. CXR-QBA), a large-scale CXR VQA dataset derived from MIMIC-CXR, comprising 42 million QA-pairs with multi-granular, multi-part answers, detailed bounding boxes, and structured tags. Code/project link: https://github.com/philip-mueller/mimic-ext-cxr-qba/
数据资源deidentified clinical free textclinical notes datasetClinical note extension for MIMIC-IV; version 2.2申请访问 MIMIC-IV-Note v2.2 临床笔记数据集
MIMIC-IV-Note provides deidentified clinical notes linked to MIMIC-IV hospital data. It supports clinical NLP tasks such as note representation learning, discharge summary modeling, information extraction, summarization, and multimodal EHR-text modeling.
数据资源Chinese biomedical and clinical textChinese biomedical NLP benchmark8 biomedical NLU tasks; see official repository开放访问 CBLUE:中文生物医学语言理解评测基准
CBLUE is a Chinese biomedical language understanding benchmark covering real-world biomedical NLP tasks such as named entity recognition, relation extraction, term normalization, clinical trial classification, sentence similarity, and medical question answering. It is useful for evaluating Chinese clinical NLP models and medical language models.