Data resource | MRI, DXA, ultrasound, retinal imaging, genetics, and health records | population-scale multimodal imaging cohort | Population-scale UK Biobank imaging cohort; application required | Access by application | UK Biobank imaging data
UK Biobank Imaging provides large-scale imaging phenotypes linked to genetic, lifestyle, and health outcome data. It is used for population-scale medical imaging AI, disease risk prediction, representation learning, multimodal biomedical modeling, and epidemiological AI studies.
Data resource | genomics, transcriptomics, clinical metadata, and pathology-related data | cancer genomics and clinical dataset | Large multi-cancer TCGA program dataset | Open access | TCGA cancer genome dataset
The Cancer Genome Atlas is a large cancer genomics resource with molecular, clinical, and pathology-related data across many cancer types. It is a foundation dataset for oncology AI, survival prediction, subtype discovery, multimodal cancer modeling, and translational biomarker research.
Data resource | MRI, PET, biomarkers, clinical and cognitive assessments | longitudinal neuroimaging and clinical dataset | Longitudinal ADNI cohort data; access through ADNI/LONI | Access by application | ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset
ADNI provides longitudinal neuroimaging, biomarker, clinical, and cognitive data for Alzheimer disease research. It supports disease progression modeling, dementia diagnosis, multimodal prediction, biomarker discovery, and clinical translation studies.
Data resource | chest radiographs with radiology reports | chest X-ray image-report dataset | Large-scale CXR image-report dataset; version 2.1.0 | Access by application | MIMIC-CXR v2.1.0 chest X-ray dataset
MIMIC-CXR is a large deidentified chest radiograph dataset with associated free-text radiology reports. It is widely used for chest X-ray classification, report generation, image-text representation learning, radiology retrieval, and medical multimodal foundation model evaluation.
Data resource | medical images with bilingual visual questions and answers | medical visual question answering dataset | Bilingual medical VQA dataset; see official project page | Open access | SLAKE: a semantically labeled, knowledge-enhanced medical VQA dataset
SLAKE is a semantically labeled medical visual question answering dataset with bilingual English-Chinese questions, medical images, and knowledge-enhanced annotations. It is useful for medical multimodal learning, image-grounded QA, and radiology VQA evaluation.
Data resource | Chest X-ray | Radiology imaging | PhysioNet v2.1.0 | Restricted access | MIMIC-CXR-JPG v2.1.0
JPG-formatted chest radiographs with labels derived from free-text reports, hosted by PhysioNet.
Data resource | Text and medical images | Model | MedGemma / MedSigLIP model family | Open access | MedGemma / MedSigLIP medical AI models
Google Health AI Developer Foundations open model resources for medical text and medical image understanding, including MedGemma 1.5 resources.
Data resource | Multimodal clinical data | Benchmark | ICML 2025 benchmark | Open access | CLIMB clinical foundation model benchmark
Multimodal clinical data foundation and benchmark introduced at ICML 2025 for clinical foundation model research.