Data resource | MRI, DXA, ultrasound, retinal imaging, genetics, and health records | population-scale multimodal imaging cohort | Population-scale UK Biobank imaging cohort; application required | Access by application | UK Biobank imaging data
UK Biobank Imaging provides large-scale imaging phenotypes linked to genetic, lifestyle, and health outcome data. It is used for population-scale medical imaging AI, disease risk prediction, representation learning, multimodal biomedical modeling, and epidemiological AI studies.
Data resource | cardiac ultrasound videos with functional annotations | echocardiography video dataset | Large echocardiography video dataset; see official site | Access by application | EchoNet-Dynamic cardiac ultrasound video dataset
EchoNet-Dynamic is a cardiac ultrasound video dataset with expert annotations for left ventricular function. It is used for echocardiography video understanding, ejection fraction estimation, cardiac segmentation, and clinical video AI research.
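Ejection fraction estimation, one of the tasks listed above, targets a quantity defined by the standard clinical formula EF = (EDV - ESV) / EDV × 100, where EDV and ESV are the end-diastolic and end-systolic left ventricular volumes. The sketch below only illustrates that formula; the function name and the example volumes are hypothetical and are not taken from the dataset or its tooling.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return left ventricular ejection fraction in percent.

    edv_ml: end-diastolic volume in millilitres (hypothetical input)
    esv_ml: end-systolic volume in millilitres (hypothetical input)
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    # Fraction of the end-diastolic volume ejected during systole.
    return (edv_ml - esv_ml) / edv_ml * 100.0


if __name__ == "__main__":
    # Illustrative volumes only, not values from any recording.
    print(round(ejection_fraction(120.0, 50.0), 1))
```

A video model for this task would typically regress this percentage directly or derive it from predicted ventricular segmentations; either choice is a modeling decision outside what the listing specifies.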
Data resource | chest radiographs with multi-label findings | chest X-ray classification dataset | Large-scale Stanford chest X-ray dataset | Access by application | CheXpert chest X-ray dataset
CheXpert is a large chest radiograph dataset from Stanford with uncertainty-aware labels for common chest X-ray findings. It is widely used for radiology classification, label uncertainty modeling, chest X-ray representation learning, and clinical imaging benchmarks.
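Label uncertainty modeling usually starts with a decision about how to treat findings marked as uncertain. A minimal sketch, assuming the commonly described encoding of 1.0 for positive, 0.0 for negative, and -1.0 for uncertain, with unmentioned findings left blank (represented here as `None`). The function and policy names are hypothetical; handling of blank labels is deliberately left to the caller.

```python
from typing import Optional


def resolve_uncertain(label: Optional[float], policy: str) -> Optional[float]:
    """Map one label to a training target under a chosen uncertainty policy.

    policy "zeros":  treat uncertain as negative
    policy "ones":   treat uncertain as positive
    policy "ignore": return None so the caller can mask the label out of the loss
    """
    if policy not in {"zeros", "ones", "ignore"}:
        raise ValueError(f"unknown policy: {policy}")
    if label is None or label in (0.0, 1.0):
        # Blank, negative, and positive labels pass through unchanged.
        return label
    if label != -1.0:
        raise ValueError(f"unexpected label value: {label}")
    if policy == "zeros":
        return 0.0
    if policy == "ones":
        return 1.0
    return None


if __name__ == "__main__":
    row = [1.0, -1.0, 0.0, None]  # illustrative labels, not real data
    print([resolve_uncertain(v, "zeros") for v in row])
```

Which policy works best tends to vary by finding, so it is reasonable to treat the policy as a per-label hyperparameter rather than a global setting.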
Data resource | EEG and polysomnography biosignals | sleep physiology signal dataset | Expanded Sleep-EDF PhysioNet dataset; version 1.0.0 | Open access | Sleep-EDF Expanded polysomnography dataset
Sleep-EDF Expanded contains polysomnographic sleep recordings with EEG and related physiological signals. It is used for sleep stage classification, biosignal time-series modeling, self-supervised learning on physiological signals, and clinical sleep research benchmarks.
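Sleep stage classification is conventionally framed per epoch: the continuous recording is cut into fixed 30-second windows and each window receives one stage label. The sketch below shows only that windowing step on a plain list of samples. The function name is hypothetical, and the 100 Hz sampling rate in the example is an assumption that should be read from each recording's header rather than hard-coded.

```python
from typing import List, Sequence


def split_into_epochs(
    signal: Sequence[float], sampling_hz: int, epoch_seconds: int = 30
) -> List[List[float]]:
    """Cut a 1-D signal into non-overlapping fixed-length epochs.

    Trailing samples that do not fill a whole epoch are dropped.
    """
    if sampling_hz <= 0 or epoch_seconds <= 0:
        raise ValueError("sampling_hz and epoch_seconds must be positive")
    samples_per_epoch = sampling_hz * epoch_seconds
    n_epochs = len(signal) // samples_per_epoch
    return [
        list(signal[i * samples_per_epoch:(i + 1) * samples_per_epoch])
        for i in range(n_epochs)
    ]


if __name__ == "__main__":
    # 65 seconds of a synthetic signal at an assumed 100 Hz.
    fake_eeg = [0.0] * (100 * 65)
    epochs = split_into_epochs(fake_eeg, sampling_hz=100)
    print(len(epochs), len(epochs[0]))
```

Each resulting epoch would then be paired with the stage annotation covering the same time span before training a classifier.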
Data resource | 12-lead ECG waveforms with diagnostic labels | ECG waveform benchmark | Large public ECG dataset; version 1.0.3 | Open access | PTB-XL: a large open 12-lead ECG dataset
PTB-XL is a large public 12-lead electrocardiography dataset with diagnostic statements and waveform records. It is a standard benchmark for ECG classification, cardiac abnormality detection, clinical signal representation learning, and robust evaluation of biosignal models.
Data resource | 12-lead ECG waveforms and diagnostic metadata | ECG waveform dataset | Large-scale diagnostic ECG dataset; version 1.0 | Access by application | MIMIC-IV-ECG diagnostic electrocardiogram dataset
MIMIC-IV-ECG is a large deidentified electrocardiogram dataset linked to the MIMIC-IV clinical data ecosystem. It supports ECG classification, arrhythmia detection, representation learning, and multimodal modeling with structured EHR context.
Data resource | chest radiographs with radiology reports | chest X-ray image-report dataset | Large-scale CXR image-report dataset; version 2.1.0 | Access by application | MIMIC-CXR v2.1.0 chest X-ray dataset
MIMIC-CXR is a large deidentified chest radiograph dataset with associated free-text radiology reports. It is widely used for chest X-ray classification, report generation, image-text representation learning, radiology retrieval, and medical multimodal foundation model evaluation.
Data resource | deidentified clinical free text | clinical notes dataset | Clinical note extension for MIMIC-IV; version 2.2 | Access by application | MIMIC-IV-Note v2.2 clinical notes dataset
MIMIC-IV-Note provides deidentified clinical notes linked to MIMIC-IV hospital data. It supports clinical NLP tasks such as note representation learning, discharge summary modeling, information extraction, summarization, and multimodal EHR-text modeling.
Data resource | ECG cardiac electrophysiology signals | 21,837 clinical 12-lead ECG records | Open access | PTB-XL ECG database v1.0.3
Large publicly available 12-lead ECG waveform dataset with diagnostic labels, hosted on PhysioNet.
Data resource | Chest X-ray radiographic imaging | PhysioNet v2.1.0 | Restricted access | MIMIC-CXR-JPG v2.1.0
JPG-formatted chest radiographs with labels derived from free-text reports, hosted by PhysioNet.
Data resource | Multimodal clinical data | Benchmark | ICML 2025 benchmark | Open access | CLIMB clinical foundation model benchmark
Multimodal clinical data foundation and benchmark introduced at ICML 2025 for clinical foundation model research.