PhysioNet/CinC 2012 ICU Time-Series Dataset
Data resource · Open access · Data: critical care time-series variables and outcomes · Type: ICU time-series benchmark dataset · Details: PhysioNet Challenge 2012 dataset; version 1.0.0
The PhysioNet/CinC Challenge 2012 dataset contains ICU time-series records used for mortality prediction and patient-specific outcome modeling. It remains a useful benchmark for clinical time-series modeling, missingness-aware learning, and early warning model development.
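Missingness-aware models often receive two extra inputs alongside each variable: a mask marking which time steps were actually measured, and the time elapsed since the last measurement (GRU-D-style models use inputs of this kind, for example). The sketch below assumes one variable has already been placed on a regular time grid, with `None` wherever it was not measured. The function name and the hours-since-admission time axis are illustrative assumptions. Parsing the raw challenge files is not shown.

```python
import math
from typing import List, Optional, Tuple


def missingness_features(
    times: List[float], values: List[Optional[float]]
) -> Tuple[List[float], List[int], List[float]]:
    """Return forward-filled values, an observation mask, and time since the last observation.

    `times` are assumed to be hours since ICU admission (hypothetical layout);
    `values` holds None wherever the variable was not measured at that step.
    """
    filled: List[float] = []
    mask: List[int] = []
    delta: List[float] = []
    last_value = math.nan  # stays NaN until the first measurement
    last_time: Optional[float] = None
    for t, v in zip(times, values):
        if v is not None:
            last_value, last_time = v, t
            mask.append(1)
        else:
            mask.append(0)
        filled.append(last_value)
        # Before the first measurement, count elapsed time from the start of the record.
        reference = last_time if last_time is not None else times[0]
        delta.append(t - reference)
    return filled, mask, delta
```

For example, `missingness_features([0, 1, 2, 3], [None, 5.0, None, 7.0])` yields filled values `[nan, 5.0, 5.0, 7.0]`, mask `[0, 1, 0, 1]`, and deltas `[0, 0, 1, 0]`. The mask keeps "not measured" distinguishable from "measured and unchanged", which forward filling alone would hide.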
REFUGE Retinal Fundus Glaucoma Challenge Dataset
Data resource · Access by application · Data: retinal fundus photographs with glaucoma and structure annotations · Type: ophthalmology fundus image challenge dataset · Details: REFUGE challenge dataset; official splits described on Grand Challenge
REFUGE is a retinal fundus imaging challenge dataset for glaucoma assessment. It supports glaucoma classification, optic disc and cup segmentation, fovea localization, and fair comparison of ophthalmology AI methods on color fundus photographs.
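Optic disc and cup segmentations are often summarized as a vertical cup-to-disc ratio (vCDR), a standard glaucoma indicator. This minimal sketch assumes the annotation has already been decoded into two boolean masks, with the disc mask including the cup. How the released REFUGE masks encode these regions should be taken from the official challenge documentation; the function names are hypothetical.

```python
import numpy as np


def vertical_extent(mask: np.ndarray) -> int:
    """Number of rows spanned by the foreground of a 2-D boolean mask (0 if empty)."""
    rows = np.flatnonzero(mask.any(axis=1))
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)


def vertical_cup_to_disc_ratio(disc_mask: np.ndarray, cup_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from boolean optic disc and optic cup masks."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(cup_mask) / disc_height


# Synthetic 12x12 example: the disc spans rows 2-9 and the cup spans rows 4-7.
disc = np.zeros((12, 12), dtype=bool)
disc[2:10, 3:9] = True
cup = np.zeros((12, 12), dtype=bool)
cup[4:8, 4:8] = True
print(vertical_cup_to_disc_ratio(disc, cup))  # 0.5
```

Measuring row extents rather than pixel areas keeps the metric tied to the vertical diameters that the clinical ratio is defined on.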
MURA Musculoskeletal X-ray Dataset
Data resource · Access by application · Data: upper extremity radiographs with abnormality labels · Type: musculoskeletal X-ray dataset · Details: large Stanford musculoskeletal radiograph dataset
MURA is a musculoskeletal radiograph dataset from Stanford for abnormality detection in upper extremity X-rays. It is used for radiograph abnormality classification, fracture-related screening, musculoskeletal imaging AI, and human-AI comparison studies.
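MURA labels are assigned per study, and a study can contain several radiographs, so image-level model outputs need to be aggregated before they can be compared with the labels. Averaging per-image abnormality probabilities is one simple rule. The study identifiers, the 0.5 threshold, and the function name below are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def study_level_predictions(
    image_probs: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, Tuple[float, bool]]:
    """Average per-image abnormality probabilities within each study, then threshold the mean."""
    by_study: Dict[str, List[float]] = defaultdict(list)
    for study_id, prob in image_probs:
        by_study[study_id].append(prob)
    results: Dict[str, Tuple[float, bool]] = {}
    for study_id, probs in by_study.items():
        score = mean(probs)
        results[study_id] = (score, score >= threshold)
    return results
```

For example, `study_level_predictions([("study_a", 0.25), ("study_a", 0.75), ("study_b", 0.1)])` returns `{"study_a": (0.5, True), "study_b": (0.1, False)}`. Other aggregation rules, such as taking the maximum, are equally easy to swap in.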
ACDC Automated Cardiac Diagnosis Challenge Dataset
Data resource · Access by application · Data: cine cardiac MRI with segmentation labels · Type: cardiac MRI segmentation dataset · Details: ACDC challenge dataset; see official database page
ACDC is a cardiac MRI dataset for automated cardiac diagnosis and segmentation. It supports left and right ventricular segmentation, myocardium segmentation, cardiac function quantification, and evaluation of robust cardiac image analysis methods.
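Segmentation methods on ACDC are usually compared per structure with the Dice coefficient, 2|P ∩ G| / (|P| + |G|). This minimal sketch works on integer label maps and assumes the label convention 1 = right ventricular cavity, 2 = myocardium, 3 = left ventricular cavity. Verify that convention against the official ACDC description before relying on it; the function name is hypothetical.

```python
from typing import Dict, Sequence

import numpy as np


def dice_per_label(
    pred: np.ndarray, gt: np.ndarray, labels: Sequence[int] = (1, 2, 3)
) -> Dict[int, float]:
    """Dice coefficient 2*|P & G| / (|P| + |G|) for each integer label (assumed 1=RV, 2=MYO, 3=LV)."""
    scores: Dict[int, float] = {}
    for label in labels:
        p = pred == label
        g = gt == label
        denom = int(p.sum() + g.sum())
        # Convention used here: a label absent from both masks counts as a perfect match.
        scores[label] = 1.0 if denom == 0 else 2.0 * int(np.logical_and(p, g).sum()) / denom
    return scores
```

Reporting Dice separately per structure avoids letting the large, easy structures mask poor performance on the thin myocardium.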
TCGA Cancer Genome Dataset
Data resource · Open access · Data: genomics, transcriptomics, clinical metadata, and pathology-related data · Type: cancer genomics and clinical dataset · Details: large multi-cancer TCGA program dataset
The Cancer Genome Atlas is a large cancer genomics resource with molecular, clinical, and pathology-related data across many cancer types. It is a foundation dataset for oncology AI, survival prediction, subtype discovery, multimodal cancer modeling, and translational biomarker research.
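Survival models built on TCGA clinical data are commonly evaluated with Harrell's concordance index (C-index). The C-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the event first. This plain O(n²) sketch skips pairs with tied follow-up times, which is only one of several tie-handling conventions. Established survival-analysis libraries provide tested implementations; the function name here is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Harrell's C-index: fraction of comparable pairs where the higher-risk patient fails first.

    A pair is comparable when the patient with the shorter follow-up time had an
    observed event (not censored). Pairs with tied times are skipped here, and
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for a, b in combinations(range(len(times)), 2):
        # Let i be the patient with the shorter follow-up time.
        i, j = (a, b) if times[a] < times[b] else (b, a)
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of event times. Censored patients still contribute as the longer-surviving member of a pair.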
EchoNet-Dynamic Echocardiography Video Dataset
Data resource · Access by application · Data: cardiac ultrasound videos with functional annotations · Type: echocardiography video dataset · Details: large echocardiography video dataset; see official site
EchoNet-Dynamic is a cardiac ultrasound video dataset with expert annotations for left ventricular function. It is used for echocardiography video understanding, ejection fraction estimation, cardiac segmentation, and clinical video AI research.
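Ejection fraction is defined from the end-diastolic and end-systolic left ventricular volumes (EDV, ESV) as EF = (EDV - ESV) / EDV × 100%. Models that first estimate volumes or segment the ventricle at both phases end up applying this formula. The sketch below uses illustrative volumes, not values from the dataset.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Left ventricular ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


print(ejection_fraction(100.0, 40.0))  # 60.0
```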
CAMELYON17 Histopathology Lymph Node Metastasis Dataset
Data resource · Access by application · Data: histopathology whole-slide images · Type: digital pathology whole-slide image dataset · Details: CAMELYON17 challenge dataset; see Grand Challenge page
CAMELYON17 is a digital pathology dataset for detecting breast cancer metastases in whole-slide images of lymph node sections collected at multiple centers. It supports pathology classification, metastasis detection, weakly supervised learning, and domain generalization in histopathology AI.
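In weakly supervised whole-slide classification, a slide-level prediction is often derived from patch-level predictions through a pooling step. This follows the multiple-instance assumption that a slide is positive if any patch contains tumor. The sketch below shows max pooling and a top-k mean variant. Patch extraction and the patch classifier are out of scope, and the function name is hypothetical.

```python
import numpy as np


def slide_score(patch_probs: np.ndarray, k: int = 1) -> float:
    """Aggregate patch-level tumor probabilities into one slide-level score.

    k=1 is max pooling; larger k averages the k highest patch probabilities,
    which is less sensitive to a single spurious patch.
    """
    if patch_probs.size == 0:
        raise ValueError("slide has no patches")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, patch_probs.size)
    top_k = np.sort(patch_probs.ravel())[-k:]
    return float(top_k.mean())
```

With `probs = np.array([0.1, 0.9, 0.3])`, `slide_score(probs)` returns 0.9 and `slide_score(probs, k=2)` averages the two highest patches. Larger k trades sensitivity to tiny metastases for robustness to isolated false-positive patches.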
ISIC Archive Dermatology Image Dataset
Data resource · Open access · Data: dermoscopic and clinical skin lesion images · Type: dermatology image archive · Details: large public ISIC dermatology image archive
The ISIC Archive is a large public dermatology image repository for skin lesion analysis. It is widely used for melanoma classification, lesion segmentation, dermoscopic image retrieval, bias and domain shift analysis, and clinical imaging benchmark development.
Sleep-EDF Expanded Polysomnography Dataset
Data resource · Open access · Data: EEG and polysomnography biosignals · Type: sleep physiology signal dataset · Details: expanded Sleep-EDF PhysioNet dataset; version 1.0.0
Sleep-EDF Expanded contains polysomnographic sleep recordings with EEG and related physiological signals. It is used for sleep stage classification, biosignal time-series modeling, self-supervised learning on physiological signals, and clinical sleep research benchmarks.
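Sleep stage classifiers typically operate on fixed 30-second epochs, the unit in which hypnograms are scored. This minimal sketch assumes one EEG channel has already been loaded as a NumPy array with a known sampling rate `fs`, which should be read from the EDF header rather than hard-coded. Aligning epochs with the hypnogram annotations is not shown, and the function name is hypothetical.

```python
import numpy as np


def split_into_epochs(signal: np.ndarray, fs: float, epoch_sec: float = 30.0) -> np.ndarray:
    """Cut a 1-D signal into consecutive non-overlapping epochs, dropping any trailing partial epoch."""
    samples_per_epoch = int(round(fs * epoch_sec))
    n_epochs = len(signal) // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)
```

For example, 65 seconds of a 100 Hz signal becomes an array of shape `(2, 3000)`. The final 5 seconds are discarded because they do not fill a complete epoch.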
PTB-XL: A Large Open 12-Lead ECG Dataset
Data resource · Open access · Data: 12-lead ECG waveforms with diagnostic labels · Type: ECG waveform benchmark · Details: large public ECG dataset; version 1.0.3
PTB-XL is a large public 12-lead electrocardiography dataset with diagnostic statements and waveform records. It is a standard benchmark for ECG classification, cardiac abnormality detection, clinical signal representation learning, and robust evaluation of biosignal models.
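PTB-XL annotates each record with diagnostic statements given as SCP-ECG codes with likelihood values, and multi-label classifiers usually turn these into a multi-hot target vector. This sketch assumes the codes arrive as the string form of a Python dict, which is how we understand the metadata CSV to store them. The small code-to-superclass mapping is hand-written for illustration only; the full mapping should come from the dataset's own statement annotations. All codes with any likelihood are counted here, whereas some pipelines filter by likelihood first.

```python
import ast
from typing import Dict, List

# Illustrative subset only; derive the real mapping from the dataset's statement table.
STATEMENT_TO_SUPERCLASS: Dict[str, str] = {
    "NORM": "NORM",
    "IMI": "MI",
    "ASMI": "MI",
    "NDT": "STTC",
    "CLBBB": "CD",
    "LVH": "HYP",
}
SUPERCLASSES: List[str] = ["NORM", "MI", "STTC", "CD", "HYP"]


def multi_hot(scp_codes_field: str) -> List[int]:
    """Map a stringified {code: likelihood} dict to a multi-hot vector over SUPERCLASSES."""
    codes: Dict[str, float] = ast.literal_eval(scp_codes_field)
    # Codes without a diagnostic superclass (e.g. rhythm statements) are ignored.
    present = {STATEMENT_TO_SUPERCLASS[c] for c in codes if c in STATEMENT_TO_SUPERCLASS}
    return [1 if s in present else 0 for s in SUPERCLASSES]
```

For example, `multi_hot("{'IMI': 100.0, 'LVH': 50.0, 'SR': 0.0}")` returns `[0, 1, 0, 0, 1]`. Using `ast.literal_eval` rather than `eval` parses the dict literal without executing arbitrary code.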
eICU Collaborative Research Database
Data resource · Access by application · Data: structured critical care EHR tables · Type: multicenter ICU EHR dataset · Details: multicenter ICU database; version 2.0
The eICU Collaborative Research Database is a multicenter critical care database containing deidentified ICU data from many hospitals. It is commonly used for external validation, ICU outcome prediction, temporal modeling, and cross-site generalization studies in clinical AI.
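Cross-site evaluation on a multicenter database can be organized as leave-one-hospital-out folds: train on all hospitals but one, then test on the held-out hospital. This sketch assumes a mapping from ICU stay ID to hospital ID has already been extracted (we assume from a `hospitalid` column in the patient table). The function name is hypothetical. In practice, hospitals with very few stays may need to be pooled before splitting.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_hospital_out(
    stay_to_hospital: Dict[int, int]
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_hospital, train_stay_ids, test_stay_ids) for each hospital in turn."""
    for held_out in sorted(set(stay_to_hospital.values())):
        train = sorted(s for s, h in stay_to_hospital.items() if h != held_out)
        test = sorted(s for s, h in stay_to_hospital.items() if h == held_out)
        yield held_out, train, test
```

Splitting by hospital rather than by stay keeps every stay from the test hospital out of training, so the evaluation measures transfer to an unseen site rather than within-site fit.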
MIMIC-IV-ECG Diagnostic Electrocardiogram Dataset
Data resource · Access by application · Data: 12-lead ECG waveforms and diagnostic metadata · Type: ECG waveform dataset · Details: large-scale diagnostic ECG dataset; version 1.0
MIMIC-IV-ECG is a large deidentified electrocardiogram dataset linked to the MIMIC-IV clinical data ecosystem. It supports ECG classification, arrhythmia detection, representation learning, and multimodal modeling with structured EHR context.
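Linking an ECG to its structured EHR context usually starts with finding the hospital admission that was active when the ECG was recorded. This sketch uses column names from the MIMIC-IV admissions table (`subject_id`, `hadm_id`, `admittime`, `dischtime`). It assumes the ECG's patient ID and acquisition time have already been read from the ECG metadata, and it represents records as plain dicts for illustration. The function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def match_admission(
    subject_id: int, ecg_time: datetime, admissions: List[Dict]
) -> Optional[int]:
    """Return the hadm_id whose [admittime, dischtime] window contains the ECG, else None."""
    for adm in admissions:
        if adm["subject_id"] == subject_id and adm["admittime"] <= ecg_time <= adm["dischtime"]:
            return adm["hadm_id"]
    return None  # no admission window for this patient contains the ECG time
```

ECGs that fall outside every admission window return `None` and can be kept as unlinked records or excluded, depending on the study design. With many records, the same interval join is typically done in a database or dataframe library instead of a Python loop.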
MIMIC-IV v3.1 Critical Care and Inpatient EHR Dataset
Data resource · Access by application · Data: deidentified structured EHR tables · Type: critical care EHR dataset · Details: large-scale hospital and ICU EHR dataset; version 3.1
MIMIC-IV is a large deidentified electronic health record dataset from Beth Israel Deaconess Medical Center, covering hospital and ICU data for critical care research. It is a core benchmark source for clinical prediction, temporal EHR modeling, phenotyping, and healthcare AI method development.