Data resource | Chinese medical instruction and dialogue text | Chinese medical instruction-tuning dataset | About 140K medical SFT examples; see Hugging Face card | Open access | HuatuoGPT2-SFT-GPT4-140K: Medical Instruction Dataset
HuatuoGPT2-SFT-GPT4-140K is a Chinese medical supervised fine-tuning dataset containing medical instruction-style conversations and GPT-4-assisted responses. It is useful for Chinese medical assistant alignment and medical LLM instruction tuning.
Data resource | Chinese medical question-answer text | Chinese medical QA corpus | About 26 million medical QA pairs | Open access | Huatuo-26M: Large-Scale Chinese Medical Question-Answering Dataset
Huatuo-26M is a large-scale Chinese medical question-answering dataset with about 26 million QA pairs collected for medical language modeling and medical dialogue research. It is suitable for Chinese medical LLM pretraining, fine-tuning, and QA system development.
Data resource | Chinese medical exam and QA text | Chinese medical LLM evaluation benchmark | Multiple Chinese medical exam and benchmark splits; see Hugging Face card | Open access | CMB: Chinese Medical Benchmark
CMB is a comprehensive Chinese medical benchmark for evaluating medical large language models on medical exams, reasoning, and clinical knowledge questions. It is suited for Chinese medical QA, LLM evaluation, and instruction-following assessment.