论文ICLR 2026 Poster2026 年trustworthy medical AI ATPO:面向多轮医学对话的自适应树策略优化
ICLR 2026 Poster accepted paper at ICLR 2026. Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the uncertainty inherent in user-agent interactions, which we formulate as a Hierarchical Markov Decision Process (H-MDP). While conventional Reinforcement Learning (RL) methods like Group Relative Policy Optimization (GRPO) struggle with long-horizon credit assignment and Proximal Policy Optimization (PPO) suffers from unstable value estimation in this context, we propose a novel uncertainty-aware Adaptive Tree Policy Optimization (ATPO) algorithm. Our method adaptively allocates the rollout budget to states with high uncertainty, quantified by a composite metric of Bellman error and action-value variance.
数据资源Chinese conversational medical QA textChinese medical conversational QA datasetLarge-scale Chinese medical CQA dataset; see official repository开放访问 CMCQA:中文医学会话问答数据集
CMCQA is a large Chinese medical conversational question-answering dataset released with knowledge-grounded medical dialogue research. It supports medical conversation QA, knowledge-grounded response generation, and evaluation of Chinese medical dialogue systems.
数据资源Chinese medical question-answer textChinese medical QA corpusAbout 26 million medical QA pairs开放访问 Huatuo-26M:大规模中文医学问答数据集
Huatuo-26M is a large-scale Chinese medical question-answering dataset with about 26 million QA pairs collected for medical language modeling and medical dialogue research. It is suitable for Chinese medical LLM pretraining, fine-tuning, and QA system development.
数据资源Chinese consultation dialogue text with medical entity annotationsChinese medical dialogue generation datasetEntity-annotated dialogue dataset; see official repository开放访问 MedDG:实体中心中文医学对话生成数据集
MedDG is an entity-centric Chinese medical consultation dataset with domain entity annotations for medical dialogue generation. It supports entity-aware response generation, medical consultation modeling, and dialogue systems that ground responses in clinical concepts.