论文ICLR 2026 Poster2026 年trustworthy medical AI

ATPO：面向多轮医学对话的自适应树策略优化

ICLR 2026 Poster accepted paper at ICLR 2026. Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the uncertainty inherent in user-agent interactions, which we formulate as a Hierarchical Markov Decision Process (H-MDP). While conventional Reinforcement Learning (RL) methods like Group Relative Policy Optimization (GRPO) struggle with long-horizon credit assignment and Proximal Policy Optimization (PPO) suffers from unstable value estimation in this context, we propose a novel uncertainty-aware Adaptive Tree Policy Optimization (ATPO) algorithm. Our method adaptively allocates the rollout budget to states with high uncertainty, quantified by a composite metric of Bellman error and action-value variance.

医学影像计算临床语言智能可信、安全、公平与隐私论文 Reinforcement Learning (RL)Large Language Models (LLMs)Medical Dialogue Tree Search ICLR 2026 ICLR 2026 Poster medical_llm_agent

论文详情

英文标题: ATPO: ADAPTIVE TREE POLICY OPTIMIZATION FOR MULTI-TURN MEDICAL DIALOGUE
作者: Ruike Cao, Shaojie Bai, Fugen Yao, Liang Dong, Jian Xu, Li Xiao
期刊/会议: ICLR 2026 Poster
发表年份: 2026 年
研究方向: trustworthy medical AI