Paper · ICLR 2026 Poster · 2026 · trustworthy medical AI

Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions

Accepted as a poster at ICLR 2026. Cancer patients are increasingly turning to large language models (LLMs) for medical information, making it critical to assess how well these models handle complex, personalized questions. However, current medical benchmarks focus on medical exams or consumer-searched questions and do not evaluate LLMs on real patient questions with patient details. In this paper, we first have three hematology-oncology physicians evaluate cancer-related questions drawn from real patients. While LLM responses are generally accurate, the models frequently fail to recognize or address false presuppositions in the questions, posing risks to safe medical decision-making.

Categories: Medical image computing; Clinical language intelligence; Trustworthiness, safety, fairness, and privacy
Tags: Medical benchmark, LLM evaluation, LLM sycophancy, Medical agent, Adversarial generation, medical_llm_agent

Paper details

Title: Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
Authors: Wang Bill Zhu, Tian-qi Chen, Xinyan Velocity Yu, Ching Ying Lin, Jade Law, Mazen Jizzini, Jorge J. Nieva, Ruishan Liu, Robin Jia
Venue: ICLR 2026 Poster
Year: 2026
Research area: trustworthy medical AI