论文ICLR 2026 Oral2026 年clinical prediction

CounselBench：心理健康问答中大语言模型的大规模专家评测与对抗基准

ICLR 2026 Oral accepted paper at ICLR 2026. Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and online human therapists on patient questions from the public forum CounselChat.

医学影像计算临床语言智能 EHR 与临床预测论文 large language models mental health human evaluation ICLR 2026 ICLR 2026 Oral clinical_translation

论文详情

英文标题: CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
作者: Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C Frank, Angel Hsing-Chi Hwang, Ruishan Liu
期刊/会议: ICLR 2026 Oral
发表年份: 2026 年
研究方向: clinical prediction