Paper | ICLR 2026 Oral | 2026 | clinical prediction

CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Question Answering

Accepted as an Oral paper at ICLR 2026. Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and online human therapists to patient questions from the public forum CounselChat.


Paper Details

English Title
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
Authors
Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C Frank, Angel Hsing-Chi Hwang, Ruishan Liu
Journal/Conference
ICLR 2026 Oral
Publication Year
2026
Research Area
clinical prediction