Site Search - AI4Meder

Paper · ICLR 2026 Oral · 2026 · clinical prediction

CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark for Large Language Models in Mental Health Question Answering

Accepted as an Oral at ICLR 2026. Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and online human therapists on patient questions from the public forum CounselChat.

Medical Image Computing · Clinical Language Intelligence · EHR and Clinical Prediction · Paper · large language models · mental health
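The abstract describes CounselBench-EVAL as a set of expert evaluations of answers from several sources (GPT-4, LLaMA 3, Gemini, and online human therapists). The Python below is a minimal sketch of how such ratings might be summarized per answer source. It assumes a hypothetical record layout: an `Evaluation` with `question_id`, `source`, `expert_id`, and a single numeric `score`. These field names and the single-score scale are illustrative assumptions. They are not CounselBench's actual schema or rubric.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Evaluation:
    # Hypothetical record: one expert's rating of one answer (not the real schema).
    question_id: str
    source: str      # e.g. "GPT-4", "LLaMA 3", "Gemini", "human therapist"
    expert_id: str
    score: int       # assumed single numeric rating per answer


def mean_score_by_source(evals):
    """Group ratings by answer source and return the mean score for each."""
    buckets = defaultdict(list)
    for e in evals:
        buckets[e.source].append(e.score)
    return {src: mean(scores) for src, scores in buckets.items()}


if __name__ == "__main__":
    sample = [
        Evaluation("q1", "GPT-4", "e1", 4),
        Evaluation("q1", "GPT-4", "e2", 5),
        Evaluation("q1", "human therapist", "e1", 3),
    ]
    print(mean_score_by_source(sample))
```

Grouping by `source` keeps model and human-therapist answers directly comparable. A real analysis would likely also break scores down by rating dimension and account for inter-expert agreement.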
