AI4Meder Site Search

Search medical AI papers and resources

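Search community content by paper, data resource, technical competition, submission deadline, and course resource, and jump straight to the corresponding detail page.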

10 results

Enter a keyword or click a tag to narrow results by paper, data resource, competition deadline, call for papers, or course. Tag: Medical Question Answering

Paper | ICLR 2026 Poster | 2026 | clinical prediction

MedAraBench: A Large-Scale Arabic Medical Question Answering Dataset and Benchmark

Accepted as a poster at ICLR 2026. Arabic remains one of the most underrepresented languages in natural language processing research, particularly in medical applications, due to the limited availability of open-source data and benchmarks. The lack of resources hinders efforts to evaluate and advance the multilingual capabilities of Large Language Models (LLMs). In this paper, we introduce MedAraBench, a large-scale dataset consisting of Arabic multiple-choice question-answer pairs across various medical specialties. We constructed the dataset by manually digitizing a large repository of academic materials created by medical professionals in the Arabic-speaking region.

Tags: Medical imaging computing | Clinical language intelligence | EHR and clinical prediction | Paper | Dataset | Benchmark | Large Language Models

Paper | ICLR 2026 Oral | 2026 | clinical prediction

CounselBench: Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering

Accepted as an oral presentation at ICLR 2026. Medical question answering (QA) benchmarks often focus on multiple-choice or fact-based tasks, leaving open-ended answers to real patient questions underexplored. This gap is particularly critical in mental health, where patient questions often mix symptoms, treatment concerns, and emotional needs, requiring answers that balance clinical caution with contextual sensitivity. We present CounselBench, a large-scale benchmark developed with 100 mental health professionals to evaluate and stress-test large language models (LLMs) in realistic help-seeking scenarios. The first component, CounselBench-EVAL, contains 2,000 expert evaluations of answers from GPT-4, LLaMA 3, Gemini, and online human therapists on patient questions from the public forum CounselChat.

Tags: Medical imaging computing | Clinical language intelligence | EHR and clinical prediction | Paper | large language models | mental health

Paper | ICLR 2026 Poster | 2026 | medical LLM agent

KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning

Accepted as a poster at ICLR 2026. In clinical practice, physicians refrain from making decisions when patient information is insufficient. This behavior, known as abstention, is a critical safety mechanism preventing potentially harmful misdiagnoses. Recent investigations have reported the application of large language models (LLMs) in medical scenarios. However, existing LLMs struggle with abstention, frequently providing overconfident responses despite incomplete information. This limitation stems from conventional abstention methods relying solely on model self-assessments, which lack systematic strategies to identify knowledge boundaries with external medical evidence.

Tags: Medical imaging computing | Clinical language intelligence | Paper | multi-agent system | Clinical reasoning | Medical question answering

Paper | arXiv | 2023 | Clinical multimodal AI

Towards Generalist Biomedical AI

The paper evaluates Med-PaLM Multimodal, a generalist biomedical AI system, across medical question answering, imaging, report generation, and multimodal tasks.

Tags: Generalist medical AI | Multimodal | Clinical reasoning

Paper | Nature Medicine | 2025 | Clinical LLM

Toward Expert-Level Medical Question Answering with Large Language Models

Nature Medicine paper on Med-PaLM 2 and expert-level medical question answering with large language models.

Tags: LLM | Medical question answering | Med-PaLM

Data resource | Chinese community medical questions and answers | Chinese medical QA dataset | Updated cMedQA dataset; see official repository | Open access

cMedQA2: Chinese Community Medical Question Answering Dataset

cMedQA2 is an updated Chinese community medical question answering dataset for question-answer matching and medical QA research. It is useful for training and evaluating Chinese medical retrieval, ranking, and answer selection models.

Tags: Clinical language intelligence | Dataset | Chinese medical QA | answer selection | community QA | medical_llm_agent

Data resource | Chinese biomedical and clinical text | Chinese biomedical NLP benchmark | 8 biomedical NLU tasks; see official repository | Open access

CBLUE: Chinese Biomedical Language Understanding Evaluation Benchmark

CBLUE is a Chinese biomedical language understanding benchmark covering real-world biomedical NLP tasks such as named entity recognition, relation extraction, term normalization, clinical trial classification, sentence similarity, and medical question answering. It is useful for evaluating Chinese clinical NLP models and medical language models.

Tags: Clinical language intelligence | Dataset | Chinese medical NLP | benchmark | information extraction | QA

Data resource | Text | LLM benchmark | Benchmark and leaderboard | Open access

MedHELM Medical LLM Evaluation Benchmark

Medical LLM benchmark and leaderboard intended to broaden coverage beyond single medical QA datasets.

Tags: benchmark | leaderboard | medical LLM

Data resource | Text | LLM evaluation benchmark | Health AI evaluation benchmark | Open access

HealthBench Health AI Evaluation Benchmark

Benchmark for evaluating health AI model safety, helpfulness, and clinical-relevance judgments with physician-reviewed rubrics.

Tags: benchmark | health AI | safety | LLM evaluation

Data resource | Text and medical images | Model | MedGemma / MedSigLIP model family | Open access

MedGemma / MedSigLIP Medical AI Models

Google Health AI Developer Foundations open model resources for medical text and medical image understanding, including MedGemma 1.5 resources.

Tags: medical LLM | medical VLM | open model