Paper · ICLR 2026 Poster · 2026 · clinical NLP

Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning

Accepted as a poster at ICLR 2026. We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open-source implementation, PyETR, we programmatically generate 383 formally specified reasoning problems and evaluate 38 models. For each response, we judge logical correctness and, when incorrect, whether it matches an ETR-predicted fallacy. Two results stand out: (i) as a capability proxy (Chatbot Arena Elo) increases, a larger share of a model's incorrect answers are ETR-predicted fallacies ($\rho=0.360$, $p=0.0265$), while overall correctness on this dataset shows no correlation with capability; (ii) reversing premise order significantly reduces fallacy production for many models, mirroring human order effects.
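The $\rho$ reported in result (i) reads as a rank correlation between a capability score and the share of each model's incorrect answers that match a predicted fallacy. The sketch below illustrates that kind of computation in plain Python, assuming Spearman's rank correlation is the statistic intended. The function names and the four toy records are hypothetical and are not taken from the paper or from PyETR.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def fallacy_share(n_incorrect, n_predicted_fallacy):
    """Share of incorrect answers that match a predicted fallacy."""
    return n_predicted_fallacy / n_incorrect if n_incorrect else 0.0


# Hypothetical toy records (NOT data from the paper):
# (capability score, incorrect answers, incorrect answers matching a predicted fallacy)
toy_models = [(1100, 200, 60), (1200, 190, 70), (1300, 210, 65), (1400, 180, 90)]

elo = [m[0] for m in toy_models]
shares = [fallacy_share(m[1], m[2]) for m in toy_models]
rho = spearman_rho(elo, shares)
print(f"Spearman rho on toy data: {rho:.3f}")
```

The denominator here is the count of incorrect answers, not all answers, so the statistic tracks what kind of mistakes a model makes and not how often it is wrong. A significance test such as the reported $p$-value would need an additional step that this sketch omits.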

Clinical language intelligence paper. Keywords: LLMs, language models, reasoning, synthetic data, contamination-proof, human-like errors, cognitive fallacies, Erotetic Theory of Reasoning, PyETR, logical fallacies

Paper details

English title: Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
Authors: Andrew Keenan Richardson, Ryan Othniel Kearns, Sean Moss, Vincent Wang, Philipp Koralus
Venue: ICLR 2026 Poster
Year: 2026
Research area: clinical NLP