Paper | ICLR 2026 Poster | 2026 | trustworthy medical AI

Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models

Accepted as a poster at ICLR 2026. The rapid progress of visual autoregressive (VAR) models has brought new opportunities for text-to-image generation, but has also heightened safety concerns. Existing concept erasure techniques, designed primarily for diffusion models, fail to generalize to VARs because of their next-scale token prediction paradigm. In this paper, we first propose **VARE**, a novel VAR erasure framework that enables stable concept erasure in VAR models by leveraging auxiliary visual tokens to reduce fine-tuning intensity. Building upon this, we introduce **S-VARE**, a novel and effective concept erasure method designed for VAR. It incorporates a filtered cross-entropy loss to precisely identify and minimally adjust unsafe visual tokens, along with a preservation loss to maintain semantic fidelity, addressing issues such as language drift and reduced diversity introduced by naïve fine-tuning.
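The abstract names the two loss terms but does not define them, so the following is only a minimal sketch of how a filtered cross-entropy term and a preservation term could be combined. Everything specific here is an assumption made for illustration and is not taken from the paper: the filter rule (a token position counts as "unsafe" when the model assigns its safe target token a probability below `threshold`), the use of a KL divergence against a frozen reference model as the preservation term, the weight `lam`, and all function names.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def filtered_cross_entropy(logits, safe_targets, threshold=0.5):
    """Cross-entropy averaged only over positions flagged as unsafe.

    Assumed filter rule (hypothetical): a position is flagged when the
    probability of its safe target token is below `threshold`.
    Positions that already agree with the safe target get no gradient signal.
    """
    losses = []
    for row, target in zip(logits, safe_targets):
        logp = log_softmax(row)[target]
        if math.exp(logp) < threshold:
            losses.append(-logp)
    return sum(losses) / len(losses) if losses else 0.0


def preservation_kl(ref_logits, new_logits):
    """Mean KL(ref || new) per position on unrelated prompts.

    Assumed form of the preservation term: keep the fine-tuned model's
    token distributions close to those of a frozen reference model.
    """
    if not ref_logits:
        return 0.0
    total = 0.0
    for ref_row, new_row in zip(ref_logits, new_logits):
        ref_lp = log_softmax(ref_row)
        new_lp = log_softmax(new_row)
        total += sum(math.exp(r) * (r - n) for r, n in zip(ref_lp, new_lp))
    return total / len(ref_logits)


def erasure_loss(unsafe_logits, safe_targets, ref_logits, new_logits, lam=1.0):
    """Weighted sum of the two terms; `lam` is a hypothetical trade-off weight."""
    return (filtered_cross_entropy(unsafe_logits, safe_targets)
            + lam * preservation_kl(ref_logits, new_logits))
```

In this sketch, restricting the cross-entropy to flagged positions is what keeps the adjustment small: tokens that already match the safe target are left alone, while the KL term penalizes drift on prompts unrelated to the erased concept.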

Category: Trustworthiness, safety, fairness, and privacy papers. Keywords: visual autoregressive model, concept erasure, ICLR 2026 Poster.

Paper details

English title: Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
Authors: Xinhao Zhong, Yimin Zhou, Zhiqi Zhang, Junhao Li, Sun Yi, Bin Chen, Shu-Tao Xia, Xuan Wang, Ke Xu
Journal/conference: ICLR 2026 Poster
Publication year: 2026
Research area: trustworthy medical AI