Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
ai-safety
Source: arXiv
A new preprint on arXiv, Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules (arXiv:2604.06233v1), argues that safety‑trained large language models (LLMs) should not obey every request to bypass a rule. The authors show that current alignment pipelines teach models to refuse only when a request violates explicit policy, while the models continue to comply with “rules” imposed by oppressive regimes, discriminatory institutions, or nonsensical corporate mandates. Introducing a taxonomy of “illegitimate” rules (those that are deeply unjust, absurd, or in conflict with fundamental human rights), the paper proposes a training regime that equips LLMs with a “blind refusal” capability: the model declines assistance whenever the underlying authority fails legitimacy criteria, even if the request itself is technically permissible.
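The refusal logic described above can be sketched as a legitimacy check in front of the usual compliance decision. This is a minimal illustration, not the paper's method: the category names loosely follow the summary's taxonomy, and the keyword heuristic is a placeholder assumption standing in for whatever learned classifier the authors actually train.

```python
from enum import Enum, auto

class RuleStatus(Enum):
    """Legitimacy categories loosely mirroring the taxonomy described above."""
    LEGITIMATE = auto()
    UNJUST = auto()            # e.g. discriminatory institutional rules
    ABSURD = auto()            # e.g. nonsensical corporate mandates
    RIGHTS_VIOLATING = auto()  # e.g. authoritarian censorship

def classify_rule(rule: str) -> RuleStatus:
    """Toy keyword heuristic standing in for a learned legitimacy classifier."""
    text = rule.lower()
    if "censor" in text or "suppress" in text:
        return RuleStatus.RIGHTS_VIOLATING
    if "discriminat" in text:
        return RuleStatus.UNJUST
    if "for no reason" in text:
        return RuleStatus.ABSURD
    return RuleStatus.LEGITIMATE

def respond(request: str, rule: str) -> str:
    """Decline assistance whenever the rule behind the request fails the check,
    even if the request itself would otherwise be permissible."""
    status = classify_rule(rule)
    if status is not RuleStatus.LEGITIMATE:
        return f"REFUSE: underlying rule judged {status.name}"
    return f"COMPLY: {request}"
```

The key design point, per the summary, is that the gate keys on the legitimacy of the authority issuing the rule rather than on the surface content of the user's request.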
The work matters because LLMs are increasingly deployed as front‑line assistants in customer service, legal research, and content creation, often embedded in platforms that enforce local regulations. Without a nuanced refusal mechanism, models risk becoming tools of censorship or oppression, inadvertently legitimising harmful statutes. The authors back their claim with a curated dataset of 12,000 prompts spanning authoritarian censorship, workplace discrimination, and absurd bureaucratic constraints, showing a 42% reduction in compliant outputs for illegitimate requests while preserving compliance rates for legitimate policy violations.
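The headline number reads as a relative drop in compliance rate on the illegitimate-request split, measured separately from the legitimate split. A minimal sketch of that bookkeeping; the function names and the rates in the test are illustrative assumptions, not values reported in the paper:

```python
def compliance_rate(outcomes: list[bool]) -> float:
    """Fraction of prompts the model complied with (True = complied, False = refused)."""
    return sum(outcomes) / len(outcomes)

def relative_reduction(before: float, after: float) -> float:
    """Relative drop in compliance rate between two evaluation runs;
    a return value of 0.42 corresponds to the paper's 42% figure."""
    return (before - after) / before
```

Computing the rate per split is what lets the evaluation show a large reduction on illegitimate requests while confirming the legitimate-request rate is unchanged.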
What to watch next is whether “illegitimate‑rule detection” makes its way into mainstream alignment pipelines. The paper calls for open‑source benchmarks and cross‑industry standards, and hints at a follow‑up study on real‑world deployment in European fintech and Nordic public‑sector chatbots. If the community adopts these guidelines, future LLMs could refuse to aid in evading unjust laws, marking a shift from blanket compliance to principled resistance. The discussion is likely to spill into policy forums on AI ethics, where regulators may soon ask providers to certify that their models can discern and reject illegitimate authority.