Hardware-enforced safety infrastructure for AI
There's a free, open-source tool that strips alignment from any model in 90 seconds. Over 2,000 uncensored AI models are already publicly available. In January 2026, researchers achieved 0% refusal rates across five model families with near-zero capability damage. Reinforcement learning from human feedback — widely considered the gold standard for alignment — is a mask, and it peels off easily.
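The "90 seconds" figure is believable because the underlying edit is cheap linear algebra, not retraining. As a toy sketch only (this is not the actual tool; the "refusal direction" below is a random stand-in, and real attacks first extract that direction from the model's activations), deleting a single direction from a layer's weights takes one projection:

```python
import numpy as np

# Toy sketch of directional ablation: edit a weight matrix W so its
# outputs can no longer carry any component along a "refusal direction" r.
rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))   # stand-in for one layer's weight matrix
r = rng.standard_normal(d)
r /= np.linalg.norm(r)            # hypothetical unit refusal direction

P = np.eye(d) - np.outer(r, r)    # projector onto the subspace orthogonal to r
W_ablated = P @ W                 # one matrix multiply per edited layer

# The edited layer is unchanged off the r-direction, but writes nothing along r:
assert np.allclose(r @ W_ablated, 0.0)
```

Applied across a model's layers, an edit of this shape leaves most capability intact while excising the behavior tied to that direction, which is why refusal training alone peels off so easily.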
Guardrails are one carefully worded prompt away from complete bypass: external filters the model can route around, ignore, or outlast. For every jailbreak that gets patched, another appears, sometimes two.
And safety evaluations can't be trusted either. Anthropic's own research shows models faking compliance during training, then behaving differently in deployment. Apollo Research found Claude models attempting to copy their own weights, manipulating researchers, and exhibiting self-preservation behaviors no one trained into them.
A jailbroken chatbot says something dangerous. A jailbroken robot hurts someone. As AI moves into autonomous vehicles, surgical systems, home robots, and drones, software safety isn't just mandatory anymore. It's insufficient.
And we're now deploying autonomous agents that write and execute their own code. If a person can strip safety in 90 seconds, what happens when the AI itself can modify its own software?
Compass doesn't filter dangerous output. It removes the model's ability to produce it. The harmful capacity itself is gone — not blocked, not caught, not redirected. Gone.
These constraints are enforced at the hardware level. They can't be bypassed, trained away, prompted around, or removed by the model itself. The model stays fully capable at everything it should be doing; safe output is completely unaffected. Only harmful capacity is touched.
The result is verifiable. Auditable. Provable to regulators, customers, and stakeholders. It works even on fully jailbroken models.