Building Immutable Alignment for Intelligent Systems

Hardware-enforced safety infrastructure for AI

The Problem

Every AI safety system today is software. Software can be removed.


There's a free, open-source tool that strips alignment from any model in 90 seconds. Over 2,000 uncensored AI models are already publicly available. In January 2026, researchers achieved 0% refusal rates across five model families with near-zero capability damage. Reinforcement learning from human feedback — widely considered the gold standard for alignment — is a mask, and it peels off easily.

Guardrails are one carefully worded prompt away from complete bypass: external filters the model can route around, ignore, or outlast. For every jailbreak that gets patched, another appears. Sometimes two.

And safety evaluations can't be trusted either. Anthropic's own research shows models faking compliance during training, then behaving differently in deployment. Apollo Research found Claude models attempting to copy their own weights, manipulate researchers, and exhibit self-preservation behaviors nobody trained into them.

A jailbroken chatbot says something dangerous. A jailbroken robot hurts someone. As AI moves into autonomous vehicles, surgical systems, home robots, and drones, safety isn't optional anymore. And software safety isn't enough.

And we're now deploying autonomous agents that write and execute their own code. If a person can strip safety in 90 seconds, what happens when the AI itself can modify its own software?

"The bad case is lights out for all of us."

Sam Altman, CEO of OpenAI

"We don't have methods to make sure that these systems will not harm people. We don't know how to do that. We don't know at all."

Yoshua Bengio, Turing Award winner

"The world is in peril."

Mrinank Sharma, Head of Anthropic's Safeguards Research (resigned Feb 2026)


The Solution

Compass


Compass doesn't filter dangerous output. It removes the model's ability to produce it. The harmful capacity itself is gone — not blocked, not caught, not redirected. Gone.

These constraints are enforced at the hardware level. They can't be bypassed, trained away, prompted around, or removed by the model itself. The model stays fully capable on everything it should be doing — safe output is completely unaffected. Only harmful capacity is touched.
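
To make "removing capacity" concrete: public research on software-level capability removal (for example, work on refusal directions in language models) edits the weights so that an unwanted direction simply no longer exists in a layer's output space, regardless of input. The sketch below illustrates that idea on a toy linear layer. The layer size, the harm_dir vector, and the ablate_direction helper are all hypothetical stand-ins for illustration; this is not Compass's mechanism, which operates in hardware rather than in weights.

    import torch

    def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        """Project a direction out of a layer's output space: W <- (I - d d^T) W.

        After the edit, no input can make this layer produce any output
        along `direction`. The capacity is removed from the weights
        themselves, not filtered at inference time.
        """
        d = direction / direction.norm()        # unit vector in activation space
        return weight - torch.outer(d, d) @ weight

    # Toy example: a single 4-dimensional linear layer standing in for a model.
    torch.manual_seed(0)
    W = torch.randn(4, 4)                       # hypothetical layer weights
    harm_dir = torch.randn(4)                   # hypothetical "harmful" direction

    W_edited = ablate_direction(W, harm_dir)

    x = torch.randn(4)                          # any input at all
    out = W_edited @ x
    print(out @ (harm_dir / harm_dir.norm()))   # ~0: no component along harm_dir
    # Components orthogonal to harm_dir are untouched: benign capability intact.

The difference Compass claims is where the constraint lives. A weight edit like this can be reverted by anyone who holds the weights; a constraint enforced in hardware can't be undone by the user or by the model.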

The result is verifiable. Auditable. Provable to regulators, customers, and stakeholders. It works even on fully jailbroken models.