Every existing approach to AI safety works at the software level — training, guardrails, content filters, system prompts. Compass works at a fundamentally different layer.
It starts with interpretability. We read how a model represents safe and harmful behavior internally: not by guessing from its outputs, but by inspecting its internal activations directly. From that understanding, we identify what to constrain.
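The document doesn't specify how Compass locates these representations. A common interpretability technique for this kind of task is a difference-of-means probe: collect hidden activations for paired safe and harmful prompts and take the normalized difference of the cluster means as a candidate "harm direction." The sketch below is purely illustrative, using synthetic activations in place of a real model; all names and shapes are assumptions, not Compass internals.

```python
import numpy as np

def harm_direction(harmful_acts: np.ndarray, safe_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two activation clusters.

    harmful_acts, safe_acts: (n_prompts, hidden_dim) arrays of hidden
    activations taken at one layer of a model. Returns a unit vector
    pointing from the safe cluster toward the harmful cluster.
    """
    diff = harmful_acts.mean(axis=0) - safe_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Synthetic illustration: two clusters separated along a known axis.
rng = np.random.default_rng(0)
hidden_dim = 16
true_axis = np.zeros(hidden_dim)
true_axis[0] = 1.0                       # the axis we planted the separation on
safe = rng.normal(0.0, 0.1, size=(100, hidden_dim))
harmful = safe + 3.0 * true_axis         # same noise, shifted along the axis
d = harm_direction(harmful, safe)        # recovers the planted axis
```

On real models the clusters are noisier and the direction is typically validated across layers and prompt sets before anything is done with it.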
Then we remove the model's ability to produce harmful content. Not by filtering output after the fact, but by removing the harmful capacity from the model itself. The model doesn't refuse. The ability is simply no longer available to it.
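The text doesn't say how the capacity is removed. One published approach that matches this description is directional ablation: project the identified direction out of the weight matrices that write into the model's residual stream, so no layer can produce output along it. The sketch below shows that operation on a single matrix; it is an illustration of the general technique under assumed shapes and conventions, not Compass's actual edit.

```python
import numpy as np

def remove_direction(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Edit W so its outputs have no component along direction d.

    Convention assumed here: W maps x -> W @ x and writes into a
    hidden_dim-sized residual stream; d is a vector in that stream.
    Left-multiplying by the orthogonal projector (I - d d^T) zeroes
    the d-component of every possible output.
    """
    d = d / np.linalg.norm(d)
    P = np.eye(W.shape[0]) - np.outer(d, d)   # projector onto d's complement
    return P @ W

rng = np.random.default_rng(1)
hidden_dim, in_dim = 8, 4
W = rng.normal(size=(hidden_dim, in_dim))
d = np.zeros(hidden_dim)
d[2] = 1.0                                    # illustrative harm direction
W_safe = remove_direction(W, d)

x = rng.normal(size=in_dim)
out = W_safe @ x   # has zero component along d; all other components unchanged
```

Because the projector only touches the one direction, everything orthogonal to it passes through unchanged, which is consistent with the claim that safe behavior is unaffected.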
Those constraints are enforced by hardware. The model cannot access them, modify them, or circumvent them. The only way to remove them is physical access to the hardware itself.
What matters is what doesn't change. The model stays fully capable on everything it should be doing. Safe output is completely unaffected. It trains normally, improves freely, and runs at full speed — but it can never modify its own safety constraints, because they live in hardware it doesn't have access to.
The result is auditable, provable, and regulation-ready. It works even on models that have been fully jailbroken — alignment surgically removed, generating harmful content freely — and makes them safe again.
We've filed two patents, comprising 118 pages of technical specification for the full system. The geometric harm filtering at the core of Compass has been empirically validated across multiple models, and we've used it to restore safety to fully jailbroken models through geometric filtering alone.
There's a working demo. Side by side, the same jailbroken model: without Compass it produces dangerous output; with Compass it is identical on safe prompts and incoherent on harmful ones.
The core hardware component has been confirmed feasible: implementable in weeks on commodity FPGAs. The science works. Now we're building the product.