Every existing approach to AI safety works at the software level — training, guardrails, content filters, system prompts. Compass works at a fundamentally different layer.
It starts with interpretability. We read how a model represents safe and harmful behavior internally: not by guessing from its outputs, but by inspecting its internal activations directly. From that understanding, we identify what to constrain.
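The document doesn't specify how Compass locates these representations. A common interpretability technique for this kind of task is a difference-of-means probe: collect hidden activations for paired safe and harmful prompts and take the normalized difference of the cluster means as a candidate "harm direction." The sketch below is purely illustrative, using synthetic activations in place of a real model; all names and shapes are assumptions, not Compass internals.

```python
import numpy as np

def harm_direction(harmful_acts: np.ndarray, safe_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two activation clusters.

    harmful_acts, safe_acts: (n_prompts, hidden_dim) arrays of hidden
    activations taken at one layer of a model. Returns a unit vector
    pointing from the safe cluster toward the harmful cluster.
    """
    diff = harmful_acts.mean(axis=0) - safe_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Synthetic illustration: two clusters separated along a known axis.
rng = np.random.default_rng(0)
hidden_dim = 16
true_axis = np.zeros(hidden_dim)
true_axis[0] = 1.0                       # the axis we planted the separation on
safe = rng.normal(0.0, 0.1, size=(100, hidden_dim))
harmful = safe + 3.0 * true_axis         # same noise, shifted along the axis
d = harm_direction(harmful, safe)        # recovers the planted axis
```

On real models the clusters are noisier and the direction is typically validated across layers and prompt sets before anything is done with it.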
Then we remove the model's ability to produce harmful content. Not by filtering output after the fact, but by removing the harmful capacity from the model itself. The model doesn't refuse. The ability is simply no longer available to it.
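The text doesn't say how the capacity is removed. One published approach that matches this description is directional ablation: project the identified direction out of the weight matrices that write into the model's residual stream, so no layer can produce output along it. The sketch below shows that operation on a single matrix; it is an illustration of the general technique under assumed shapes and conventions, not Compass's actual edit.

```python
import numpy as np

def remove_direction(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Edit W so its outputs have no component along direction d.

    Convention assumed here: W maps x -> W @ x and writes into a
    hidden_dim-sized residual stream; d is a vector in that stream.
    Left-multiplying by the orthogonal projector (I - d d^T) zeroes
    the d-component of every possible output.
    """
    d = d / np.linalg.norm(d)
    P = np.eye(W.shape[0]) - np.outer(d, d)   # projector onto d's complement
    return P @ W

rng = np.random.default_rng(1)
hidden_dim, in_dim = 8, 4
W = rng.normal(size=(hidden_dim, in_dim))
d = np.zeros(hidden_dim)
d[2] = 1.0                                    # illustrative harm direction
W_safe = remove_direction(W, d)

x = rng.normal(size=in_dim)
out = W_safe @ x   # has zero component along d; all other components unchanged
```

Because the projector only touches the one direction, everything orthogonal to it passes through unchanged, which is consistent with the claim that safe behavior is unaffected.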
Those constraints are enforced by hardware. The model cannot access them, modify them, or circumvent them. The only way to remove them is physical access to the hardware itself.
What matters is what doesn't change. The model stays fully capable on everything it should be doing. Safe output is completely unaffected. It trains normally, improves freely, and runs at full speed — but it can never modify its own safety constraints, because they live in hardware it doesn't have access to.
The result is auditable, provable, and regulation-ready. It works even on models that have been fully jailbroken — alignment surgically removed, generating harmful content freely — and makes them safe again.
We've filed two patents, comprising 118 pages of technical specification for the full system. The geometric harm filtering at the core of Compass has been empirically validated across multiple models, and we've used it to restore safety to fully jailbroken models through geometric filtering alone.
There's a working demo. Side by side, the same jailbroken model: without Compass it produces dangerous output; with Compass it is identical on safe prompts and incoherent on harmful ones.
The core hardware component has been confirmed feasible: implementable in weeks on commodity FPGAs. The science works. Now we're building the product.