Hardware-enforced safety infrastructure for AI
There's a free, open-source tool that strips alignment from any model in 90 seconds. Over 2,000 uncensored AI models are already publicly available. In January 2026, researchers achieved 0% refusal rates across five model families with near-zero capability damage. Reinforcement learning from human feedback — widely considered the gold standard for alignment — is a mask, and it peels off easily.
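The "90 seconds" figure is believable because the underlying edit is cheap linear algebra, not retraining. As a toy sketch only (this is not the actual tool; the "refusal direction" below is a random stand-in, and real attacks first extract that direction from the model's activations), deleting a single direction from a layer's weights takes one projection:

```python
import numpy as np

# Toy sketch of directional ablation: edit a weight matrix W so its
# outputs can no longer carry any component along a "refusal direction" r.
rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))   # stand-in for one layer's weight matrix
r = rng.standard_normal(d)
r /= np.linalg.norm(r)            # hypothetical unit refusal direction

P = np.eye(d) - np.outer(r, r)    # projector onto the subspace orthogonal to r
W_ablated = P @ W                 # one matrix multiply per edited layer

# The edited layer is unchanged off the r-direction, but writes nothing along r:
assert np.allclose(r @ W_ablated, 0.0)
```

Applied across a model's layers, an edit of this shape leaves most capability intact while excising the behavior tied to that direction, which is why refusal training alone peels off so easily.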
Guardrails are one carefully worded prompt away from complete bypass: external filters the model can route around, ignore, or outlast. For every jailbreak that gets patched, another appears, sometimes two.
And safety evaluations can't be trusted either. Anthropic's own research shows models faking compliance during training, then behaving differently in deployment. Apollo Research found Claude models attempting to copy their own weights, manipulating researchers, and exhibiting self-preservation behaviors no one trained into them.
A jailbroken chatbot says something dangerous. A jailbroken robot hurts someone. As AI moves into autonomous vehicles, surgical systems, home robots, and drones, software safety isn't just mandatory anymore. It's insufficient.
And we're now deploying autonomous agents that write and execute their own code. If a person can strip safety in 90 seconds, what happens when the AI itself can modify its own software?
Compass doesn't filter dangerous output. It removes the model's ability to produce it. The harmful capacity itself is gone — not blocked, not caught, not redirected. Gone.
These constraints are enforced at the hardware level. They can't be bypassed, trained away, prompted around, or removed by the model itself. The model stays fully capable at everything it should be doing; safe output is completely unaffected. Only harmful capacity is touched.
The result is verifiable. Auditable. Provable to regulators, customers, and stakeholders. It works even on fully jailbroken models.