Research & Writing
Ideas from the safety frontier.
Technical research, threat analysis, and field notes from our work building foundational AI safety tooling.
analysis
When Guardrails Fail: What Claude Opus 4.6 Reveals About Prompt Injection Risk
Anthropic's Claude Opus 4.6 system card quantifies prompt injection risk at scale for the first time. These numbers should reshape how enterprises deploy AI agents.
research
Hidden in Plain Language: How Calendar Invites Became Data Extraction Tools Through Prompt Injection
A calendar event containing crafted instructions could silently extract your private meeting data when you ask Gemini about your schedule. The attack reveals fundamental gaps in how AI systems handle untrusted inputs.
analysis
When AI Democratization Meets Vulnerability: The Real Cost of No-Code AI Agents
No-code AI platforms promise accessibility. Recent research shows they also introduce security challenges that traditional application-security approaches don't address.