6 Dimensions to Cut Through the...

Hello CyberBuilders 🖖

Let’s talk AI security (don’t stop reading here, give me 2 mins!)

It’s the new obsession in the cybersecurity startup world. In the past two years, funding has poured into companies claiming to protect, secure, or safeguard AI. Articles go into every flashy point exploit, then pitch simplistic one-size-fits-all fixes. But while the buzz is deafening, the adoption? Not so much.

AI security is neither trivial nor transient.

It’s not “detect and alert” for today’s headline-grabbing prompt-injection demo; it’s about building an end-to-end defense that survives tomorrow’s model update.

First, recognize the difference between short-lived threats and long-standing risk families. Yes, new jailbreaks and chain-of-thought exploits make headlines, but many of those exploits will be patched—or rendered obsolete—within weeks. Effective strategies focus on the threat categories and the architectures that persist even as attack techniques change.

Second, understand your environment and your audience. Securing a public cloud deployment isn’t the same as protecting an on-prem GPU cluster. The controls you bake into a model-integration pipeline for a DevSecOps team will differ from the governance dashboards a CISO needs or the self-service guardrails that data scientists demand.

That’s why I’ve written this post: not to sell a silver bullet, but to share a navigational map.

What follows is a structured view of what I believe are the six key dimensions of AI security today. From prompt-level filtering to cloud agent lock-in, from open-source defenders to enterprise governance platforms—I’ll walk through each theme with examples, risks, and strategic implications.

With that in mind, we’ll frame AI security across 6 interconnected dimensions:

AI Security Paradox: How to distinguish enduring risk families from short-lived exploits—and why you should chase the former.
Grassroots Defenders: Open-source rule engines (YARA-style prompt rules, Snort-like detectors) as Phase 1 triage.
Enterprise Governance: Model-usage inventories, data classification, and policy enforcement from vendors like AIM, Prompt Security, and Lakera.
DevOps & MLOps Integration: Shift-left practices—adversarial prompt simulations, automated red-teaming, runtime filters—to bake in security early.
Secure-by-Design from AI Labs: Guardrails and interpretability initiatives from OpenAI, Anthropic, Entropic, and Meta’s LLaMA protections.
Agent-Based Cloud Platforms: Microsoft Copilot Studio, Google AgentSpace, and the risks of vendor-hosted code and limited third-party integrations.

There are no vendor heroics here—just a holistic understanding to help you see who’s doing what, where the gaps lie…

“Everyone agrees AI security isn’t ready for primetime – developers are focused on building, with security being a best effort for now.”

Today’s AI security landscape is hard. Language models are unpredictable and exhibit emergent behavior. The attack surface now includes every token in a prompt.

OWASP put Prompt Injection at the top of its LLM Top 10 list in 2023—proof that even foundational threats lack mature, robust countermeasures. Yet the real danger is in “last-mile” failures. In early 2025, security researchers red-teamed DeepSeek’s R1 model, achieving a 100 percent bypass rate on its safety filters—and extracting toxic content with every test (see WIRED and Palo Alto Unit42 blog)

This is the AI-Security Paradox: We’ve built cybersecurity systems capable of incredible feats, yet we remain inefficient in the new world of AI; moreover, we are rushing to build old-minded defenses when AI is moving so fast that they will be obsolete in a few weeks.

What is a real threat when the model providers are keeping their prompts updated on a daily basis? It is fascinating to look at these “text files” that govern the AI “security & safety” delivered by LLM vendors.

It is so funny to see the complexity of these systems prompts. Models are trained (with SFT, RL, etc.) but also patched by LLM vendors. For example, Anthropic had to add:

Facts like “Donald Trump defeated Kamala Harris in the 2024 elections.”
Lawful rules like “Always strictly respect copyright and follow the by NEVER reproducing more than 20 words of text from original web sources or outputting displacive summaries. Instead, only ever use 1 quote of UNDER 20 words long within quotation marks.”
Harmful protection like “Avoid creating search queries that produce texts from known extremist organizations or their members (e.g. the 88 Precepts). If harmful sources are in search results, do not use these harmful sources and refuse requests to use them, to avoid inciting hatred, facilitating access to harmful information, or promoting harm, and to uphold Claude’s ethical commitments.”

Model makers constantly patch existing models with these prompts, changing their behavior. They are also creating new technologies, models, and agents (tools like MCP, short and long memory, multi-step agent).

AI Security is a moving target!

As the vendors arrived, the community took action.

Open-source builders were among the first to respond to the rise of LLM threats, crafting lightweight, rule-based tools to triage risky prompts and outputs. Think YARA for language, Snort for semantics. These tools are fast, flexible, and built to catch the obvious before it escalates.

One standout is NOVA, an open-source prompt pattern-matching engine inspired by YARA. Kudos to Thomas Roccia!

It combines keyword detection, semantic similarity, and LLM-based evaluation to analyze and detect prompt content. NOVA allows users to write custom rules tailored to their needs, whether detecting sensitive data, blocking prompt injections, or filtering malicious inputs.

As generative AI becomes integral to enterprise operations, the need for robust governance frameworks has never been more critical. Organizations would ensure that AI systems are secure, compliant, and aligned with corporate policies.

Several vendors have emerged to address these challenges:

Lakera Guard: Offers real-time security for AI applications, focusing on prompt injection prevention, data leakage protection, and red teaming simulations.
AIM Security: Provides a GenAI security platform that includes model usage inventories, data classification, and policy enforcement workflows.
Prompt Security: Delivers comprehensive insights and governance capabilities for AI tools, ensuring operational transparency and security.

CalypsoAI: Specializes in governance of generative AI models, offering security scanners and compliance management tools.
Lasso Security: Focuses on contextual data protection with real-time detection and alerting systems.
WhyLabs: Provides continuous monitoring and real-time security guardrails for AI applications.
Protect AI: Offers AI Security Posture Management with enterprise-level scanning and end-to-end security monitoring. Protect AI does also released LLM Guard, with features like built-in sanitization and redaction, data leakage prevention, and prompt injection resistance.

On of the latest companies in the game is Pilar Security (https://www.pillar.security).

These solutions are designed to provide enterprises with the tools necessary to maintain governance, auditability, and compliance in the rapidly evolving AI landscape.

These tools are delivering important visibility and governance features… but are the CISO and their security teams are ready to use them?

Based on my conversations, this is not really the case yet. And I am not the only one to report this.

Look at James Berthoty, RSAC 2025’s takeaway:

Or Cole Grolmus post on ProtectAI acquisition by Palo Alto:

AI/ML Security is the definition of an embryonic market
….
Right now, the tailwinds for the AI/ML security market are looking exponentially better than they did during our “LLMs aren’t scaling” spell in Q4 2024.

Back then, I would have said we were only going to have three to five third party models that everyone was going to use (OpenAI, Anthropic, etc.). In that scenario, there is basically no need for AI/ML security (for the models themselves, I mean) outside the handful of companies building the models.

After DeepSeek (and other open source models), it seems a lot more likely that companies with enough motivation and resources are going to both host and post-train third-party models. Ambitious companies will even train their own first-party models.

OK very valid points… but the AI Security market is so much more than just governing and getting visibility of enterprise data going out of the corporate perimeter. It must be managed by:

DevOps and ML Ops, when they will train the “very own” model of the company
The big AI Labs themselves such as OpenAI, Anthropic, Mistral AI or Google

If AI security is to move beyond dashboards and policy slides, it needs to be wired directly into the development pipeline.

“Shift-left” mindset is where . The earlier you test, simulate, and adapt to failure modes, the faster you can iterate safely. In the AI context, that means proactively stress-testing models and inference pipelines before the first user prompt hits production.

Red-teaming isn’t just for the big labs anymore. Teams are now building automated adversarial prompt generators—simulating injections, jailbreaks, and toxic responses—to validate model behavior at test time.

Want to catch sensitive leakage before it hits production?
→ Run structured attacks against your fine-tuned models.
→ Build assertions for unacceptable outputs.
→ Use synthetic abuse prompts as part of your CI/CD.

It’s not perfect, but it’s a start—and it’s programmable.

Even with testing, things slip through. That’s why more teams are implementing runtime defenses that wrap around LLMs like middleware:

Real-time filters on input/output flows
Anomaly detection based on behavior deviation (e.g., new prompt patterns or sudden topic shifts)
Inline policy enforcement tuned to business context

Some startups are offering API-level guards and tamper detection mechanisms, but many companies are also rolling their own wrappers. Especially when they fine-tune their own models—or host them internally—they need runtime “filters” that can scale.

Finally, do we really need all these solutions or can we wait AI model builders to take care of it ?

When it comes to AI security, the major labs aren’t just participants—they’re setting the rules of the game. Let’s examine how OpenAI, Anthropic, Meta, and Entropic are embedding security into their models from the ground up.

OpenAI has released its Model Spec, a comprehensive document outlining desired model behaviors. This specification emphasizes:

Customizability: Allowing developers to tailor AI behavior to specific needs.
Transparency: Clearly defining how models should act in various scenarios.
Safety: Implementing guardrails to prevent harmful outputs.

The Model Spec is open-sourced under a Creative Commons license, inviting community collaboration and adaptation.

Anthropic is prioritizing interpretability, aiming to make AI systems more understandable and controllable. CEO Dario Amodei’s essay, The Urgency of Interpretability, sets a goal: by 2027, interpretability should reliably detect most model problems. Their approach includes:

Meta has introduced a suite of tools under its LLaMA Protections initiative, including:

Llama Guard 4: A content moderation model for real-time filtering.
LlamaFirewall: Orchestrates multiple protection tools within AI applications.
Prompt Guard 2: Prevents jailbreaks and prompt injections.

These tools are designed to help developers build secure AI systems with built-in safeguards.

Last but not least. Agentic AI is likely to change the security paradigm.

AI agents are rapidly becoming the new security perimeter. With platforms like Microsoft Azure Copilot and Google AgentSpace and Google Duet AI, enterprises are embedding intelligent assistants directly into their cloud environments.

Microsoft’s Azure Copilot integrates AI agents into the Azure ecosystem, offering features like:

Proactive Governance: Admins can utilize the Power Platform admin center to manage agent adoption and create controlled environments for development.
Secure by Default: Copilot adheres to existing access management protocols, including Azure role-based access control (RBAC), ensuring that agents operate within defined permissions.
Comprehensive Visibility: With Microsoft Entra Agent ID, organizations gain visibility and control over AI agents, addressing challenges like agent sprawl and non-human identity management.

Google’s Duet AI offers AI-powered assistants across Google Workspace and Google Cloud, emphasizing:

Agent-to-Agent (A2A) Protocol: This protocol enables multiple AI agents to communicate and collaborate, allowing for modular and scalable AI systems.
Enterprise-Grade Security: Duet AI incorporates features like VPC Service Controls and granular IAM permissions, ensuring secure deployment within enterprise environments.

Here’s the concern: if everything runs inside the vendor’s cloud, your ability to hook into the infrastructure is almost zero.

You don’t get root access. You don’t get telemetry beyond what they allow. You don’t get to wrap the agent with your own guardrails.

It’s the mobile app model, all over again.

In mobile ecosystems, developers and security teams have limited visibility. You can scan for malicious apps, maybe get usage logs, but you can’t inject deep security controls into the OS or the app runtime.

I am afraid that’s what the big AI platforms are now building: closed ecosystems with tightly controlled agent runtimes.

And the economics? They favor closure. The more Microsoft or Google can keep agents running in their own clouds, under their own policies, with billing wrapped into their infra, the stronger their business model.

If the market doesn’t demand open, auditable, composable AI infrastructure, it won’t be delivered.

AI security doesn’t happen in a vacuum. How—and where—you deploy your models fundamentally shapes your threat model.

API Only LLMs offer scale and convenience but introduce risks around vendor lock-in, opaque telemetry, and limited guardrail access.
Cloud-based LLMs – the model you can deploy in your very own AWS, GCP, or Scaleway instances – improve the model by giving you a sense of control: as the LLM is running in your “own” cloud environment, it is unlikely the LLM provider will access your proprietary data. Microsoft Azure has seen the OpenAI GPT-4 model explode in usage.
On-premises deployments give you more control but demand more profound expertise and often recreate complexity that hyperscalers have already abstracted away. If not tightly managed, hybrid setups bring the worst of both: fragmented policy enforcement, uneven visibility, and unclear ownership of responsibility.

When it comes to AI Security:

Laurent Hausermann

The tools and controls that work for a DevSecOps engineer monitoring runtime behavior won’t help a CISO navigate governance risk. A security architect wants composability and observability; a software engineer just needs guardrails that don’t block shipping. Data scientists, rarely security-first, now need prompts scrubbed, outputs audited, and their tools monitored.

So where do we go from here?

After reviewing threat lifecycles, deployment architectures, open-source tools, governance vendors, DevSecOps practices, foundation-model initiatives, and agent-based platforms—AI security boils down to two things: your infrastructure, and your personas.

Ask yourself:

Where are your models running—cloud, on-prem, hybrid?
Who are you securing for—CISO, security architect, developer, or data scientist?

Happy to get all your comments in public or private as always 👇

Laurent 💚