Securing LLMs & AI: Lessons from the Frontlines of MLSecOps

When your organization ships an LLM-powered feature, the security questions don't change — threat modeling, input validation, access control, audit logging. What changes is that the answers don't map cleanly onto any of the mental models we built for traditional software security.

Traditional software is deterministic. Given the same input, you get the same output. You can write a test. You can validate the behavior. You can reason about edge cases. AI systems aren't deterministic — the same prompt can produce different outputs, and the attack surface shifts in ways that static analysis doesn't capture.

This is the core challenge of MLSecOps: applying the discipline of DevSecOps to systems that behave fundamentally differently from traditional software.

What Makes AI/ML Systems Different to Secure

Three properties of AI systems break assumptions that traditional security tooling relies on:

Non-determinism

You can't write a fixed test case for an LLM's security behavior the way you'd write a test for a SQL injection filter. The model may handle a malicious prompt safely 99% of the time and fail on the 100th variation. Non-determinism means security validation has to be probabilistic and continuous, not a one-time gate.

Agentic behavior

AI agents that can take actions — send emails, query APIs, write files, execute code — have a blast radius that scales with their capabilities. A compromised agent isn't just a data leak risk; it's a potential pivot point for everything the agent can do. Most security models treat AI as a passive component that generates text. Agentic systems break that assumption entirely.

Opaque internals

Traditional application security includes code review. You can read the code and understand what it does. With LLMs, the "code" is billions of parameters, and the behavior emerges from training data and fine-tuning that you may not have full visibility into. This changes both the threat model and the tooling you need.

The New Attack Surface

AI systems introduce threat vectors that don't exist in traditional software stacks:

Prompt Injection

The LLM equivalent of SQL injection. A malicious payload in user input — or in data the model retrieves — manipulates the model's behavior to bypass safety guardrails, exfiltrate context, or execute unintended actions. Unlike SQL injection, there's no parameterized query equivalent. Defense requires both input sanitization and output monitoring.

Training Data Poisoning

If you're fine-tuning models on user-provided data or third-party datasets, a poisoning attack can embed backdoor behaviors that are triggered by specific inputs. This is a supply chain risk: the attack happens before the model is deployed, and detection requires understanding what your training data actually contains.

Model Integrity

The model artifact itself is a supply chain component. A compromised model — whether from a tampered download, a malicious fine-tuning service, or a backdoored base model — can behave normally under most conditions and maliciously under specific trigger inputs. Model signing and hash verification aren't optional in production environments.

Context Window Exfiltration

In RAG (retrieval-augmented generation) systems, the context window may contain sensitive documents, user data, or system information. Prompt injection attacks specifically targeting context window contents can exfiltrate this data through the model's output in ways that aren't obviously visible in standard monitoring.

MLSecOps in Practice

The good news is that the DevSecOps mental model translates to AI/ML — shift left, automate security controls, monitor continuously. What changes is the implementation.

Threat model before you build

Standard threat modeling — STRIDE or equivalent — applies. But the specific threats are different. For each AI component: what inputs does it accept? What actions can it take? What data does it have access to? What happens if it's manipulated? Answer these before the component is in production, not after.

Input validation at the boundary

You can't sanitize a prompt the same way you sanitize SQL inputs, but you can apply layered validation: length limits, character filtering, semantic analysis for known injection patterns, and classification of user intent before passing to the model. This doesn't eliminate prompt injection risk — it raises the cost of exploitation.

Output monitoring

Monitor what the model actually produces, not just what it receives. Content classifiers, PII detection, and anomaly detection on output patterns can catch cases where the model is behaving unexpectedly — whether from prompt injection, model drift, or deliberate manipulation.

Model validation in CI/CD

Treat model updates like code deployments: hash verification, adversarial prompt testing suites, behavioral regression testing before promotion to production. The tooling is less mature than standard security testing frameworks, but the principle is identical.

Audit logging for agentic systems

Every action an agent takes — every API call, every file write, every message sent — should be logged with enough context to reconstruct what happened and why. This is non-negotiable. Without it, incident investigation in an agentic system is guesswork.

Securing AI systems doesn't require new principles. It requires applying the same principles you'd apply to any complex, internet-facing system — with the honest acknowledgment that the failure modes are different and the tooling is less mature.

What's Actually Hard

The honest answer about where MLSecOps is difficult:

Detection coverage is immature. The tooling for detecting model abuse in production is still early. Most organizations are building custom monitoring rather than buying it.
The threat landscape moves fast. New prompt injection techniques and attack patterns emerge regularly. Keeping detection current requires active engagement with the research community.
The expertise gap is real. Security teams that understand ML systems and ML teams that understand security are both rare. The intersection is smaller still. MLSecOps requires building bridges between those disciplines.
Standards are evolving. OWASP's LLM Top 10, NIST's AI Risk Management Framework, and similar guidance are valuable but incomplete. You'll need to make judgment calls that frameworks don't yet cover.

None of this is a reason to avoid shipping AI features. It's a reason to treat AI components as first-class security concerns rather than afterthoughts — and to start building the expertise before you need it under pressure.

Securing LLMs & AI: Lessons from
the Frontlines of MLSecOps

What Makes AI/ML Systems Different to Secure

Non-determinism

Agentic behavior

Opaque internals

The New Attack Surface

Prompt Injection

Training Data Poisoning

Model Integrity

Context Window Exfiltration

MLSecOps in Practice

Threat model before you build

Input validation at the boundary

Output monitoring

Model validation in CI/CD

Audit logging for agentic systems

What's Actually Hard

Santiago Friquet

Read Next

Detection ≠ Alerts | Detection = Knowledge

Detection & Response in Cloud Environments: Zero to ETL

Get New Articles Delivered

Securing LLMs & AI: Lessons fromthe Frontlines of MLSecOps

What Makes AI/ML Systems Different to Secure

Non-determinism

Agentic behavior

Opaque internals

The New Attack Surface

Prompt Injection

Training Data Poisoning

Model Integrity

Context Window Exfiltration

MLSecOps in Practice

Threat model before you build

Input validation at the boundary

Output monitoring

Model validation in CI/CD

Audit logging for agentic systems

What's Actually Hard

Santiago Friquet

Read Next

Detection ≠ Alerts | Detection = Knowledge

Detection & Response in Cloud Environments: Zero to ETL

Get New Articles Delivered

Securing LLMs & AI: Lessons from
the Frontlines of MLSecOps