The Hidden Costs Of AI: Securing Inference In An Age Of Attacks

This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

AI’s promise is undeniable, but so are its blindsiding security costs at the inference layer. New attacks targeting AI’s operational side are quietly inflating budgets, jeopardizing regulatory compliance and eroding customer trust, all of which threaten the return on investment (ROI) and total cost of ownership of enterprise AI deployments.

AI has captivated the enterprise with its potential for game-changing insights and efficiency gains. Yet, as organizations rush to operationalize their models, a sobering reality is emerging: The inference stage, where AI translates investment into real-time business value, is under siege. This critical juncture is driving up the total cost of ownership (TCO) in ways that initial business cases failed to predict.

Security executives and CFOs who greenlit AI projects for their transformative upside are now grappling with the hidden expenses of defending these systems. Adversaries have discovered that inference is where AI “comes alive” for a business, and it’s precisely where they can inflict the most damage. The result is a cascade of cost inflation: Breach containment can exceed $5 million per incident in regulated sectors, compliance retrofits run into the hundreds of thousands and trust failures can trigger stock hits or contract cancellations that decimate projected AI ROI. Without cost containment at inference, AI becomes an ungovernable budget wildcard.

The unseen battlefield: AI inference and exploding TCO

AI inference is rapidly becoming the “next insider risk,” Cristian Rodriguez, field CTO for the Americas at CrowdStrike, told the audience at RSAC 2025.

Other technology leaders echo this perspective and see a common blind spot in enterprise strategy. Vineet Arora, CTO at WinWire, notes that many organizations “focus intensely on securing the infrastructure around AI while inadvertently sidelining inference.” This oversight, he explains, “leads to underestimated costs for continuous monitoring systems, real-time threat analysis and rapid patching mechanisms.”

Another critical blind spot, according to Steffen Schreier, SVP of product and portfolio at Telesign, is “the assumption that third-party models are thoroughly vetted and inherently safe to deploy.”

He warned that in reality, “these models often haven’t been evaluated against an organization’s specific threat landscape or compliance needs,” which can lead to harmful or non-compliant outputs that erode brand trust. Schreier told VentureBeat that “inference-time vulnerabilities — like prompt injection, output manipulation or context leakage — can be exploited by attackers to produce harmful, biased or non-compliant outputs. This poses serious risks, especially in regulated industries, and can quickly erode brand trust.”

When inference is compromised, the fallout hits multiple fronts of TCO. Cybersecurity budgets spiral, regulatory compliance is jeopardized and customer trust erodes. Executive sentiment reflects this growing concern. In CrowdStrike’s State of AI in Cybersecurity survey, only 39% of respondents felt generative AI’s rewards clearly outweigh the risks, while 40% judged them comparable. This ambivalence underscores a critical finding: Safety and privacy controls have become top requirements for new gen AI initiatives, with a striking 90% of organizations now implementing or developing policies to govern AI adoption. The top concerns are no longer abstract; 26% cite sensitive data exposure and 25% fear adversarial attacks as key risks.

Security leaders exhibit mixed sentiments regarding the overall safety of gen AI, with top concerns centered on the exposure of sensitive data to LLMs (26%) and adversarial attacks on AI tools (25%).

Anatomy of an inference attack

The unique attack surface exposed by running AI models is being aggressively probed by adversaries. To defend against this, Schreier advises, “it is critical to treat every input as a potential hostile attack.” Frameworks like the OWASP Top 10 for Large Language Model (LLM) Applications catalogue these threats, which are no longer theoretical but active attack vectors impacting the enterprise:

Prompt injection (LLM01) and insecure output handling (LLM02): Attackers manipulate models via inputs or outputs. Malicious inputs can cause the model to ignore instructions or divulge proprietary code. Insecure output handling occurs when an application blindly trusts AI responses, allowing attackers to inject malicious scripts into downstream systems.

Training data poisoning (LLM03) and model poisoning: Attackers corrupt training data by sneaking in tainted samples, planting hidden triggers. Later, an innocuous input can unleash malicious outputs.

Model denial of service (LLM04): Adversaries can overwhelm AI models with complex inputs, consuming excessive resources to slow or crash them, resulting in direct revenue loss.

Supply chain and plugin vulnerabilities (LLM05 and LLM07): The AI ecosystem is built on shared components. For instance, a vulnerability in the Flowise LLM tool exposed private AI dashboards and sensitive data, including GitHub tokens and OpenAI API keys, on 438 servers.

Sensitive information disclosure (LLM06): Clever querying can extract confidential information from an AI model if it was part of its training data or is present in the current context.

Excessive agency (LLM08) and Overreliance (LLM09): Granting an AI agent unchecked permissions to execute trades or modify databases is a recipe for disaster if manipulated.

Model theft (LLM10): An organization’s proprietary models can be stolen through sophisticated extraction techniques — a direct assault on its competitive advantage.

Underpinning these threats are foundational security failures. Adversaries often log in with leaked credentials. In early 2024, 35% of cloud intrusions involved valid user credentials, and new, unattributed cloud attack attempts spiked 26%, according to the CrowdStrike 2025 Global Threat Report. A deepfake campaign resulted in a fraudulent $25.6 million transfer, while AI-generated phishing emails have demonstrated a 54% click-through rate, more than four times higher than those written by humans.

The OWASP framework illustrates how various LLM attack vectors target different components of an AI application, from prompt injection at the user interface to data poisoning in the training models and sensitive information disclosure from the datastore.

Back to basics: Foundational security for a new era

Securing AI requires a disciplined return to security fundamentals — but applied through a modern lens. “I think that we need to take a step back and ensure that the foundation and the fundamentals of security are still applicable,” Rodriguez argued. “The same approach you would have to securing an OS is the same approach you would have to securing that AI model.”

This means enforcing unified protection across every attack path, with rigorous data governance, robust cloud security posture management (CSPM), and identity-first security through cloud infrastructure entitlement management (CIEM) to lock down the cloud environments where most AI workloads reside. As identity becomes the new perimeter, AI systems must be governed with the same strict access controls and runtime protections as any other business-critical cloud asset.

The specter of “shadow AI”: Unmasking hidden risks

Shadow AI, or the unsanctioned use of AI tools by employees, creates a massive, unknown attack surface. A financial analyst using a free online LLM for confidential documents can inadvertently leak proprietary data. As Rodriguez warned, queries to public models can “become another’s answers.” Addressing this requires a combination of clear policy, employee education, and technical controls like AI security posture management (AI-SPM) to discover and assess all AI assets, sanctioned or not.

Fortifying the future: Actionable defense strategies

While adversaries have weaponized AI, the tide is beginning to turn. As Mike Riemer, Field CISO at Ivanti, observes, defenders are beginning to “harness the full potential of AI for cybersecurity purposes to analyze vast amounts of data collected from diverse systems.” This proactive stance is essential for building a robust defense, which requires several key strategies:

Budget for inference security from day zero: The first step, according to Arora, is to begin with “a comprehensive risk-based assessment.” He advises mapping the entire inference pipeline to identify every data flow and vulnerability. “By linking these risks to possible financial impacts,” he explains, “we can better quantify the cost of a security breach” and build a realistic budget.

To structure this more systematically, CISOs and CFOs should start with a risk-adjusted ROI model. One approach:

Security ROI = (estimated breach cost × annual risk probability) – total security investment

For example, if an LLM inference attack could result in a $5 million loss and the likelihood is 10%, the expected loss is $500,000. A $350,000 investment in inference-stage defenses would yield a net gain of $150,000 in avoided risk. This model enables scenario-based budgeting tied directly to financial outcomes.

Enterprises allocating less than 8 to 12% of their AI project budgets to inference-stage security are often blindsided later by breach recovery and compliance costs. A Fortune 500 healthcare provider CIO, interviewed by VentureBeat and requesting anonymity, now allocates 15% of their total gen AI budget to post-training risk management, including runtime monitoring, AI-SPM platforms and compliance audits. A practical budgeting model should allocate across four cost centers: runtime monitoring (35%), adversarial simulation (25%), compliance tooling (20%) and user behavior analytics (20%).

Here’s a sample allocation snapshot for a $2 million enterprise AI deployment based on VentureBeat’s ongoing interviews with CFOs, CIOs and CISOs actively budgeting to support AI projects:

Budget categoryAllocationUse case exampleRuntime monitoring$300,000Behavioral anomaly detection (API spikes)Adversarial simulation$200,000Red team exercises to probe prompt injectionCompliance tooling$150,000EU AI Act alignment, SOC 2 inference validationsUser behavior analytics$150,000Detect misuse patterns in internal AI use

These investments reduce downstream breach remediation costs, regulatory penalties and SLA violations, all helping to stabilize AI TCO.

Implement runtime monitoring and validation: Begin by tuning anomaly detection to detect behaviors at the inference layer, such as abnormal API call patterns, output entropy shifts or query frequency spikes. Vendors like DataDome and Telesign now offer real-time behavioral analytics tailored to gen AI misuse signatures.

Teams should monitor entropy shifts in outputs, track token irregularities in model responses and watch for atypical frequency in queries from privileged accounts. Effective setups include streaming logs into SIEM tools (such as Splunk or Datadog) with tailored gen AI parsers and establishing real-time alert thresholds for deviations from model baselines.

Adopt a zero-trust framework for AI: Zero-trust is non-negotiable for AI environments. It operates on the principle of “never trust, always verify.” By adopting this architecture, Riemer notes, organizations can ensure that “only authenticated users and devices gain access to sensitive data and applications, regardless of their physical location.”

Inference-time zero-trust should be enforced at multiple layers:

Identity: Authenticate both human and service actors accessing inference endpoints.

Permissions: Scope LLM access using role-based access control (RBAC) with time-boxed privileges.

Segmentation: Isolate inference microservices with service mesh policies and enforce least-privilege defaults through cloud workload protection platforms (CWPPs).

A proactive AI security strategy requires a holistic approach, encompassing visibility and supply chain security during development, securing infrastructure and data and implementing robust safeguards to protect AI systems in runtime during production.

Protecting AI ROI: A CISO/CFO collaboration model

Protecting the ROI of enterprise AI requires actively modeling the financial upside of security. Start with a baseline ROI projection, then layer in cost-avoidance scenarios for each security control. Mapping cybersecurity investments to avoided costs including incident remediation, SLA violations and customer churn, turns risk reduction into a measurable ROI gain.

Enterprises should model three ROI scenarios that include baseline, with security investment and post-breach recovery to show cost avoidance clearly. For example, a telecom deploying output validation prevented 12,000-plus misrouted queries per month, saving $6.3 million annually in SLA penalties and call center volume. Tie investments to avoided costs across breach remediation, SLA non-compliance, brand impact and customer churn to build a defensible ROI argument to CFOs.

Checklist: CFO-Grade ROI protection model

CFOs need to communicate with clarity on how security spending protects the bottom line. To safeguard AI ROI at the inference layer, security investments must be modeled like any other strategic capital allocation: With direct links to TCO, risk mitigation and revenue preservation.

Use this checklist to make AI security investments defensible in the boardroom — and actionable in the budget cycle.

Link every AI security spend to a projected TCO reduction category (compliance, breach remediation, SLA stability).

Run cost-avoidance simulations with 3-year horizon scenarios: baseline, protected and breach-reactive.

Quantify financial risk from SLA violations, regulatory fines, brand trust erosion and customer churn.

Co-model inference-layer security budgets with both CISOs and CFOs to break organizational silos.

Present security investments as growth enablers, not overhead, showing how they stabilize AI infrastructure for sustained value capture.

This model doesn’t just defend AI investments; it defends budgets and brands and can protect and grow boardroom credibility.

Concluding analysis: A strategic imperative

CISOs must present AI risk management as a business enabler, quantified in terms of ROI protection, brand trust preservation and regulatory stability. As AI inference moves deeper into revenue workflows, protecting it isn’t a cost center; it’s the control plane for AI’s financial sustainability. Strategic security investments at the infrastructure layer must be justified with financial metrics that CFOs can act on.

The path forward requires organizations to balance investment in AI innovation with an equal investment in its protection. This necessitates a new level of strategic alignment. As Ivanti CIO Robert Grazioli told VentureBeat: “CISO and CIO alignment will be critical to effectively safeguard modern businesses.” This collaboration is essential to break down the data and budget silos that undermine security, allowing organizations to manage the true cost of AI and turn a high-risk gamble into a sustainable, high-ROI engine of growth.

Telesign’s Schreier added: “We view AI inference risks through the lens of digital identity and trust. We embed security across the full lifecycle of our AI tools — using access controls, usage monitoring, rate limiting and behavioral analytics to detect misuse and protect both our customers and their end users from emerging threats.”

He continued: “We approach output validation as a critical layer of our AI security architecture, particularly because many inference-time risks don’t stem from how a model is trained, but how it behaves in the wild.”

Source link

What's Hot

MIT’s “stealth” immune cells could change cancer treatment forever

Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency – Takara TLDR

The fixer’s dilemma: Chris Lehane and OpenAI’s impossible mission

The Hidden Costs of AI: Securing Inference in an Age of Attacks

Nvidia researchers boost LLMs reasoning skills by getting them to 'think' during pre-training

Together AI's ATLAS adaptive speculator delivers 400% inference speedup by learning from workloads in real-time

What MIT got wrong about AI agents: New G2 data shows they’re already driving enterprise ROI

The Rubin Names 2025 Art Prize, Research and Art Projects Grants

Frieze to Launch Abu Dhabi Fair in November 2026

Jeff Koons Returns to Gagosian with First New York Show in Seven Years

Ancient Egyptian Iconography Found in Roman-Era Bathhouse in Turkey