Agentic AI Reality Check: The Million-Dollar Mistake Hiding Inside ERP Stacks

Friday, June 19, 2026

Practical guidance for leaders deploying agents in ERP. In "Agentic AI Reality Check: The Million-Dollar Mistake Hiding Inside ERP Stacks," Avinash Tiwari explains how to deliver value with guardrails, evaluation, and measurable ROI.

There's a contradiction playing out in the world of enterprise software.

On one hand, Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. On the other, nearly 90% of business leaders say AI is essential to staying competitive. That disconnect shows up in day-to-day development work.

Teams are being asked to build systems that are both mission-critical and, increasingly, unreliable in production. It's not that AI doesn't work. It's that it's being applied in places it's not designed to operate.

General-purpose language models are being dropped into ERP environments spanning finance, supply chain, HR, procurement and more, where dependable accuracy is non-negotiable. When those models fail because they lack the right context, the symptoms are familiar:

Recommendations look right, but break downstream workflows
Integration steps quietly miss dependencies
Security concerns surface late
Endless prompt tuning never quite stabilizes
Demos impress, but the systems can't be trusted

For developers, it's not an "AI strategy" problem. It's a systems problem.

"Agentic AI" Isn't the Whole Story

The term "agentic AI" has become a bit of a catch-all to describe everything from simple automation scripts to complex multi-agent orchestration systems. That ambiguity is part of the problem.

Many production systems today combine useful components, such as retrieval, tool use or orchestration, and those approaches absolutely have value. But calling all of them "agentic" can hide what actually determines their success in enterprise environments. At a practical level, what matters is whether a system can operate reliably inside real workflows.

A useful way to think about it is this: Some systems generate answers, and some systems execute tasks. But enterprise systems must understand how tasks affect other tasks. That's the missing layer.

True agentic behavior (when it works in production) means a system can:

Observe the current state of an application
Understand how that state relates to business processes
Plan and execute multi-step actions
Adjust based on outcomes

But even that isn't enough. The real differentiator is whether the system understands the domain it operates in, including its terminology, dependencies, constraints and failure modes.
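
The observe, plan, execute and adjust steps listed above can be sketched in a few lines of Python. This is a minimal illustration over a toy in-memory state, not a real ERP integration: AppState, plan_steps, run_agent and the setting names are all hypothetical and do not correspond to any Oracle Fusion API.

```python
from dataclasses import dataclass, field


@dataclass
class AppState:
    """Toy stand-in for application state; not a real ERP object."""
    settings: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def observe(state: AppState) -> dict:
    # Observe: take a snapshot of the current settings.
    return dict(state.settings)


def plan_steps(snapshot: dict, goal: dict) -> list:
    # Plan: one step per setting that differs from the goal.
    return [(key, value) for key, value in goal.items() if snapshot.get(key) != value]


def execute(state: AppState, step: tuple) -> None:
    # Execute: apply one change and record it.
    key, value = step
    state.settings[key] = value
    state.log.append(f"set {key}={value}")


def run_agent(state: AppState, goal: dict, max_rounds: int = 3) -> bool:
    # Adjust: re-observe after each round and re-plan until the goal holds.
    for _ in range(max_rounds):
        steps = plan_steps(observe(state), goal)
        if not steps:
            return True
        for step in steps:
            execute(state, step)
    return not plan_steps(observe(state), goal)
```

Nothing in this loop encodes domain knowledge: plan_steps treats every setting as independent of every other, which is exactly the limitation described above.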

Where General-Purpose AI Breaks Down

Most large language models are trained to solve language problems, such as summarization, question answering or reasoning over general knowledge.

But enterprise systems are not language problems. They're steeped in interdependent workflows, configuration-driven systems and permission-controlled environments. This is most obvious in ERP systems.

Take three scenarios that play out regularly in Oracle Fusion environments:

Oracle Fusion Financials: A finance team adjusts a cost accounting rule to reallocate overhead across two business units, a change that looks contained in the UI. In practice, it cascades into subledger accounting entries that no longer reconcile with the general ledger, shifts fixed asset depreciation calculations in Oracle Assets, and breaks intercompany elimination logic during period-end consolidation. Worst of all, none of it surfaces until the close is already in progress.
Oracle Fusion Procurement: A buyer updates a purchase order approval rule to accommodate a new supplier tier. The change silently bypasses a three-way match validation that was embedded downstream in accounts payable, allowing invoices to post without corresponding goods receipts. The discrepancy only becomes visible during an AP aging review, which is typically weeks after the fact.
Oracle Fusion HCM: An HR administrator modifies a compensation business process to add a new approval stage. The change inadvertently skips a compliance checkpoint required for SOX-designated roles, a constraint invisible to any model without specific knowledge of how that tenant's approval hierarchy and security profiles are configured.

In each case, a general-purpose model might analyze the change and generate a plausible summary. It might even suggest test cases. But without access to the specific Oracle Fusion instance, including its configurations, customizations and integration patterns, there's no way it can reliably answer the most important question: What will actually break?

This is where things start to fail. Without ERP instance-level context, enterprise AI is essentially operating blind. And when a system is designed to always produce an answer, it fills those gaps with assumptions.

In consumer applications, that might be acceptable. But in enterprise systems, it's where cost overruns, rework and failed implementations begin.
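
To make the "what will actually break?" question concrete, here is a rough sketch of impact analysis as a graph traversal. It assumes the instance-level context has already been extracted into a dependency map; the DEPENDENTS dictionary and its object names are invented for illustration, loosely following the Financials scenario, and a real map would come from the specific tenant's configuration.

```python
from collections import deque

# Hypothetical dependency map for one tenant: each key is a configuration
# object, each value lists the objects that consume it directly.
DEPENDENTS = {
    "cost_accounting_rule": ["subledger_entries", "asset_depreciation"],
    "subledger_entries": ["general_ledger_reconciliation"],
    "general_ledger_reconciliation": ["intercompany_elimination"],
    "asset_depreciation": [],
    "intercompany_elimination": [],
}


def impacted_by(changed: str, dependents: dict) -> list:
    """Return everything downstream of `changed`, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(impacted_by("cost_accounting_rule", DEPENDENTS))
```

The traversal itself is trivial. The hard part is the map: without it, a model can only guess at the edges, which is the "operating blind" problem in miniature.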

Here's Why Inference Accuracy Matters

Developers often rely on general benchmarks to evaluate models, but those benchmarks often don't reflect how systems behave under real-world constraints.

What matters in enterprise environments is much narrower: Inference accuracy within a specific domain. And that means measuring how often a system produces the correct outcome for tasks like configuration impact analysis, integration validation or financial reconciliation logic.

But there's an important distinction to note here. A model can generate text that sounds correct. That's plausibility. But enterprise systems require outputs that are operationally correct. That's precision. And those two things diverge quickly in complex systems.

The takeaway? A model that performs well on public benchmarks can still fail repeatedly when applied to enterprise workflows. That's why many AI projects stall after initial pilots; they pass demos, but they don't survive production.
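
One way to measure the gap between plausibility and precision is to score a system against labeled tasks from your own workflows. The sketch below assumes you have a set of cases with verified outcomes; EvalCase, domain_accuracy, the task names and the stub predictor are all hypothetical, and exact-match scoring is only one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One labeled task; labels would come from verified real outcomes."""
    task: str
    expected: frozenset  # objects that actually break


def domain_accuracy(cases, predict) -> float:
    """Fraction of cases where the prediction matches the expected set exactly."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    correct = sum(1 for case in cases if frozenset(predict(case.task)) == case.expected)
    return correct / len(cases)


CASES = [
    EvalCase("change overhead allocation rule", frozenset({"gl_reconciliation", "depreciation"})),
    EvalCase("add PO approval tier", frozenset({"three_way_match"})),
    EvalCase("add compensation approval stage", frozenset({"sox_checkpoint"})),
    EvalCase("rename cost center label", frozenset()),
]


def plausible_but_shallow(task: str) -> set:
    # Stand-in for a model: every answer sounds reasonable, only some are right.
    canned = {"add PO approval tier": {"three_way_match"}, "rename cost center label": set()}
    return canned.get(task, {"no downstream impact expected"})


print(domain_accuracy(CASES, plausible_but_shallow))  # 2 of 4 cases correct -> 0.5
```

A score like this says nothing about how fluent the answers were, only whether they were operationally right on tasks that matter to you.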

What 'Domain-Specific AI' Means in Practice

The term "domain-specific model" gets used often, but it's rarely explained clearly because it doesn't refer to a single technique. Domain-specific AI is usually a combination of:

Models trained or fine-tuned on relevant data
Retrieval systems using structured enterprise knowledge
Workflow-aware orchestration layers that understand system behavior
The enforcement of business rules and compliance policies

The goal is to improve decisions and actions within the specific context of your business or organization. Going back to the Oracle Fusion example, that would mean understanding things like configuration schemas, dependencies across modules, business rules and semantics, and testing and release cycles.

Many implementations use a combination of fine-tuned models and structured retrieval. What ultimately matters most is whether the system consistently produces accurate outcomes on real-world tasks.
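
As an illustration of the business-rule enforcement layer, the sketch below checks a simplified three-way match before an invoice is allowed to post. It is an assumption-laden toy: the Line type, the function name and the violation messages are hypothetical, and real matching logic typically includes tolerances and line-level detail that are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Line:
    quantity: int
    unit_price_cents: int


def three_way_match(po: Line, receipt_qty: Optional[int], invoice: Line) -> list:
    """Return a list of violations; an empty list means the invoice may post."""
    violations = []
    if receipt_qty is None:
        violations.append("no goods receipt recorded")
    elif invoice.quantity > receipt_qty:
        violations.append("invoiced quantity exceeds received quantity")
    if invoice.quantity > po.quantity:
        violations.append("invoiced quantity exceeds ordered quantity")
    if invoice.unit_price_cents != po.unit_price_cents:
        violations.append("invoice price differs from purchase order price")
    return violations
```

The design point is that the rule lives outside the model. Whatever the model proposes, the orchestration layer can refuse to act when the returned list is not empty.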

Why Do More Specialized Models Perform Better?

It might seem logical to think that larger models are inherently more capable. But enterprise environments consistently prove that thinking wrong. Smaller models trained with specialized intent often perform better for a few reasons:

Reduced ambiguity: Since these models operate within a narrower scope, they limit the range of possible interpretations. A model trained on Oracle Fusion Financials, for example, won't conflate a journal entry adjustment with a subledger accounting rule change. General-purpose models routinely miss these types of distinctions, often with consequences that inconveniently appear during period-end close.
Higher reliability: More focused training narrows what the model can get wrong, which makes hallucinations less likely within its domain. In practice, a domain-specific model is far less likely to recommend a configuration change that violates an Oracle Fusion business rule, because it was trained to recognize rules such as ledger balancing segment requirements, approval hierarchy constraints, or three-way match validation logic in procurement.
Faster response times: Lower latency boosts real-time workflows. In Oracle Fusion environments where quarterly patch validation or release regression testing must be completed within a tight window (often measured in hours), response speed directly determines whether automation is even viable.
Easier deployment: Specialized models are more compatible with private or on-prem environments, which is where sensitive data must be contained. Oracle Fusion instances routinely hold compensation records, supplier pricing agreements, and financial forecasts that organizations are legally and contractually obligated to keep within defined boundaries.

In practice, these systems are often built as collections of specialized agents, each in charge of a specific function. That structure resembles how effective real-world teams work: multiple specialists coordinating across shared workflows toward a common goal.

The Risk of Black-Box Decisions in Enterprise Systems

As AI systems become more autonomous, the risks change. Model accuracy is always a concern, but the larger question is whether a system's behavior can be understood, controlled and audited.

Enterprise environments face constraints that most consumer applications don't, including financial compliance requirements, data governance policies, audit trails for system changes, and accountability for automated decisions.

When AI systems operate as black boxes, several risks emerge:

Over-reliance on automation: If a system makes an incorrect decision, that error can propagate across multiple dependent processes.
Lack of explainability: Teams need to understand why a system took a specific action, especially in regulated environments.
Governance gaps: Sensitive data must be handled carefully, including how it's used in training and inference.
Adoption hesitancy: Developers and operators are less likely to trust systems they can't inspect or control.

Against this backdrop, responsible implementations tend to include detailed logging of decisions and actions, explainability layers that surface reasoning paths, role-based approval mechanisms, and human review thresholds based on risk.

According to IDC, most critical AI-driven decisions require some form of human oversight. That may slow things down, but it also significantly improves accountability.
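
The controls described above (decision logging, role-based approval and risk-based review thresholds) can be combined in a single gate. The following is a minimal sketch under stated assumptions: the threshold value, the role map and the action names are made up for illustration, and a production system would write to a durable audit store rather than a Python list.

```python
import json
from datetime import datetime, timezone

# Hypothetical policy: risk scores at or above this value need a human approver.
REVIEW_THRESHOLD = 0.5
# Hypothetical role map: which roles may approve which action types.
APPROVER_ROLES = {
    "update_approval_rule": {"procurement_admin"},
    "post_journal": {"controller"},
}


def decide(action: str, risk: float, approver_role=None, audit_log=None) -> str:
    """Return 'executed', 'pending_review' or 'rejected', and log the decision."""
    if risk < REVIEW_THRESHOLD:
        outcome = "executed"  # low risk: no human needed
    elif approver_role is None:
        outcome = "pending_review"  # high risk, nobody has reviewed yet
    elif approver_role in APPROVER_ROLES.get(action, set()):
        outcome = "executed"  # high risk, approved by an allowed role
    else:
        outcome = "rejected"  # high risk, approver lacks the required role
    if audit_log is not None:
        audit_log.append(json.dumps({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "risk": risk,
            "approver_role": approver_role,
            "outcome": outcome,
        }))
    return outcome
```

Every call leaves a record, including the ones that were blocked, so the audit trail shows what the system tried to do as well as what it did.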

What Does Responsible Adoption Look Like in Practice?

A common mistake that many teams make is trying to broadly deploy AI from day one. That approach rarely works.

Organizations tend to have more success when they take an incremental strategy where they can clearly measure outcomes. Every organization varies, of course, but the first year of adoption often follows a pattern like this:

Quarter 1: Start with pilot use cases. Create key governance policies, identify the metrics that determine success, and deploy across smaller low-risk scenarios, such as integration validation or configuration checks.

Quarter 2: Expand controlled automation. Introduce capabilities like automated regression testing, patch validation and workflow guidance.

Quarter 3: Add system awareness. Enable agents to analyze dependencies, detect process inefficiencies, and assess the impact of changes across modules.

Quarter 4: Scale with control. Start to extend automation more broadly, focusing on measurable improvements in efficiency and reliability.

The main idea: Start narrow, measure your outcomes, and expand based on your evidence.
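
"Expand based on your evidence" can be expressed as an explicit gate between phases. The sketch below is only one possible shape: the phase names paraphrase the quarters above, and the accuracy and incident thresholds are placeholder assumptions that each organization would set for itself.

```python
PHASES = ["pilot", "controlled_automation", "system_awareness", "scale"]


def next_phase(current: str, accuracy: float, incidents: int,
               min_accuracy: float = 0.95, max_incidents: int = 0) -> str:
    """Advance one phase only when measured results meet the bar; otherwise stay put."""
    if current not in PHASES:
        raise ValueError(f"unknown phase: {current}")
    meets_bar = accuracy >= min_accuracy and incidents <= max_incidents
    index = PHASES.index(current)
    if meets_bar and index < len(PHASES) - 1:
        return PHASES[index + 1]
    return current
```

For example, next_phase("pilot", accuracy=0.97, incidents=0) moves to "controlled_automation", while the same call with accuracy=0.90 stays at "pilot".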

The Five Non-Negotiables for Enterprise-Ready AI Systems

Based on experience, we've uncovered a consistent set of requirements for systems that operate successfully in production. They are:

Domain alignment. The system must understand the business context it operates in.
Proven accuracy on real tasks. Performance must be measured against actual workflows, not generic benchmarks.
Interoperability across systems. Enterprise environments often span platforms like Oracle, SAP, Workday and others, and AI systems must operate across them.
Human oversight by design. Higher-risk actions should require review and approval.
Built-in governance and transparency. Logging, explainability and compliance controls should be part of the architecture.

As agentic AI continues to evolve, its role in enterprise systems will grow as well. But success won't come from adopting the latest model or framework. It will come from treating AI as part of the system architecture, meaning it's subject to the same expectations for reliability, observability and control.

For developers, that means building AI that can operate safely inside complex, interdependent systems where every action has consequences. Those who succeed will be the ones who understand where models break and design around those failure points.

This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.