Agentic AI · Regulated Industries

Agentic AI for healthcare, financial services, and government — compliance-native from the first commit.

Most "agentic AI" sold to regulated buyers in 2026 is a RAG pipeline in costume. We deliver multi-agent systems with deterministic guardrails, regulator-grade audit trails, and explicit mappings to the EU AI Act, NIST AI RMF, FDA SaMD, PRA SS1/23 and SR 11-7 model-risk guidance, and HIPAA. Fixed-price. Source code and runbooks transferred on delivery day.

I. Definition

What agentic AI actually means in 2026

An agentic AI system is one that pursues a goal autonomously by planning a sequence of actions, executing those actions through tools and APIs, maintaining state across steps, and adapting its plan based on intermediate outcomes. The architectural distinction is that the agent decides what to do next — not the user, not the prompt template, not a static workflow definition.

The category is converging on five elements: autonomy within explicit constraints, planning that decomposes a goal into ordered subtasks, tool use via callable APIs and integrations, persistent state across the workflow, and governed execution that enforces policy, produces audit evidence, and escalates to humans at defined thresholds. A system missing any one of these is not agentic — it is something else marketed as agentic.

For a buyer in healthcare, financial services, government, or energy, the practical question is not whether a system is "agentic enough" — it is whether the system produces outcomes the organization can defend under audit, regulator inspection, and incident review. The architectural choices that determine this defensibility — supervisory layer design, tool-call validation, audit-trace completeness, human-in-the-loop placement — are precisely the choices most vendors gloss over with the word "guardrails."

II. The Regulated Gap

Why agentic AI in regulated industries is a different problem

In a startup deploying a customer-support agent, a hallucinated response is an embarrassing screenshot. In a hospital deploying a clinical documentation agent, a hallucinated medication dose is a sentinel event. In a bank deploying a KYC agent, a hallucinated identity verification is a Bank Secrecy Act violation. The asymmetry between consumer agentic AI and regulated agentic AI is not only the severity of failure; it is the standard of evidence the organization must produce when a failure occurs.

Regulated industries operate within a documentation regime that pre-existed AI. HIPAA's Security Rule expects technical safeguards documented at the system level. The OCC and the PRA expect model risk management with versioned model inventories. The FDA expects design history files. The EU AI Act, as it phases in through 2026, expects technical documentation under Article 11 covering data, training, performance, and human oversight. None of these regimes was written with agentic systems in mind, but each regulator has been clear that AI-driven decisions are subject to the same documentation discipline as the human decisions they replace.

The buyer-side consequence is concrete. A Chief AI Officer or Chief Risk Officer evaluating an agentic AI partner is not asking "is your demo impressive?" They are asking: "When an auditor opens a notebook two years from now and reads the trace of a specific agent decision that affected a specific patient or customer, will the record hold up?" Most vendors cannot answer this question because the question was never part of their architecture.

This gap is the basis of our positioning. We build agentic systems that are designed first as auditable evidence-producing systems, and second as autonomous workflow systems. The order matters.

III. Reference Architecture

The five layers of a regulator-grade agentic system

Our reference architecture for agentic AI in regulated industries is a five-layer stack. Each layer has a defined compliance responsibility, an accountable owner inside the engineering team, and a specific evidence artifact it produces for audit.

1 · Intent

Goal and Policy Layer

The entry point. Every goal arriving from a user, scheduled job, or upstream system is matched against a policy contract that defines what the agent is allowed to attempt, against what data, with what spending and time limits. Goals that fall outside the contract are rejected before any model call. The artifact: the policy contract version applied to every workflow run.
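The intent-layer check described above can be sketched as follows, assuming a policy contract reduces to an allow-list of goal types, permitted data classes, and spend and runtime ceilings. `PolicyContract`, `Goal`, and `admit_goal` are hypothetical names for illustration, not a product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyContract:
    # Hypothetical contract shape: what the agent may attempt, on what data, within what limits.
    version: str
    allowed_goal_types: frozenset[str]
    allowed_data_classes: frozenset[str]
    max_spend_usd: float
    max_runtime_s: int


@dataclass(frozen=True)
class Goal:
    goal_type: str
    data_classes: frozenset[str]
    requested_spend_usd: float
    requested_runtime_s: int


def admit_goal(goal: Goal, contract: PolicyContract) -> tuple[bool, str]:
    """Return (admitted, reason). Runs before any model call is made."""
    if goal.goal_type not in contract.allowed_goal_types:
        return False, f"goal type {goal.goal_type!r} not in contract {contract.version}"
    extra = goal.data_classes - contract.allowed_data_classes
    if extra:
        return False, f"data classes {sorted(extra)} not permitted"
    if goal.requested_spend_usd > contract.max_spend_usd:
        return False, "spend ceiling exceeded"
    if goal.requested_runtime_s > contract.max_runtime_s:
        return False, "runtime ceiling exceeded"
    # The contract version is returned so it can be recorded against the workflow run.
    return True, f"admitted under contract {contract.version}"
```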

2 · Planning

Plan and Decomposition Layer

The planner reads the goal and the policy contract, decomposes the goal into an ordered plan of tool calls, and emits the plan as a structured artifact before execution begins. Plans that propose disallowed tool sequences, exceed risk thresholds, or violate data residency are blocked here. The artifact: a versioned plan log, including the model and prompt template that produced it.
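A minimal sketch of the planning-layer gate, assuming a plan is an ordered list of tool steps, each carrying a risk score and the region where it would process data. The forbidden-pair rule, the risk scale, and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    tool: str
    risk: float      # hypothetical 0..1 risk score attached to the step
    region: str      # region where the tool would process data


def validate_plan(
    steps: list[PlanStep],
    allowed_tools: set[str],
    forbidden_pairs: set[tuple[str, str]],
    max_risk: float,
    allowed_regions: set[str],
) -> list[str]:
    """Return all violations; an empty list means the plan may execute."""
    violations = []
    for i, step in enumerate(steps):
        if step.tool not in allowed_tools:
            violations.append(f"step {i}: tool {step.tool!r} not allowed")
        if step.risk > max_risk:
            violations.append(f"step {i}: risk {step.risk} exceeds {max_risk}")
        if step.region not in allowed_regions:
            violations.append(f"step {i}: region {step.region!r} violates residency")
    # Disallowed sequences, e.g. reading a chart immediately followed by sending email.
    for i, (a, b) in enumerate(zip(steps, steps[1:])):
        if (a.tool, b.tool) in forbidden_pairs:
            violations.append(f"steps {i}-{i + 1}: {a.tool} -> {b.tool} forbidden")
    return violations
```

Collecting every violation rather than stopping at the first gives the plan log a complete reason for the block.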

3 · Action

Tool Execution Layer

Every tool call is sandboxed and audited. Tool calls touching PHI, PCI, CUI, or other regulated data classes are routed through purpose-built adapters that enforce data minimization, redaction, and jurisdiction-aware processing. Failed tool calls, hallucinated tool names, and malformed structured outputs are caught at this layer rather than allowed to corrupt state. The artifact: the full tool invocation trace with inputs, outputs, latencies, and provenance.

4 · State

Memory and Context Layer

Agent state is segregated by sensitivity classification. PHI-bearing memory uses TTL-bounded, key-rotated stores. Cross-session memory enforces re-authentication for high-risk operations. Long-running workflows use durable state with explicit checkpoint boundaries so a failed step can be resumed without rerunning regulated-data operations. The artifact: state checkpoints with classification tags and access logs.
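The checkpoint-and-resume behavior can be sketched as below, assuming each step has a stable step ID and that its output is persisted with a classification tag once it succeeds. An in-memory dict stands in for the durable store; `DurableWorkflow` and `run_step` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Checkpoint:
    step_id: str
    output: str
    classification: str   # e.g. "PHI" or "PUBLIC"


@dataclass
class DurableWorkflow:
    # Stand-in for a durable store; a real deployment persists this outside the process.
    checkpoints: dict[str, Checkpoint] = field(default_factory=dict)
    executions: list[str] = field(default_factory=list)

    def run_step(self, step_id: str, classification: str, fn: Callable[[], str]) -> str:
        """Run fn at most once; on resume, return the checkpointed output instead."""
        if step_id in self.checkpoints:
            return self.checkpoints[step_id].output
        output = fn()                      # if fn raises, no checkpoint is written
        self.executions.append(step_id)
        self.checkpoints[step_id] = Checkpoint(step_id, output, classification)
        return output
```

On resume after a failure, completed regulated-data steps return their stored output rather than touching the source system again.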

5 · Supervision

ALICE — Supervisory Layer

ALICE is our compliance enforcement agent, deployed as a supervisor above the application agents. ALICE evaluates every plan, every tool call, and every state transition against policy-as-code; fires on threshold breaches (cost, error rate, drift, jurisdictional violations); routes high-risk actions to a documented human-in-the-loop checkpoint; and produces the cryptographically timestamped audit packet a regulator or internal audit team can consume directly. ALICE is how an agentic system satisfies the requirement that every agent maps to an accountable human owner and that owner's policy. The artifact: ALICE's audit packet, generated continuously and exportable on demand.

The architecture is platform-neutral. We have deployed it on Claude, GPT-4-class models, Gemini, and on-premises open-weights models. We have run it with LangGraph, with custom state machines, and with the Vercel AI SDK. The architectural commitments are model-agnostic and orchestration-agnostic; the compliance commitments are not negotiable.

IV. Regulatory Mapping

Mapping agentic systems to the regulatory stack

EU AI Act

Agentic systems are frequently classified as High-Risk when they autonomously affect credit scoring, employment decisions, critical infrastructure operation, or law enforcement (Annex III), or when they are, or are safety components of, medical devices covered by the Annex I harmonisation legislation (Article 6(1)). High-Risk classification triggers the full obligation set under Articles 9 through 15. The three articles that have the deepest architectural consequences for agentic systems are Article 11 (technical documentation), Article 13 (transparency), and Article 14 (human oversight).

Article 11 requires technical documentation describing the system's intended purpose, the data used, the development process, the performance characteristics, and the post-market monitoring. For agentic systems, this documentation must extend to the planning logic (the prompts, models, and decomposition strategies the agent uses), because these are not implementation details but determinants of system behavior. We treat agent prompts and planning logic as versioned, documented artifacts on the same footing as model weights.

Article 14 requires effective human oversight by natural persons. For agentic systems, this means designed-in human-in-the-loop checkpoints at decision boundaries — not "the user can stop the agent" but specific actions (treatment changes, fund transfers, automated denials) that the agent cannot complete without an accountable human approval. Our supervisory layer enforces these checkpoints as policy-as-code; the policy cannot be bypassed by changing a prompt.
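A minimal sketch of a checkpoint that a prompt cannot bypass, assuming the gated action types live in policy code outside the model's context and that each approval records the approving person. `GATED_ACTIONS`, `Approval`, and `execute_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Policy-as-code: defined outside any prompt, so model output cannot change it.
GATED_ACTIONS = frozenset({"treatment_change", "fund_transfer", "automated_denial"})


@dataclass(frozen=True)
class Approval:
    action_id: str
    approver: str        # accountable natural person
    approved: bool


class ApprovalRequired(Exception):
    pass


def execute_action(action_id: str, action_type: str, approval: Optional[Approval]) -> str:
    """Execute only if the action is ungated or carries a matching human approval."""
    if action_type in GATED_ACTIONS:
        if approval is None or approval.action_id != action_id or not approval.approved:
            raise ApprovalRequired(f"{action_type} {action_id} needs human approval")
        return f"executed {action_id} (approved by {approval.approver})"
    return f"executed {action_id}"
```

The sketch assumes `action_type` is derived by the supervisor from the tool being invoked, not self-reported by the agent; otherwise a model could mislabel a gated action.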

NIST AI Risk Management Framework

The NIST AI RMF organizes around four functions: Govern, Map, Measure, Manage. For agentic systems, each function requires extensions beyond what suffices for single-shot ML models. Map must include the full agent inventory — every distinct agent class operating in the system, with its policy contract, tool surface, and accountable owner. Measure must include continuous evaluation under realistic operating conditions, not point-in-time validation; agentic systems exhibit emergent behavior under interaction loads that benchmark suites do not capture.

Manage must include incident response procedures specifically tuned to agentic failure modes: hallucinated tool calls, multi-agent feedback loops, runaway planning depth, cross-agent objective conflict. Govern must include explicit accountability assignment — every agent class has a named human owner, the same way every production service has a named on-call. The NIST AI RMF Generative AI Profile (NIST AI 600-1) provides additional specific guidance for systems that produce outputs based on user prompts, much of which extends naturally to agentic systems.

FDA Software as a Medical Device (SaMD)

The FDA's SaMD guidance adopts the IMDRF risk framework, which places software meeting the medical device definition in one of four categories (I to IV) based on the significance of the information provided and the state of the healthcare situation or condition. Agentic systems that participate in diagnosis, treatment recommendation, or care delivery are most likely to land in Category III or IV. Compliance typically involves alignment with 21 CFR Part 11 for electronic records and electronic signatures, IEC 62304 for the medical device software lifecycle, and ISO 14971 for risk management.

The FDA's Predetermined Change Control Plan framework, finalized in 2024, opens a path for AI/ML-enabled medical devices to update under a specified Modification Protocol without filing a new submission for every change. For agentic systems to fit this framework, the Modification Protocol must explicitly bound the change surface: which prompts, models, tools, and planning logic can change, under what acceptance criteria, with what monitoring. Agentic systems with unbounded planning autonomy are unlikely to fit a Predetermined Change Control Plan without architectural redesign, which is why our reference architecture treats agent policy contracts as device-level configuration controlled through Quality System processes.
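Bounding the change surface can be sketched as a gate over proposed changes, assuming the protocol reduces to a set of changeable component types plus minimum acceptance metrics. The component names and the metric floor are illustrative assumptions, not FDA-defined fields.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    component: str                 # e.g. "prompt", "model", "tool", "planning_logic"
    metrics: dict[str, float]      # post-change evaluation results


@dataclass
class ModificationProtocol:
    changeable_components: frozenset[str]
    min_metrics: dict[str, float]  # acceptance criteria: metric -> minimum value


def within_pccp(change: ProposedChange, protocol: ModificationProtocol) -> tuple[bool, list[str]]:
    """Return whether the change fits the protocol, with reasons when it does not."""
    reasons = []
    if change.component not in protocol.changeable_components:
        reasons.append(f"{change.component} is outside the bounded change surface")
    for metric, floor in protocol.min_metrics.items():
        value = change.metrics.get(metric)
        if value is None or value < floor:
            reasons.append(f"{metric}={value} fails acceptance floor {floor}")
    return (not reasons, reasons)
```

A change that fails this gate would fall outside the plan and, under the assumption above, need a new submission.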

PRA and SR 11-7 Model Risk Management

The UK PRA's SS1/23 and the US Federal Reserve's SR 11-7 establish model risk management expectations that pre-date agentic systems but apply to them directly. Both regimes require a model inventory, documented validation, ongoing monitoring, and clear governance, and both define "model" broadly enough that systems used in regulated financial decisions should be expected to fall in scope even when they contain no traditional statistical model.

Agentic systems used in credit decisioning, fraud detection, KYC, AML, or trading must therefore be treated as models for governance purposes. The complication is that an agentic system is not a single model — it is a composition of models, prompts, tools, and planning logic, any of which can change. We address this by maintaining an agent registry on the same footing as a model inventory, with version-controlled policy contracts, performance baselines, and challenger evaluation. Every agent change goes through the same model risk management workflow as a model change.
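An agent registry treated like a model-inventory record might look like the sketch below, assuming every change is a new numbered version that needs independent validation and must match or beat the active version on each baseline metric (higher assumed better). All field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentVersion:
    version: int
    model: str
    prompt_hash: str                 # hash of the versioned prompt template
    policy_contract: str             # policy contract version in force
    baseline_metrics: dict[str, float]
    validated_by: Optional[str] = None


@dataclass
class AgentRecord:
    agent_id: str
    owner: str                       # named accountable human
    versions: list[AgentVersion] = field(default_factory=list)
    active: Optional[int] = None

    def propose(self, v: AgentVersion) -> None:
        if v.version != len(self.versions) + 1:
            raise ValueError("versions must be proposed in sequence")
        self.versions.append(v)

    def validate(self, version: int, validator: str) -> None:
        # Model risk practice expects validation independent of the owner.
        if validator == self.owner:
            raise PermissionError("validator must be independent of the owner")
        self.versions[version - 1].validated_by = validator

    def activate(self, version: int) -> None:
        candidate = self.versions[version - 1]
        if candidate.validated_by is None:
            raise PermissionError("unvalidated version cannot be activated")
        if self.active is not None:
            champion = self.versions[self.active - 1]
            # Challenger evaluation: must match or beat the champion on every baseline metric.
            for metric, value in champion.baseline_metrics.items():
                if candidate.baseline_metrics.get(metric, float("-inf")) < value:
                    raise PermissionError(f"challenger underperforms champion on {metric}")
        self.active = version
```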

HIPAA

HIPAA's Privacy Rule, Security Rule, and Breach Notification Rule apply to any agentic system that handles Protected Health Information. The architectural consequences are minimum-necessary access enforced at the tool call layer, audit-log completeness at the Security Rule § 164.312 standard, encryption in transit and at rest including agent memory stores, and business associate agreements with every model provider whose API receives PHI. Tool calls that route PHI to a third-party API without a current BAA are a per-call HIPAA violation; our supervisory layer blocks them by design.
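The BAA blocking rule can be sketched as a pre-dispatch check, assuming the supervisor keeps a table of model providers with BAA expiry dates. Provider names, dates, and function names are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProviderAgreement:
    provider: str
    baa_expires: Optional[date]   # None means no BAA on file


class BlockedToolCall(Exception):
    pass


def check_phi_egress(
    provider: str,
    payload_classes: set[str],
    agreements: dict[str, ProviderAgreement],
    today: date,
) -> None:
    """Raise before dispatch if PHI would reach a provider without a current BAA."""
    if "PHI" not in payload_classes:
        return
    agreement = agreements.get(provider)
    if agreement is None or agreement.baa_expires is None or agreement.baa_expires < today:
        raise BlockedToolCall(f"PHI to {provider!r} blocked: no current BAA")
```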

V. Multi-Agent Safety

Patterns for safety in multi-agent systems

Multi-agent systems exhibit failure modes that single-agent systems do not. When agents pass intermediate results between themselves, errors compound. When agents optimize different objectives, conflicts emerge. When agents share state, race conditions become consistency violations. The discourse on "guardrails" addresses single-agent failures; it does not address system-level multi-agent failures. The following patterns do.

Supervisory Agent Pattern

The supervisory agent does not participate in application work. It observes plan emissions, tool calls, and state transitions, evaluates them against policy, and intervenes when policy is at risk of violation. The supervisor has read access to everything; write access to only the intervention API. We use this pattern in every multi-agent deployment because it produces an enforceable separation between the agents that do the work and the policy that constrains the work — a separation that is structurally absent in single-agent designs.
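The read-everything, write-only-through-intervention separation can be sketched as below, assuming events reach the supervisor as dicts and policies are plain functions. `Supervisor`, `Intervention`, and the event fields are hypothetical; `MappingProxyType` only gives a shallow read-only view, and a production system would enforce the boundary with process and IAM isolation.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Intervention:
    workflow_id: str
    action: str          # e.g. "block", "escalate", "terminate"
    reason: str


Policy = Callable[[Mapping[str, object]], Optional[Intervention]]


class Supervisor:
    """Observes every event; its only write path is the intervention callback."""

    def __init__(self, policies: list[Policy], intervene: Callable[[Intervention], None]):
        self._policies = policies
        self._intervene = intervene

    def observe(self, event: dict[str, object]) -> None:
        view = MappingProxyType(event)   # shallow read-only view handed to policies
        for policy in self._policies:
            decision = policy(view)
            if decision is not None:
                self._intervene(decision)
                return                   # first matching policy wins in this sketch


def no_unapproved_transfers(event: Mapping[str, object]) -> Optional[Intervention]:
    if event.get("tool") == "fund_transfer" and not event.get("approved"):
        return Intervention(str(event["workflow_id"]), "escalate", "transfer without approval")
    return None
```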

Kill Switch Architecture

Every agentic workflow has a defined kill-switch contract: a list of conditions under which the workflow terminates immediately, the state it leaves the system in after termination, and the audit record it produces. The kill switch is fired by the supervisor, not by the application agents. Conditions include cost ceiling exceeded, plan-length ceiling exceeded, tool-failure rate exceeded, jurisdiction violation detected, and supervisor-detected drift from baseline. A kill switch that requires the failing agent to cooperate with its own termination is not a kill switch.
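A kill-switch contract can be sketched as a pure function the supervisor evaluates over running workflow counters, assuming those counters are reported to it. Thresholds, the drift score, and the post-termination state name are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KillSwitchContract:
    max_cost_usd: float
    max_plan_steps: int
    max_tool_failure_rate: float
    max_drift: float


@dataclass(frozen=True)
class WorkflowStats:
    cost_usd: float
    plan_steps: int
    tool_calls: int
    tool_failures: int
    jurisdiction_violation: bool
    drift: float                 # hypothetical drift score versus baseline


def kill_reason(s: WorkflowStats, c: KillSwitchContract) -> Optional[str]:
    """Return the first tripped condition, or None if the workflow may continue."""
    if s.cost_usd > c.max_cost_usd:
        return "cost ceiling exceeded"
    if s.plan_steps > c.max_plan_steps:
        return "plan-length ceiling exceeded"
    if s.tool_calls and s.tool_failures / s.tool_calls > c.max_tool_failure_rate:
        return "tool-failure rate exceeded"
    if s.jurisdiction_violation:
        return "jurisdiction violation detected"
    if s.drift > c.max_drift:
        return "drift from baseline"
    return None


def terminate(workflow_id: str, reason: str) -> dict[str, str]:
    # The defined post-termination state plus the audit record the contract promises.
    return {"workflow_id": workflow_id, "final_state": "halted_pending_review", "reason": reason}
```

Because `kill_reason` reads only reported counters, the failing agent has no say in whether it is terminated.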

Cross-Agent Conflict Resolution

When multiple agents have authority over the same state or the same action, conflict is inevitable. We address this with explicit precedence rules embedded in the supervisory layer: which agent's decision wins in which scenarios, and how the loser is informed. Precedence rules are versioned policy artifacts, not implicit assumptions in code. The default precedence we apply in regulated deployments is that the agent with the narrower, more specific policy contract wins over the agent with the broader contract, meaning specialized regulated agents always take precedence over general-purpose agents in their domain.
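The default "narrower contract wins" rule can be sketched as below, assuming narrowness is measured by how many tools and data classes a contract permits. That breadth metric is an illustrative assumption; real precedence tables would list scenarios explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    agent: str
    version: str
    domains: frozenset[str]
    allowed_tools: frozenset[str]
    allowed_data_classes: frozenset[str]

    def breadth(self) -> int:
        # Smaller breadth means a narrower, more specific contract.
        return len(self.allowed_tools) + len(self.allowed_data_classes)


def resolve(domain: str, claimants: list[AgentContract]) -> tuple[AgentContract, list[AgentContract]]:
    """Return (winner, losers) for a contested action; losers are returned so they can be informed."""
    eligible = [c for c in claimants if domain in c.domains]
    if not eligible:
        raise ValueError(f"no agent has authority over {domain!r}")
    # Narrowest contract wins; ties break on agent name so the outcome is deterministic.
    ranked = sorted(eligible, key=lambda c: (c.breadth(), c.agent))
    return ranked[0], ranked[1:]
```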

Cost Throttling and Token Governance

Multi-agent loops can spiral into unbounded reasoning sequences that exhaust budgets in minutes. Our cost-throttling layer enforces per-workflow ceilings, per-agent ceilings, and per-organization daily ceilings; the supervisor refuses new tool calls as ceilings approach. Cost is not only an operational concern: for clients subject to the Sarbanes-Oxley Act, an uncontrolled cost spike can surface as an internal-control deficiency under Section 404.
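Nested ceilings can be sketched as follows, assuming each tool call's cost is estimated before dispatch and that the supervisor refuses calls once spend would pass a margin (90% here, a placeholder) of any ceiling. The daily organization ceiling is keyed by an org-and-date string for simplicity.

```python
from collections import defaultdict


class CostGovernor:
    def __init__(self, per_workflow: float, per_agent: float, per_org_daily: float, margin: float = 0.9):
        self.limits = {"workflow": per_workflow, "agent": per_agent, "org": per_org_daily}
        self.margin = margin                            # refuse once spend would pass margin * ceiling
        self.spent = {scope: defaultdict(float) for scope in self.limits}

    def admit(self, workflow_id: str, agent_id: str, org_day: str, est_cost: float) -> bool:
        """Admit a call only if every ceiling stays within its margin; record spend if admitted."""
        keys = {"workflow": workflow_id, "agent": agent_id, "org": org_day}
        for scope, key in keys.items():
            if self.spent[scope][key] + est_cost > self.margin * self.limits[scope]:
                return False
        for scope, key in keys.items():
            self.spent[scope][key] += est_cost
        return True
```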

Hallucinated Tool Call Recovery

Agents invoke tools that do not exist, with parameters that do not parse, against data that does not match the schema. Standard practice is to retry; that produces non-deterministic loops. We treat hallucinated tool calls as deterministic failure signals: the supervisor catches them at the action-layer boundary, logs them as a distinct event type, and routes the workflow through a recovery path that either re-plans with a constraint added or escalates to human review. Hallucinated tool calls are also a leading indicator of model drift — we track them as a fleet-wide metric.
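Treating a hallucinated call as a deterministic signal rather than a retry trigger can be sketched as below, assuming a registry mapping each tool name to its required parameter types. The registry contents, event names, and the one-replan limit are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical registry: tool name -> {parameter: expected type}
REGISTRY = {"lookup_member": {"member_id": str}, "get_claim": {"claim_id": str, "include_notes": bool}}

fleet_metrics: Counter = Counter()   # fleet-wide counts, tracked as a drift indicator


def classify_call(call: ToolCall) -> str:
    """Return 'ok', 'unknown_tool', or 'bad_args' without executing anything."""
    schema = REGISTRY.get(call.name)
    if schema is None:
        return "unknown_tool"
    if set(call.args) != set(schema):
        return "bad_args"
    if not all(isinstance(call.args[k], t) for k, t in schema.items()):
        return "bad_args"
    return "ok"


def route(call: ToolCall, replans_used: int, max_replans: int = 1) -> str:
    """Pick a recovery path deterministically instead of retrying the same call."""
    outcome = classify_call(call)
    if outcome == "ok":
        return "execute"
    fleet_metrics[outcome] += 1      # logged as a distinct event type
    if replans_used < max_replans:
        return f"replan_with_constraint:{outcome}"
    return "escalate_to_human"
```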

Audit-Trail Completeness

The audit trail for a multi-agent workflow must reconstruct, for a specific outcome at a specific time, the full chain of decisions, tool calls, intermediate states, and supervisor interventions that led to that outcome. We generate audit packets continuously; the regulator-facing format includes the goal that triggered the workflow, the policy contract version in force, the plan version, every tool invocation with provenance, every supervisor intervention, every state checkpoint, and the cryptographically bound timestamp chain. Audit packets are designed to be consumed directly by internal audit and external regulators without translation.
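The timestamp chain can be sketched with SHA-256 hash chaining, assuming each event is serialized as canonical JSON. This shows chaining only: it detects edits, reordering, and interior deletions, but not truncation of the tail, which is why a regulator-grade deployment would also anchor the chain with signatures or a trusted timestamping service (for example RFC 3161), omitted here.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True, separators=(",", ":")).encode()).hexdigest()


def append_event(chain: list[dict], event_type: str, payload: dict) -> dict:
    """Append an event whose hash covers its content and the previous record's hash."""
    record = {
        "seq": len(chain),
        "ts": datetime.now(timezone.utc).isoformat(),
        "type": event_type,   # e.g. goal, policy_contract, plan, tool_call, intervention
        "payload": payload,   # must be JSON-serializable
        "prev": chain[-1]["hash"] if chain else GENESIS,
    }
    record["hash"] = _digest(record)
    chain.append(record)
    return record


def verify(chain: list[dict]) -> bool:
    """Recompute every link; edits, reordering, or interior deletions return False."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        body = {k: v for k, v in rec.items() if k != "hash"}
        if body["seq"] != i or body["prev"] != prev or _digest(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```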

VI. PHI and PII for Agents

Sensitive data patterns for agentic workflows

Agentic systems handle PHI and PII through a longer path than traditional applications: data enters as part of a goal, gets embedded in planning prompts, gets passed between tool calls, gets stored in memory across steps, gets included in audit logs, and sometimes ends up in model provider logs. Every one of those touchpoints is a potential disclosure event. The patterns below describe how we close them.

Pre-execution classification. Every input entering an agentic workflow is classified for sensitivity before the planner runs. PHI, PCI, CUI, and other regulated classes are tagged at the field level. Downstream tools and prompts receive the data with classification metadata attached, so PHI is never treated as ordinary string content.
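Field-level tagging at ingress can be sketched as below, assuming a static field-name map plus an SSN-shaped regex as a fallback detector; both are illustrative and far less thorough than a production classifier. The `Tagged` wrapper refuses silent string coercion so regulated values do not drift into prompts as plain text.

```python
import re
from dataclasses import dataclass, field

# Hypothetical field map and fallback pattern.
FIELD_CLASSES = {"diagnosis": "PHI", "mrn": "PHI", "card_number": "PCI", "export_spec": "CUI"}
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass(frozen=True)
class Tagged:
    value: str = field(repr=False)   # kept out of repr() so logs do not leak it
    classification: str = "PUBLIC"

    def __str__(self) -> str:
        # str() and f-strings redact anything regulated.
        if self.classification != "PUBLIC":
            return f"<{self.classification} redacted>"
        return self.value


def classify_input(record: dict[str, str]) -> dict[str, Tagged]:
    """Tag every field before the planner runs."""
    tagged = {}
    for name, value in record.items():
        cls = FIELD_CLASSES.get(name)
        if cls is None:
            cls = "PII" if SSN.search(value) else "PUBLIC"
        tagged[name] = Tagged(value, cls)
    return tagged
```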

Tool-call sandboxing. Tool adapters enforce data-class-specific routing. PHI never reaches a model provider API without a BAA. PCI never reaches a non-PCI-DSS environment. CUI never crosses jurisdictional boundaries that would violate ITAR or export controls. The sandbox is enforced by the adapter layer, not by adapter-user discipline.

Output validation. Tool outputs are validated against the classification of their inputs. A tool that received PHI cannot return output tagged as non-PHI that still contains PHI; the validator catches this at the boundary. Output validation is one of the most overlooked surfaces in agentic systems and one of the most common sources of compliance findings post-deployment.
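The boundary check can be sketched as taint propagation, assuming an output inherits every non-public class of its inputs and that a substring match against known sensitive values is an acceptable leak test. Both assumptions are simplifications; `ToolIO` and `validate_output` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolIO:
    text: str
    classes: frozenset[str]


class OutputValidationError(Exception):
    pass


def validate_output(inputs: list[ToolIO], output: ToolIO, sensitive_values: list[str]) -> ToolIO:
    """Reject outputs that drop input classifications while still containing sensitive values."""
    inherited = frozenset().union(*(i.classes for i in inputs)) - {"PUBLIC"}
    missing = inherited - output.classes
    if missing and any(v in output.text for v in sensitive_values):
        raise OutputValidationError(f"output drops {sorted(missing)} but contains tagged values")
    # Otherwise propagate the inherited tags so downstream steps still see them.
    return ToolIO(output.text, output.classes | inherited)
```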

Cross-border routing. Jurisdictional rules — GDPR data residency, UAE PDPL onshoring, China PIPL, India DPDPA — are enforced at the tool layer. Workflows that touch data subject to multiple jurisdictions are routed through region-pinned model endpoints and region-pinned storage. The supervisor blocks tool calls that would route data outside permitted jurisdictions.
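Region pinning can be sketched as choosing an endpoint permitted by every jurisdiction the data touches, assuming each endpoint declares its region. The jurisdiction-to-region table is a placeholder and does not describe what GDPR, PDPL, PIPL, or DPDPA actually permit.

```python
from dataclasses import dataclass

# Placeholder policy table: jurisdiction -> regions where its data may be processed.
PERMITTED_REGIONS = {
    "EU": {"eu-central", "eu-west"},
    "UAE": {"me-central"},
    "IN": {"ap-south", "eu-west"},
}


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str


def pick_endpoint(jurisdictions: set[str], endpoints: list[Endpoint]) -> Endpoint:
    """Return an endpoint every jurisdiction permits, or block the call."""
    if not jurisdictions:
        raise ValueError("classification must name at least one jurisdiction")
    # Unknown jurisdictions contribute an empty set, so they block everything.
    allowed = set.intersection(*(PERMITTED_REGIONS.get(j, set()) for j in jurisdictions))
    for ep in endpoints:
        if ep.region in allowed:
            return ep
    raise PermissionError(f"no endpoint permitted for {sorted(jurisdictions)}")
```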

Memory hygiene. PHI-bearing memory uses TTL-bounded stores with key rotation. Cross-session memory enforces re-authentication for high-risk operations. Long-running workflows checkpoint state explicitly rather than allowing implicit accumulation. When a session ends or a workflow completes, regulated-class memory is cryptographically zeroized, not merely garbage-collected.
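TTL-bounded memory with explicit zeroization can be sketched as below, assuming values are held as `bytearray` so they can be overwritten in place. Python cannot guarantee that no other copy exists (the bytes returned by `get` are one), so this illustrates the pattern rather than meeting a cryptographic-erasure standard; encryption and key rotation are omitted.

```python
import time
from typing import Callable, Optional


class EphemeralMemory:
    """TTL-bounded store for regulated-class values held as mutable bytes."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items: dict[str, tuple[bytearray, float]] = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (bytearray(value), self.clock() + self.ttl_s)

    def get(self, key: str) -> Optional[bytes]:
        item = self._items.get(key)
        if item is None:
            return None
        buf, expires = item
        if self.clock() >= expires:
            self._zeroize(key)         # expired values are wiped, not just dropped
            return None
        return bytes(buf)

    def _zeroize(self, key: str) -> None:
        buf, _ = self._items.pop(key)
        for i in range(len(buf)):
            buf[i] = 0                 # overwrite in place before releasing the reference

    def end_session(self) -> None:
        for key in list(self._items):
            self._zeroize(key)
```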

VII. Our Production Stack

The platforms behind our agentic systems

Supervisory Agent

ALICE

Compliance enforcement deployed as the supervisor above application agents. Validates plans, tool calls, and state transitions against policy-as-code; produces the regulator-grade audit packet.

AI Digital Labor

Claire

Our production agentic platform — multi-agent workflows for regulated industries. Healthcare claims triage, KYC investigation, member service automation. Reference deployment of the architecture on this page.

Regulatory Intelligence

Regure

Real-time regulatory change detection across 30+ frameworks and 47 jurisdictions. Feeds the policy layer so agent policy contracts update when regulators move.

Self-Healing Ops

SentienGuard

Autonomous monitoring and remediation for the infrastructure agentic systems run on. Detects, diagnoses, and resolves operational anomalies without paging on-call.

VIII. Anti-Hype

What we will not call agentic AI

The market has discovered that "agentic" sells. The result is a wave of products that adopt the language of agency without the substance. For regulated buyers evaluating partners, the following are reliable disqualification signals.

Not agentic

A RAG pipeline in a costume

Retrieval-augmented generation produces grounded responses. It does not plan, does not maintain state across a workflow, does not call tools beyond the retriever, and does not adapt based on outcomes. Marketing it as "an agent" is the dominant form of agentwashing in 2026.

Not agentic

A chatbot with a fancy UI

A conversational surface where every turn is independent is not agentic. It is a chatbot. The lack of persistent state across the user goal is the giveaway.

Not agentic

A single tool call wrapped in an LLM

Asking an LLM to format an API call, executing it, and returning the result is a workflow with one step. It is not an agent. Real agency requires planning across multiple steps with branching based on intermediate outcomes.

Not agentic

Tool-calling without planning

A model that calls tools in a fixed sequence defined by a developer is executing a workflow, not planning one. The model is the executor, not the planner. This pattern is useful but is not what regulators are evaluating when they reference "autonomous AI."

Disqualifying

No documented kill-switch contract

If a vendor cannot articulate the specific conditions under which the agent terminates, the state it leaves behind, and the audit record produced, the system is not deployment-ready for regulated use.

Disqualifying

Model lock-in masquerading as architecture

A vendor proposal that requires a specific model provider, with no model-agnostic abstraction layer, is not an agentic AI system — it is a managed services dependency on someone else's API. Regulated buyers need model-agnostic routing and the ability to switch providers without rewriting orchestration.

Disqualifying

No story for non-deterministic errors

Hallucinated tool calls, malformed JSON outputs, and unexpected schema variations are inevitable. A vendor without a designed recovery path for these conditions is delivering a prototype, not a production system.

IX. Engagement

How we deliver agentic AI engagements

We deliver agentic AI engagements on the same fixed-price, fixed-scope basis as the rest of our work. There is no discovery phase billed separately. The first deliverable on day one is a working system component, not a slide deck.

Tier I — Surgical Strike. 10 to 30 engineers, 8 to 16 weeks. One high-value use case taken from architecture to production. Examples include HIPAA-compliant clinical documentation triage, AML investigation automation, SR 11-7-aligned credit decisioning, and FedRAMP-scoped government workflow automation. The engagement closes with full source code and IP transfer, plus runbooks for your team to operate the system.

Tier II — Enterprise Program. 40 to 100 engineers, 3 to 9 months. A multi-workstream agentic transformation program. Multiple use cases instrumented under a shared supervisory architecture, integrated with the organization's existing model risk management and operational risk frameworks. Co-deployment with internal teams from week one.

Tier III — Total Infrastructure. 100 to 250+ engineers, 6 to 18 months. Enterprise-wide agentic AI platform build, including the supervisory infrastructure, agent registry, model risk integration, and the governance functions that surround it. Reserved for organizations where agentic AI is a strategic capability rather than a tactical deployment.

Every engagement includes documentation packets sized for the relevant regulator: HIPAA evidence packs for healthcare, model risk management documentation for financial services, technical documentation under EU AI Act Article 11 for European deployments, and FedRAMP-ready System Security Plan extensions for government work.

X. Questions

Frequently asked questions

What is agentic AI in 2026?

Agentic AI describes systems that pursue goals autonomously by planning multi-step workflows, calling tools and APIs, maintaining state across steps, and adapting based on outcomes — under explicit governance and human oversight. It is distinct from chat-based GenAI assistants and from single-shot LLM calls. Agentic AI moves the locus of work from prompts to workflows, from model-centric to architecture-centric, and from experiments to production economics. In regulated industries, an agentic system is only credible if it also produces a regulator-grade audit trail, enforces policy-as-code, and routes high-impact actions through deterministic human-in-the-loop checkpoints.

How is agentic AI different from RAG or a chatbot?

RAG (retrieval-augmented generation) is a technique for grounding LLM responses in retrieved context. A chatbot is a conversational surface. Neither plans, neither executes actions across systems, and neither maintains accountable state across a multi-step workflow. An agentic system uses RAG as one tool among many, plans how to solve a goal, calls tools (which may include RAG retrievers, database writers, payment APIs, EHR integrations), validates intermediate results, and either completes the workflow or escalates to a human at a defined threshold. The most common form of agentwashing in 2026 is repackaging a RAG pipeline as an autonomous agent.

How do we map agentic systems to the EU AI Act?

Agentic systems are frequently classified as High-Risk AI under the EU AI Act: under Annex III when they autonomously affect credit scoring, recruitment, critical infrastructure, or law enforcement, and under Article 6(1) and Annex I when they are, or are safety components of, regulated medical devices. High-Risk classification triggers Articles 9–15: risk management system, data governance, technical documentation, record-keeping, transparency, human oversight, accuracy and robustness, and cybersecurity. The architecturally important consequence is that agent autonomy must be bounded by a designed human oversight regime per Article 14, and the technical documentation under Article 11 must describe agent planning logic, tool calls, intermediate outputs, and the validation criteria that trigger human review.

How does NIST AI RMF apply to multi-agent systems?

The NIST AI Risk Management Framework organizes around four functions: Govern, Map, Measure, Manage. For agentic systems, Map must include the agent decomposition graph and tool inventory. Measure must include continuous evaluation of agent behavior under realistic conditions, not point-in-time testing. Manage must include incident response procedures specifically for non-deterministic failures (hallucinated tool calls, multi-agent conflict, runaway loops). Govern must include accountable owners for every agent class and a model-and-agent registry mirroring the organization's model risk management policy.

Can agentic AI comply with FDA SaMD?

It can, but the path is narrower than for deterministic software. The FDA's Predetermined Change Control Plan framework (final guidance 2024) creates a pathway for AI/ML-enabled medical devices to update under a specified Modification Protocol. For agentic systems, the Modification Protocol must explicitly bound the change surface (which prompts, models, tools, and planning logic can change without re-submission) and must define performance acceptance criteria the system must meet post-update. Agentic systems whose planning logic adapts unboundedly are unlikely to fit a Predetermined Change Control Plan without architectural redesign.

How does ALICE govern agentic systems?

ALICE is our compliance enforcement agent. In agentic deployments, ALICE operates as a supervisory layer above the application agents: it validates every tool call against policy, enforces PHI and PCI redaction at the data boundary, logs the agent's planning trace with cryptographic timestamps, fires on threshold breaches (cost, error rate, drift), and produces the audit packet a regulator or internal audit team can consume. ALICE is the mechanism by which our agentic systems satisfy "every agent maps to an accountable human" — the policy ALICE enforces is the human owner's policy, expressed as code.

What does a Tier I agentic AI engagement look like?

A Tier I (Surgical Strike) agentic AI engagement is 10–30 engineers over 8–16 weeks, fixed-price, with full source code and IP transfer at close. Typical scope: design and deploy a multi-agent workflow for a single high-value use case (e.g., HIPAA-compliant clinical documentation triage, KYC/AML investigation, claims appeals adjudication), instrumented with our supervisory architecture, mapped to the relevant regulatory framework, and handed over with the runbooks your team needs to operate it. The first deliverable is a working system component, not a discovery document.


Ready to build an agentic system that holds up to audit?

Senior engineer on the first call. Fixed-price proposal within 5 business days. Source code and runbooks transferred on delivery day.

Talk to an Engineer
