FDA SaMD Compliance for Adaptive Agentic Systems

Agentic systems that participate in diagnosis, treatment recommendation, or care delivery typically classify as Category III or IV SaMD under the IMDRF framework. The 2024 final guidance on Predetermined Change Control Plans for AI-enabled device software provides the regulatory opening for adaptive systems, but only if the change surface is bounded at design time. This article explains why agentic systems trigger SaMD classification, what 21 CFR Part 11 requires of agent audit trails, how IEC 62304 software lifecycle obligations translate to multi-agent verification testing, what a Predetermined Change Control Plan can and cannot cover for agentic systems, and how ISO 14971 risk management handles agentic-specific hazards including hallucinated tool calls, multi-agent feedback loops, and planning failures under uncommon inputs.

The Software as a Medical Device risk categorization, developed by the IMDRF and operationalized in the US under 21 CFR 807 and FDA guidance, defines four categories of SaMD along two axes: the significance of the information the software provides to the healthcare decision, and the state of the healthcare situation or condition. Agentic systems that participate in diagnosis, treatment recommendation, or care delivery are most likely to land in Category III or IV. The compliance bar is substantially higher than for software that supports human decisions without participating in them.

Agentic systems pose a specific compliance problem the SaMD framework was not originally designed for: the software changes its behavior in ways that the manufacturer did not pre-specify, in response to inputs from the operating environment. This is the central tension between agentic autonomy and SaMD's deterministic-software heritage. The FDA's 2024 final guidance on Predetermined Change Control Plans provides the architectural opening for resolving it.

Why Agentic Systems Trigger SaMD Classification

SaMD is software intended to be used for medical purposes that performs those purposes without being part of a hardware medical device. An agentic system that reads a patient's clinical history and recommends a treatment is performing a medical purpose. An agentic system that triages incoming patient messages and routes them to clinical staff based on inferred urgency is performing a medical purpose. An agentic system that interacts with patients to gather symptom information is performing a medical purpose, even if a clinician reviews the output before any action is taken.

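The category determination is based on the IMDRF risk framework, which crosses the state of the healthcare situation (non-serious, serious, critical) with the significance of the information provided (inform clinical management, drive clinical management, treat or diagnose). Category I covers software that informs clinical management of non-serious or serious conditions, or drives it for non-serious ones. Category IV is reserved for software that treats or diagnoses in critical situations; software that drives clinical management in a critical situation is Category III. Agentic systems that participate in routine workflows for stable patients may land in Category II. Agentic systems involved in oncology decisions, intensive care monitoring, or sepsis recognition operate in critical situations and land in Category III or IV, depending on whether they drive clinical management or diagnose and treat, with the corresponding pre-market submission expectations.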

21 CFR Part 11 for Agentic Workflows

21 CFR Part 11 governs electronic records and electronic signatures in FDA-regulated environments. For agentic systems, the architectural consequences are immediate. Every record that documents an agent's contribution to a clinical decision is an electronic record subject to Part 11 controls. FDA data-integrity expectations for such records are commonly summarized as the ALCOA principles: attributable, legible, contemporaneous, original, and accurate. Part 11 itself requires secure, computer-generated, time-stamped audit trails that capture the creation, modification, and deletion of records, along with the identity of the operator who acted.

Agent outputs are records. Agent intermediate states that influenced an output are records. The prompts the system used to produce the output are records, because the prompts are part of the decision-making process the FDA expects to be able to reconstruct. Part 11 audit trails for agentic systems are therefore substantially larger than for traditional clinical software: each workflow run produces a Part 11-compliant electronic record of every prompt, model response, tool call, and supervisor intervention.
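One way to make such a trail tamper-evident is to chain each entry to the hash of the previous one. Part 11 does not prescribe hash chaining; it is a design choice assumed here for illustration, and the class name, field names, and event types below are hypothetical rather than drawn from any particular system.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class AuditTrail:
    """Append-only log; each entry commits to the previous entry's hash."""

    entries: list = field(default_factory=list)

    def append(self, actor: str, event_type: str, payload: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "seq": len(self.entries),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,            # attributable: who or what acted
            "event_type": event_type,  # e.g. prompt, model_response, tool_call
            "payload": payload,        # must be JSON-serializable
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A workflow run would call `append` once per prompt, model response, tool call, and supervisor intervention, and a periodic job would call `verify`. A production system would also need write-once storage and access controls, which this sketch does not cover.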

IEC 62304 Software Lifecycle for Agentic Systems

IEC 62304 specifies the software lifecycle process for medical device software. Class A is software for which failure cannot lead to injury. Class B is software for which failure can lead to non-serious injury. Class C is software for which failure can lead to serious injury or death. Agentic systems in Category III or IV SaMD almost always classify as IEC 62304 Class B or C.

The standard requires risk management integrated into the software lifecycle, software requirements specifications tied to risk analysis, software architecture documentation, detailed design for Class B and C software, and software unit verification and integration testing. For agentic systems, the architecture documentation must describe agent decomposition, supervisor design, tool registry, and the planning algorithm in sufficient detail that the regulator can evaluate whether the design controls risk.

Verification testing for agentic systems requires both deterministic test cases (does the supervisor block this disallowed tool call?) and probabilistic evaluation (does the agent produce acceptable outputs across a representative input distribution?). The probabilistic evaluation is the harder requirement because the test design must account for the non-deterministic nature of LLM outputs without accepting unbounded variance.
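For the probabilistic half, one common approach is to compare a lower confidence bound on the observed pass rate against the acceptance criterion, rather than comparing the raw pass rate. The sketch below uses the Wilson score interval; the choice of interval, the 95% level implied by `z = 1.96`, and the function names are assumptions for this example, not requirements of IEC 62304.

```python
import math


def wilson_lower_bound(passes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial pass rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")
    p_hat = passes / trials
    denom = 1 + z**2 / trials
    centre = p_hat + z**2 / (2 * trials)
    margin = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return (centre - margin) / denom


def meets_acceptance(
    passes: int, trials: int, required_rate: float, z: float = 1.96
) -> bool:
    """Accept only if the lower bound, not the point estimate, clears the bar."""
    return wilson_lower_bound(passes, trials, z) >= required_rate
```

With a required rate of 0.95, 980 passes out of 1,000 runs clears the bar, while 98 out of 100 does not, even though both have the same observed pass rate. That is the sense in which the test design bounds variance: the sample size has to be large enough for the evidence to support the claim.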

The Predetermined Change Control Plan Opening

The FDA's December 2024 final guidance on Predetermined Change Control Plans (PCCPs) for AI-enabled device software provides the framework that makes adaptive agentic systems regulatorily tractable. A PCCP specifies in advance the modifications a manufacturer intends to make to a device after authorization, the modification protocol for making them safely, and the impact assessment showing that modifications will not introduce unacceptable risk.

For an agentic system, the PCCP must answer: which components can change without a new submission? Acceptable answers might include the policy contract, the tool registry, the supervisor thresholds. Unlikely-to-be-acceptable answers include the agent decomposition or the planning algorithm itself. The architectural commitment is that the changeable surface and the fixed surface are clearly separated at design time, and that changes to the changeable surface go through the PCCP's modification protocol.
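The separation can be made explicit in code as a manifest that every proposed change is routed through. This is a minimal sketch: the component names are taken from the examples above, while the manifest structure, the route labels, and `route_change` are hypothetical. What actually belongs on each side is a matter for the authorized PCCP, not for this snippet.

```python
# Components the PCCP is assumed to cover (illustrative, per the examples above).
CHANGEABLE = frozenset({"policy_contract", "tool_registry", "supervisor_thresholds"})

# Components assumed to be fixed at authorization.
FIXED = frozenset({"agent_decomposition", "planning_algorithm"})


def route_change(component: str) -> str:
    """Decide which regulatory path a proposed change must take."""
    if component in CHANGEABLE:
        return "pccp_modification_protocol"
    if component in FIXED:
        return "outside_pccp"
    # Fail closed: an undocumented component is a design-time gap, not a default.
    raise ValueError(f"component not classified in the manifest: {component}")
```

Raising on an unclassified component is deliberate in this sketch. If a change touches something the manifest never mentioned, the change surface was not fully written down.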

The Bounded Change Surface

For an agentic SaMD to be deployed under a PCCP, the change surface must be bounded. Unbounded planning autonomy (the agent can call any tool, in any order, with any prompt) is incompatible with a regulator-approvable PCCP. The architecture must constrain the planning space to a documented set of plans and a documented set of tools, with changes to those sets going through the modification protocol.
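A supervisor can enforce this constraint at run time by checking each proposed plan against the documented sets before anything executes. The sketch below assumes a plan can be represented as an ordered tuple of tool names; the class, the function, and the tool names in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSurface:
    """The documented tool registry and plan set for one authorized version."""

    allowed_tools: frozenset
    documented_plans: frozenset  # each plan is a tuple of tool names, in order


def validate_plan(plan: tuple, surface: ChangeSurface) -> list:
    """Return a list of violations; an empty list means the plan is in bounds."""
    violations = []
    for tool in plan:
        if tool not in surface.allowed_tools:
            violations.append(f"tool not in registry: {tool}")
    if plan not in surface.documented_plans:
        violations.append("plan is not in the documented plan set")
    return violations
```

For example, given a surface that documents only the plan `("fetch_labs", "draft_recommendation")`, a proposed plan of `("fetch_labs", "order_medication")` would be rejected twice over: once for the unregistered tool and once for the undocumented sequence. Adding either to the surface is then a change that goes through the modification protocol.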

This sounds restrictive. In practice, it forces the engineering team to write down the agentic system's actual behavior surface, which is something most teams have not done. The discipline of writing it down for the FDA submission also produces internal benefits: a system whose behavior surface is documented is a system whose engineers can reason about it, evaluate it, and improve it.

Continuous Performance Monitoring

Even with a PCCP, agentic SaMD requires continuous performance monitoring. The FDA's good machine learning practice guiding principles emphasize ongoing monitoring of deployed AI systems against the acceptance criteria established at submission. For agentic systems, the monitoring must track both performance metrics (does the agent still make correct decisions at the expected rate?) and behavioral metrics (does the agent still follow the planning patterns the PCCP described?).

Behavioral drift is a leading indicator of performance drift. An agent that begins calling tools in unusual sequences, or whose planning depth changes over time, may still produce acceptable outputs but is no longer the system that was submitted. The monitoring architecture must detect this and either route changes through the PCCP modification protocol or roll the system back to a known-good version.
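One simple behavioral metric compares the distribution of tool-call sequences in production against the distribution recorded at submission. The sketch below uses total variation distance and also flags any sequence never seen in the baseline. The metric, the threshold, and the function names are assumptions for illustration; a real monitoring plan would justify its own choices.

```python
from collections import Counter


def distribution(sequences) -> dict:
    """Relative frequency of each tool-call sequence."""
    counts = Counter(tuple(seq) for seq in sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence is required")
    return {seq: n / total for seq, n in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_report(baseline, observed, threshold: float) -> dict:
    """Compare observed behavior to the baseline recorded at submission."""
    p = distribution(baseline)
    q = distribution(observed)
    unseen = sorted(set(q) - set(p))
    tvd = total_variation(p, q)
    return {
        "tvd": tvd,
        "unseen_sequences": unseen,
        # Any never-documented sequence counts as drift regardless of frequency.
        "drifted": tvd > threshold or bool(unseen),
    }
```

A `drifted` result does not say the outputs are wrong. It says the system is no longer behaving as documented, which is the trigger for the modification protocol or a rollback.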

Risk Management Per ISO 14971

ISO 14971 is the foundational risk management standard for medical devices. For agentic systems, the risk analysis must identify hazards specific to agentic behavior: hallucinated tool calls that could affect patient data, multi-agent feedback loops that could amplify errors, planning failures that could produce unsafe recommendations under uncommon inputs. These hazards must be evaluated for severity and probability, controlled through risk mitigation measures, and verified through testing.
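A hazard register for these items can start as a small data structure. ISO 14971 leaves the rating scales and acceptability criteria to the manufacturer, so the 1 to 5 scales, the multiplicative score, and every rating below are placeholder assumptions for illustration, not risk estimates for any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    severity: int      # 1 (negligible) to 5 (catastrophic), illustrative scale
    probability: int   # 1 (improbable) to 5 (frequent), illustrative scale
    controls: tuple = ()

    def __post_init__(self):
        for value in (self.severity, self.probability):
            if not 1 <= value <= 5:
                raise ValueError("ratings must be between 1 and 5")

    @property
    def score(self) -> int:
        """Simple severity-times-probability score (one common convention)."""
        return self.severity * self.probability


def needs_further_control(hazards, max_acceptable_score: int) -> list:
    """Names of hazards whose score exceeds the manufacturer's criterion."""
    return [h.name for h in hazards if h.score > max_acceptable_score]


# Placeholder ratings only; real values come from the risk analysis.
REGISTER = [
    Hazard("hallucinated tool call", 4, 2, ("supervisor",)),
    Hazard("multi-agent feedback loop", 3, 3, ("intervention point",)),
    Hazard("planning failure under uncommon input", 5, 2, ("reversibility tier",)),
]
```

Calling `needs_further_control(REGISTER, 8)` lists the hazards that still exceed the chosen criterion. Each of those needs additional mitigation, or a documented benefit-risk justification, before the risk file is complete.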

The risk mitigations available are the same architectural patterns that satisfy the human oversight requirements of Article 14 of the EU AI Act: the supervisor, the reversibility tiers, the intervention points. The ISO 14971 risk file documents that these mitigations exist, why they are sufficient, and how they will be monitored post-market. The FDA will read this file.

Where This Lands

Agentic systems can comply with SaMD requirements. The path is narrow. It requires architectural commitments before the first submission, a bounded change surface documented in a PCCP, continuous behavioral monitoring, and a risk file that takes agentic-specific hazards seriously. Manufacturers that approach SaMD compliance as a paperwork exercise after building the system find that the system's actual behavior cannot be documented in a way the FDA will accept, because the behavior was never bounded.

The honest engineering question to ask before starting an agentic SaMD project is whether the team is prepared to build a system whose behavior is documented before deployment. If the answer is no, the system should not be deployed in a regulated clinical context. If the answer is yes, SaMD compliance is a tractable engineering problem with known patterns.
