Guidance for AI-Enabled Medical Devices: Key Insights and Takeaways for 510(k) Study Design and Data Validation, with a Cleared 510(k) Example

May 2026 | CAHIR Solutions

Overview

The FDA's draft guidance on AI-enabled device software functions gives manufacturers a far more explicit roadmap for what should appear in a premarket submission, spanning device description, user interface and labeling, risk assessment, data management, model description and development, validation, and postmarket performance monitoring. In parallel, the FDA, Health Canada, and the MHRA have articulated transparency principles for machine learning-enabled medical devices that reinforce the need to communicate intended use, development, performance, logic when available, limitations, and lifecycle updates to relevant audiences. Together, these materials signal a maturing regulatory framework that expects AI-enabled medical devices to be validated not only for technical performance, but also for safe use, understandable outputs, and equitable real-world performance across clinically relevant populations.

For major stakeholders, the practical implications are clear. Manufacturers must design validation programs that reflect actual clinical use and subgroup performance, regulatory teams must organize submissions around robust data and risk documentation, healthcare organizations must assess transparency and workflow fit before implementation, and clinicians need enough context to understand what the model does well, where it may be limited, and how it should inform rather than replace medical judgment.

Why this guidance matters now

FDA explains that, for AI-enabled devices, the model is part of the mechanism of action, which makes data management, model development, validation, and ongoing performance oversight central to demonstrating safety and effectiveness. The guidance also states that submission content should capture not only what has already been done to develop and validate the device, but also what the sponsor plans to do in the future to ensure ongoing safety and effectiveness across the total product lifecycle.

That lifecycle emphasis is especially important because AI-enabled devices can present risks that differ from conventional software functions, including risks tied to misunderstood information, unclear limitations, unrepresentative datasets, shifting performance after deployment, and bias arising from spurious correlations or inadequate subgroup representation. At the same time, the January 2026 Clinical Decision Support Software guidance clarifies that software functions analyzing medical images, image-derived findings, or signals generally remain device functions subject to FDA oversight, while certain non-device CDS functions are excluded only if they satisfy all statutory criteria in section 520(o)(1)(E) of the FD&C Act. That distinction matters because many AI-enabled imaging tools marketed as “decision support” still fall squarely within device regulation when they analyze images or other signal-level inputs.

Validation scope: what FDA expects

FDA's latest AI-enabled device guidance makes clear that validation is not a narrow exercise limited to a single overall performance number. Instead, FDA expects validation to support a determination that the device is safe and effective for its intended use, intended users, intended workflow, and intended use population, including understanding performance in clinically relevant subgroups and under realistic conditions of use.

Validation must align with intended use

The guidance frequently ties validation expectations to intended use. Sponsors are expected to describe the intended users, the use environment, the device workflow, the degree of automation, the model inputs and outputs, and the way outputs are intended to be used in clinical care, because each of these elements informs what kind of validation evidence is necessary. If a device is meant to support interpretation of medical images, then the validation program should reflect the kinds of images, acquisition protocols, compatible systems, use settings, and user interactions that define that intended use.

This principle is reinforced by the CDS guidance, which distinguishes software that analyzes medical images from software that merely analyzes medical information. If an AI-enabled product acquires, processes, or analyzes medical images, it fails Criterion 1 for non-device CDS and remains a device subject to FDA oversight. For AI imaging sponsors, that means validation cannot be minimized by framing the product as a generic support tool; the regulatory evidentiary burden follows the function actually performed.

Validation should go beyond aggregate performance

FDA specifically recommends that device performance be described across important subgroups, generally including patient characteristics such as sex, age, race, ethnicity, disease severity, geographic sites, and data collection equipment. This is a critical point because acceptable aggregate performance can obscure materially worse performance in underrepresented or clinically distinct populations. The transparency principles echo this by encouraging communication of performance, confidence intervals, data characterization gaps, and known populations that may be at risk of bias because they were not well represented in training or clinical datasets.

From a regulatory strategy perspective, this means sponsors should not treat subgroup analysis as a secondary appendix exercise. It should be built into study design, pre-specification, and sample planning early enough that resulting analyses are meaningful, interpretable, and, where claims are intended, statistically supportable.
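To make the point about aggregate versus subgroup performance concrete, the following is a minimal sketch of a per-subgroup sensitivity report with confidence intervals. The function names, the field names (truth, pred, age_band), and the toy counts are hypothetical and chosen only for illustration; the Wilson score interval is one common choice for a binomial proportion, not a method the guidance prescribes.

```python
import math
from collections import defaultdict

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def sensitivity_by_subgroup(cases, key):
    """Return {subgroup: (sensitivity, (ci_low, ci_high), n_diseased)}.

    cases: iterable of dicts with 'truth' (1 = disease present),
    'pred' (1 = flagged by the device), and the subgroup field named by key.
    """
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [true positives, diseased cases]
    for c in cases:
        if c["truth"] == 1:
            counts[c[key]][1] += 1
            counts[c[key]][0] += c["pred"]
    return {g: (tp / n, wilson_interval(tp, n), n) for g, (tp, n) in counts.items()}

# Hypothetical toy data: aggregate sensitivity is 51/60 = 0.85,
# but the smaller subgroup performs noticeably worse.
cases = (
    [{"truth": 1, "pred": 1, "age_band": "40-59"}] * 45
    + [{"truth": 1, "pred": 0, "age_band": "40-59"}] * 5
    + [{"truth": 1, "pred": 1, "age_band": "60+"}] * 6
    + [{"truth": 1, "pred": 0, "age_band": "60+"}] * 4
)
report = sensitivity_by_subgroup(cases, "age_band")
for group, (sens, (low, high), n) in sorted(report.items()):
    print(f"{group}: sensitivity={sens:.2f}, 95% CI=({low:.2f}, {high:.2f}), n={n}")
```

In this toy example the smaller subgroup also has a much wider interval, which is why subgroup sample sizes are worth planning before the study rather than discovering afterward.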

Human factors and information-related risks matter

FDA also links validation to user understanding and risk control. The user interface and labeling are treated as part of the overall user interface system, and FDA notes that user interface design can be essential to controlling risks associated with not knowing or misunderstanding information that is critical to safe and effective use. In the risk assessment section, the agency emphasizes that misunderstood, misused, or unavailable information can itself create safety risks for AI-enabled devices, particularly when device logic, subgroup performance, limitations, or failure modes are not obvious to users.

For stakeholders, this broadens the concept of validation. A technically strong model may still face regulatory difficulty if the sponsor cannot show that users will receive the right information at the right time, understand what the device output means, and use the output appropriately within the clinical workflow.

Design considerations for AI-enabled medical devices

FDA's guidance frames design as more than model architecture. The submission should help FDA understand the general characteristics of the AI-enabled device, including whether AI is used, how AI is used to achieve the intended use, the device's operational sequence, installation and maintenance procedures, configurable elements, and how the device fits into clinical decision-making.

Device description must be operational, not superficial

The guidance recommends including a statement that AI is used in the device, a description of device inputs and outputs, the intended users and their qualifications, the intended use environment, and the intended workflow, including the device's degree of automation compared with the standard of care. This means that the device description should not read like a marketing overview. It should read like an operational map of how the tool works, who interacts with it, when it is used, what it consumes as input, what it produces as output, and what role it plays in clinical judgment.

That expectation also supports more effective internal development decisions. When teams explicitly describe configurable elements such as alert thresholds, visualizations, operating points, and customizable settings, they are forced to identify who can change those settings, how users know what configuration is active, and what impact those choices may have on decision-making. For AI-enabled devices, those design decisions can materially affect performance, risk, and labeling obligations.

User interface design is part of safety and effectiveness

FDA states that the user interface includes all points of interaction between the user and the device, including display elements, alarms, training, packaging, labeling, and software screens. Sponsors are encouraged to provide graphical depictions, written descriptions, operational sequences, example reports, and even recorded demonstrations to show how the device workflow operates and how information is presented to users.

This reflects a broader regulatory shift toward human-centered transparency. The international transparency principles recommend optimizing the software user interface so information is responsive to user needs and delivered through appropriate modalities such as on-screen text, alerts, diagrams, audio, video, and safeguards. The message for developers is straightforward: safety is not established by algorithm performance alone; it also depends on whether the interface helps the human-AI team function effectively.

Data verification and data management

Among all sections of the AI-enabled device guidance, the data management discussion may be the most consequential for AI developers. FDA states that because the model is part of the mechanism of action, clear explanation of how data were collected, processed, annotated, stored, controlled, and used is critical to the agency's understanding of how the device was developed and validated.

FDA wants traceable, characterized datasets

The guidance highlights the importance of documenting development and validation data sources, study sites, sample size, demographic distributions, and the criteria or expertise used to determine the clinical reference standard or ground truth. It also emphasizes that validation datasets should include study design details such as randomization schemes, repeated measurements, clinical reference standards, primary endpoints, and pre-specified performance criteria.

This expectation reflects a regulatory need for traceability. FDA is not simply asking whether a model works; it is asking whether the sponsor can explain why reviewers should trust the data pipeline that produced the claimed result. In practice, that means sponsors need strong documentation around inclusion criteria, annotation methods, dataset governance, separation of training and testing sources, and the clinical basis for truthing decisions.

Independence between development and validation data is essential

FDA's guidance makes a strong point about test data independence, noting that validation data should be independent from training data and that data leakage creates uncertainty about the true performance of the device. The MammoScreen 4 510(k) summary illustrates this principle clearly: Therapixel reported that training and tuning sources were separated from the test group, that sources in the test group were entirely left out during training and tuning, and that standalone performance testing used only data from the independent test group.

For regulatory teams, this is a useful real-world benchmark. It shows how a sponsor can explicitly document source separation and external validation on unseen data in language that aligns well with FDA's broader guidance expectations.
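One way a team might verify source separation before locking a test set is a simple overlap check. The sketch below is illustrative only: the record layout (patient ID, source site, split name) and the function name are assumptions, and it is not a description of how Therapixel or FDA performs this check.

```python
def check_source_independence(records):
    """Report data sources and patients that appear on both sides of the split.

    records: iterable of (patient_id, source_site, split) tuples, where split
    is 'train', 'tune', or 'test'. Anything not labeled 'test' is treated as
    development data.
    """
    dev_sources, test_sources = set(), set()
    dev_patients, test_patients = set(), set()
    for patient_id, source, split in records:
        if split == "test":
            test_sources.add(source)
            test_patients.add(patient_id)
        else:
            dev_sources.add(source)
            dev_patients.add(patient_id)
    return {
        "shared_sources": sorted(dev_sources & test_sources),
        "shared_patients": sorted(dev_patients & test_patients),
    }

# Hypothetical records; site_B is deliberately used in both training and test.
records = [
    ("p001", "site_A", "train"),
    ("p002", "site_A", "tune"),
    ("p003", "site_B", "train"),
    ("p004", "site_C", "test"),
    ("p005", "site_B", "test"),
]
result = check_source_independence(records)
print(result)
```

An empty result on both keys is the condition a sponsor would want to be able to document; any shared source or patient is a candidate leak to investigate.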

Data quality is also a bias-control strategy

FDA explicitly states that data management is an important means of identifying and mitigating bias. The guidance describes AI bias as a tendency to produce incorrect results in a systematic, sometimes unforeseeable way due to limitations in training data or erroneous assumptions in the machine learning process. Because performance and behavior depend heavily on the quality, diversity, and quantity of the underlying data, weak data characterization creates both scientific and regulatory risk.

The transparency principles add that known gaps in data characterization, underrepresented patient populations, and use cases where actual device input will not align with development or validation data are all important to communicate. This means data verification is not only an internal quality exercise; it is also part of the external transparency package regulators and users increasingly expect.

Transparency: no longer an optional extra

Transparency has emerged as one of the defining regulatory expectations for ML-enabled devices. In the 2024 joint FDA-Health Canada-MHRA transparency principles, transparency is defined as the degree to which appropriate information about an ML-enabled device, including intended use, development, performance, and logic when available, is clearly communicated to relevant audiences.

Transparency must be audience-specific

The joint principles identify multiple relevant audiences for transparency, including healthcare professionals, patients, caregivers, administrators, support staff, payors, and governing bodies. FDA's AI-enabled device guidance similarly stresses that intended users may include those who interpret the output, those who install or maintain the device, and those who decide how the device will be deployed in practice. For sponsors, the implication is that one generic block of disclosure is not enough. Different users need different kinds of information, in different places, at different times, to support safe and effective use.

Labeling should explain more than intended use

FDA recommends that labeling for AI-enabled devices address how AI is used, model inputs, model outputs, degree of automation, model architecture at a high level, development data, validation data, performance metrics, subgroup performance, monitoring tools, known limitations, installation instructions, customization options, and supplementary metrics or visualizations. This is a notably expansive vision of labeling. It recognizes that for AI-enabled devices, users may need to understand not just what the product does, but also what kinds of data it learned from, where it may not perform as well, and how outputs should be interpreted within the broader workflow.

The transparency principles similarly encourage communication of benefits and risks, summaries of clinical studies, model and dataset characteristics, performance monitoring updates, and known biases or failure modes. For healthcare providers and health systems evaluating procurement decisions, this kind of labeling can materially affect trust, adoption, training requirements, and implementation planning.

Human-centered transparency supports adoption and safety

The 2024 transparency principles note that effective transparency can help identify and evaluate risks and benefits, detect errors or declines in performance, promote health equity, support informed decisions, and increase fluency and confidence in the use of ML-enabled devices. This is a useful reminder that transparency is not simply a compliance burden. When done well, it improves usability, strengthens the human-AI partnership, and reduces the likelihood that safety-critical limitations will be missed in practice.

Bias mitigation: a core regulatory and commercial issue

Bias mitigation is not treated as a side topic in FDA's guidance. It is embedded in data management, validation design, labeling, and lifecycle monitoring because systematic error in underrepresented populations can undermine both safety and market credibility.

How FDA describes bias risk

FDA explains that AI models may overfit to spurious correlations, such as features unique to specific scanners, sites, or patient subpopulations that are not actually relevant to the disease or condition of interest. The guidance also notes examples of confounders, such as all diseased cases being associated with the same instrument or irrelevant visual markers appearing more often in positive cases. These patterns can create the illusion of strong performance during development while failing to generalize in real-world deployment.
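A basic screen for this kind of confounding is to compare the disease rate within each level of an acquisition factor against the overall rate. The sketch below assumes a hypothetical record layout with a scanner field and a binary label, and the 0.25 tolerance is an arbitrary illustrative threshold, not a regulatory value.

```python
from collections import Counter

def positive_rate_by_factor(records, factor):
    """Fraction of positive labels within each level of an acquisition factor."""
    totals, positives = Counter(), Counter()
    for r in records:
        totals[r[factor]] += 1
        positives[r[factor]] += r["label"]
    return {level: positives[level] / totals[level] for level in totals}

def flag_imbalanced_levels(records, factor, tolerance=0.25):
    """Levels whose positive rate differs from the overall rate by more than tolerance."""
    overall = sum(r["label"] for r in records) / len(records)
    rates = positive_rate_by_factor(records, factor)
    return sorted(level for level, rate in rates.items() if abs(rate - overall) > tolerance)

# Hypothetical dataset: every case from scanner_C is diseased, so a model
# could learn scanner-specific artifacts instead of disease features.
records = (
    [{"scanner": "scanner_A", "label": 1}] * 20
    + [{"scanner": "scanner_A", "label": 0}] * 30
    + [{"scanner": "scanner_B", "label": 1}] * 20
    + [{"scanner": "scanner_B", "label": 0}] * 30
    + [{"scanner": "scanner_C", "label": 1}] * 20
)
flagged = flag_imbalanced_levels(records, "scanner")
print(flagged)
```

A flagged level does not prove the model has learned a shortcut, but it identifies where the dataset would allow one and where targeted testing is warranted.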

The transparency principles connect this directly to health equity, stating that knowledge of how a device works and how it was developed can help identify bias and assess whether a system or output is justifiable. For major stakeholders, that means bias is not only a technical defect; it is also a governance, trust, and implementation issue.

Practical bias mitigation steps sponsors should take

The FDA materials collectively point toward a practical bias-mitigation framework.

Build development and validation datasets that reflect the intended use population as closely as possible.
Characterize demographic distributions and data provenance clearly.
Conduct subgroup analyses across patient characteristics, disease severity, geographic site, and equipment type.
Identify known gaps in dataset representation and disclose them clearly.
Monitor performance across the lifecycle so that degradation or emerging bias signals can be detected after deployment.

This is particularly important for companies seeking broad commercial uptake. Health systems and clinical leaders are increasingly sensitive to questions about whether AI-enabled tools were developed and tested on populations that resemble their own patient base. Poor answers to those questions can slow adoption even when a device clears FDA review.
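The lifecycle-monitoring step in the list above can be sketched as a simple windowed check against a baseline. Everything here is an assumption made for illustration: the function name, the window size, the baseline value, and the allowed drop are hypothetical, and a real monitoring plan would define its own metrics, thresholds, and statistical treatment.

```python
def monitor_windows(outcomes, baseline, window=50, max_drop=0.05):
    """Flag consecutive windows whose sensitivity falls too far below baseline.

    outcomes: chronological list of 1 (confirmed cancer was flagged by the
    device) or 0 (confirmed cancer was missed).
    Returns a list of (window_start_index, window_sensitivity) tuples.
    """
    alerts = []
    for start in range(0, len(outcomes) - window + 1, window):
        chunk = outcomes[start:start + window]
        rate = sum(chunk) / window
        if baseline - rate > max_drop:
            alerts.append((start, rate))
    return alerts

# Hypothetical post-deployment stream: the first window matches the
# validation baseline of 0.90, the second window drops to 0.80.
outcomes = [1] * 45 + [0] * 5 + [1] * 40 + [0] * 10
alerts = monitor_windows(outcomes, baseline=0.90)
print(alerts)
```

Running the same check separately for each subgroup or site is one way to surface emerging bias signals rather than only aggregate degradation.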

What this means for major stakeholders

Manufacturers and developers

For manufacturers, the key message is that FDA expects a submission package built around evidence, not aspiration. Sponsors need a coherent chain from intended use to device design, from data management to model development, from validation to labeling, and from risk assessment to postmarket monitoring. Companies that wait until late in development to organize documentation on data provenance, subgroup performance, workflow integration, or bias risk will likely find gaps that are difficult and expensive to fix.

Manufacturers should also pay close attention to lifecycle management. FDA's draft guidance discusses future safety and effectiveness responsibilities across the total product lifecycle, and recent clearances like MammoScreen 4 show that Predetermined Change Control Plans are becoming a meaningful path for well-scoped future modifications.

Regulatory and quality leaders

For regulatory affairs and quality teams, the opportunity is to integrate AI-specific evidence expectations into design controls and submission planning from the outset. FDA notes that quality system documentation can serve as an important source of evidence for demonstrating how risks associated with AI-enabled devices are being addressed, even when FDA is not directly reviewing Quality System compliance as part of 510(k) review. That makes internal documentation practices strategically important, not merely operational.

These teams should also align AI programs with recognized standards and related FDA guidance, including software lifecycle, usability, risk management, and applicable CDS policies where relevant. The companies that perform best in review are likely to be those that translate AI development work into clear regulatory narratives rather than leaving it buried in technical notebooks or ad hoc validation decks.

Healthcare providers

For clinicians, the guidance supports a more informed and safer adoption environment. FDA is signaling that AI-enabled devices should come with clearer descriptions of what the model sees, what it outputs, how the outputs fit into care, and where the technology is limited. In the CDS context, FDA continues to emphasize the importance of preserving the healthcare professional's ability to independently review the basis for recommendations when a software function is meant to avoid device status under section 520(o)(1)(E).

Even when a device is fully regulated, this emphasis on human understanding remains important. Clinicians are best served when AI tools function as transparent aids that improve efficiency or detection without obscuring uncertainty, subgroup limitations, or the continuing need for professional judgment.

Health systems and implementation leaders

Hospitals and imaging centers should read the guidance as a reminder that procurement of AI-enabled tools is a governance decision, not only a technology purchase. The materials support asking detailed implementation questions: Which inputs and compatible devices were validated, what training is required, how are updates communicated, how should local workflows adapt, what subgroup performance evidence exists, and what post-deployment monitoring tools are available.

The transparency principles also note that detailed information may be needed at the point of device acquisition or implementation, not just during use. That supports a more rigorous pre-purchase review process for digital health committees, clinical leadership, and procurement teams.

Case Example for AI/ML SaMD 510(k) Submission: MammoScreen 4 (K243679)

We chose this example because it is well organized, follows FDA guidance closely, and clearly documents the design and validation evidence supporting substantial equivalence. Its structure makes it easy to follow and useful as a model for preparing a compliant 510(k) submission.

MammoScreen 4 offers a strong real-world example of how FDA's current expectations around validation, transparency, data independence, and lifecycle management are being operationalized in a cleared AI-enabled imaging device.

What the device does

FDA cleared MammoScreen 4 on July 3, 2025, under 510(k) K243679 as a Class II radiological computer-assisted detection and diagnosis software device under 21 CFR 892.2090, with product codes QDQ and QIH. The device is indicated as a concurrent reading and reporting aid for physicians interpreting mammograms, including use with compatible full-field digital mammography and digital breast tomosynthesis systems, and it can also use compatible prior examinations in the analysis.

The output includes graphical marks of soft-tissue lesions or calcifications, lesion characterization as mass or asymmetry, distortion, or calcifications, and suspicion scores at the finding, breast, and mammogram level, along with location details such as quadrant, depth, and distance from the nipple. FDA-cleared labeling also makes an important use-limiting statement: patient management decisions should not be made solely based on MammoScreen 4's analysis.

Why the case is notable

The MammoScreen 4 clearance is notable for two reasons. First, it documents a conventional but robust AI validation package involving standalone performance testing and multi-reader multi-case clinical studies. Second, FDA's substantial equivalence determination explicitly included review and clearance of a Predetermined Change Control Plan, underscoring how lifecycle planning is becoming part of the regulatory conversation for AI-enabled devices.

Validation approach

According to the 510(k) summary, standalone testing was used to evaluate non-inferiority in cancer detection performance compared with a previous version. Reported primary endpoints included mammogram-level AUC of 0.894 compared with 0.867 for the reference version, breast-level AUC of 0.919 compared with 0.895, and finding-level area under the LROC (localization ROC) curve of 0.891 compared with 0.837, with positive lower bounds on the 95% confidence intervals of the differences and p-values reported as less than 0.0001 for each endpoint.
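For readers less familiar with how an AUC difference and its confidence interval are obtained, here is a minimal sketch using a paired bootstrap over cases. The source does not state which statistical method the sponsor used, so the bootstrap, the function names, and the toy scores are all assumptions for illustration; real analyses of this kind typically use far larger datasets and a pre-specified non-inferiority margin.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_difference(new_scores, ref_scores, labels, n_boot=1000, seed=0):
    """Paired bootstrap: resample cases, score both versions on the same resample.

    Returns (observed difference, 2.5th percentile, 97.5th percentile).
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            new_auc = auc([new_scores[i] for i in idx], ys)
            ref_auc = auc([ref_scores[i] for i in idx], ys)
            diffs.append(new_auc - ref_auc)
    diffs.sort()
    observed = auc(new_scores, labels) - auc(ref_scores, labels)
    return observed, diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Hypothetical toy scores for a new and a reference model on the same 8 cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
new_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
ref_scores = [0.9, 0.8, 0.3, 0.6, 0.5, 0.4, 0.7, 0.2]
observed, ci_low, ci_high = bootstrap_auc_difference(new_scores, ref_scores, labels, n_boot=200)
print(observed, ci_low, ci_high)
```

Under this framing, non-inferiority is supported when the lower bound of the interval sits above the negative of the chosen margin; a lower bound above zero, as the summary reports, is a stronger result than non-inferiority alone requires.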

The validation cohort included 1,475 patients and 2,950 studies, with subgroup considerations spanning density, lesion type, age, lesion size, lesion severity, race, ethnicity, data provenance, and reference standard for negative cases. This aligns well with FDA's broader recommendation that performance be explained across important subgroups rather than solely at the aggregate level.

Clinical study design

The summary also reports three multi-reader multi-case studies, one for full-field digital mammography, one for digital breast tomosynthesis, and one for combined tomosynthesis and 2D mammograms using prior examinations. The objective was to determine whether radiologist performance with MammoScreen was superior to unaided radiologist performance, using a cross-over design with MQSA-qualified and ACR-certified readers. This is a meaningful example of evaluating the human-AI team rather than only the algorithm in isolation.

Data verification strengths

One of the strongest elements of the submission is its description of data independence. Therapixel states that sources in the training and tuning group were used only for model training and tuning, while sources in the test group were used only for external validation on unseen data from sources entirely left out during training and tuning. FDA's AI-enabled device guidance emphasizes this same principle, warning that test data should be independent and that leakage between development and validation sets creates uncertainty about true performance.

The summary also describes how truth was established: positive cases were biopsy-proven, benign cases were confirmed by biopsy result or imaging follow-up, and negative cases were verified by imaging follow-up. That level of truthing detail helps reinforce the credibility of the validation framework.

Transparency and labeling lessons

MammoScreen 4 also reflects many of the transparency features FDA is now emphasizing. The summary clearly describes the intended use, intended users, outputs, integration approach, and the complementary rather than autonomous role of the software. It explains that the device is software-only, uses AI and machine learning techniques including deep learning modules, and provides outputs that can be displayed through a dedicated user interface or integrated into DICOM viewers and reporting systems.

This is the type of contextual disclosure that supports safer implementation. It gives clinicians and health systems a clearer picture of where the tool sits in workflow, what it contributes, and where human interpretation remains central.

Bias-related observations

The validation dataset described in the summary included age, race, and ethnicity distributions and considered these as subgroups in performance analysis. That is directionally consistent with FDA's recommendations for subgroup evaluation across patient characteristics and data acquisition conditions. At the same time, the reported population appears predominantly White, which illustrates an important real-world challenge for many AI imaging programs: subgroup reporting can be present, yet representation may still be uneven.

For future-facing sponsors, this is an important lesson. Transparency about representation is necessary, but it does not eliminate the need for broader dataset strategies and lifecycle monitoring to address residual uncertainty in underrepresented groups.

PCCP significance

The PCCP element is perhaps the most forward-looking aspect of the case. The cleared plan covered anticipated modifications related to extending support to additional mammography manufacturers, with specified validation activities, acceptance criteria, documentation updates, and a user communication plan that included advisory notices at least two weeks before deployment and user opt-out during that notice period.

This matters because it shows FDA's willingness to permit structured post-clearance evolution of AI-enabled devices when the sponsor predefines the modification types, data handling, re-training approach where needed, validation methods, and communication practices. For device makers, PCCPs may become a major strategic tool for balancing innovation speed with regulatory predictability.

A note on AUC

In mammography, AUC (area under the receiver operating characteristic curve) summarizes how well a model's scores separate cancer cases from non-cancer cases: 1.0 indicates perfect separation and 0.5 indicates chance-level discrimination. AUC values often differ with the level of analysis, so finding-level, breast-level, and mammogram-level results answer different questions and should be read separately rather than compared directly. In the MammoScreen 4 summary, for example, the breast-level AUC (0.919) was higher than both the mammogram-level AUC (0.894) and the finding-level value (0.891).

In practice, reporting AUC at each level shows how well an AI tool supports screening mammography at different points in the workflow, from lesion detection to full-exam interpretation.

Strategic takeaways for CAHIR Solutions clients

Several practical lessons emerge from the guidance and the MammoScreen 4 example.

Start with intended use, then design the evidence plan backward from that use case.
Treat data characterization and test-set independence as regulatory deliverables, not merely data science best practices.
Build transparency into the product and labeling architecture early, especially around inputs, outputs, limitations, subgroup performance, and workflow integration.
Use subgroup analysis not only to satisfy FDA expectations, but also to support more credible commercial discussions with health systems and clinicians.
Consider lifecycle planning, including PCCP strategy where appropriate, as part of initial product roadmap development rather than a post-clearance afterthought.

Closing perspective

FDA's latest guidance signals that the agency expects AI-enabled medical device submissions to reflect a mature systems view of product safety and effectiveness. Validation scope, design, data verification, transparency, and bias mitigation are not separate workstreams that can be stitched together late; they are interdependent parts of the evidentiary story that sponsors must tell clearly and credibly.

For manufacturers, this raises the bar, but it also creates a clearer path. Companies that operationalize these expectations early can build stronger submissions, reduce review friction, and position their products for more responsible market adoption. For clinicians and health systems, the guidance offers support for expecting better evidence, better labeling, and better lifecycle accountability from AI vendors. And for advisory firms focused on medtech commercialization and regulatory strategy, this is precisely the moment to help innovators translate promising models into approvable, trustworthy, and clinically usable products.

Why partner with CAHIR Solutions

Our regulatory strategists bring direct experience with FDA interactions, digital health submissions, AI/ML device clearances, and medtech commercialization. We understand that regulatory success is not just about getting cleared—it's about building products that clinicians trust, health systems adopt, and patients benefit from safely and equitably.

Whether you're an early-stage innovator preparing your first AI-enabled device submission, an established device manufacturer expanding into machine learning applications, or a digital health company navigating the CDS-device boundary, CAHIR Solutions provides the regulatory intelligence, strategic planning, and execution support you need to succeed in this rapidly evolving landscape.

Ready to advance your AI-enabled medical device program? Contact CAHIR Solutions to discuss how we can support your regulatory strategy, submission development, and pathway to market.

Let's build the next generation of safe, effective, and transparent AI-enabled healthcare innovation together.
