What Annex 22 spells for AI in GMP manufacturing

Europe’s first dedicated framework for artificial intelligence in GMP manufacturing is close to finalisation, and it is more demanding and more consequential than many manufacturers may currently appreciate.

Pharmaceutical manufacturers in Europe have great ambitions for AI: in a survey we ran with Forrester last year, more than 6 in 10 reported that they would be investing in AI for predictive maintenance (65%), AI-based natural language interfaces (61%) or machine learning (61%) in the next twelve months.

However, AI use in GMP environments has been something of a regulatory grey zone, governed loosely by the general provisions of GMP Annex 11 on computerised systems, which was last substantively updated in 2011 and was never written with machine learning in mind.

That grey zone is closing with EudraLex Volume 4 - Annex 22, a draft of the first dedicated regulatory framework for artificial intelligence in GMP environments. The draft completed public consultation in late 2025 and finalisation is expected in 2026, though the precise publication date and enforcement phasing may yet shift.

As drafted, Annex 22 covers static models: models whose parameters are fixed and which do not adapt during use. Dynamic models that continue to learn and adapt during live use are explicitly excluded from critical GMP applications and, in a provision that will disappoint enthusiasts of the current AI moment, so are generative AI and large language models for any GMP-critical decision. This is not a blanket ban on LLMs across the site, however. Annex 22 explicitly leaves the door open for their use in non-critical contexts, such as summarising deviation reports, searching standard operating procedures or drafting maintenance notes, provided that a qualified person remains in the loop, reviews the output and retains documented responsibility for any action taken.

Five commandments

The obligations are substantive and span the full lifecycle of an AI model, from initial design through to ongoing operation. They sit on top of existing Annex 11 obligations rather than replacing them. Here are the five requirements with the greatest impact on manufacturing operations:

Define the intended use first. Before any acceptance testing begins, the intended use will need to be documented precisely along with the full range of inputs the model will encounter: common cases, edge cases, rare variations, potential errors and sources of bias. A process subject matter expert will need to approve this. If the model’s purpose and failure modes cannot be articulated, testing cannot begin.

Performance must not go backwards. Test metrics must be defined and approved before testing starts, not after the results are in. More pointedly, the model must perform at least as well as the process it replaces. For sites where the old process was never formally benchmarked, that work needs to happen first.
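
One way to make "defined and approved before testing starts" concrete is to freeze the acceptance criteria in a record and evaluate test results against it mechanically. The following is a minimal Python sketch, not a prescribed method: the metric names, baseline figures and thresholds are hypothetical, and it assumes higher values are better for every metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    """A pre-approved criterion; frozen so it cannot be edited after sign-off."""
    metric: str
    baseline: float  # measured performance of the process being replaced
    minimum: float   # approved threshold for the model (hypothetical values)


def evaluate(criteria, results):
    """Return (metric, passed, reason) for each criterion."""
    outcome = []
    for c in criteria:
        if c.minimum < c.baseline:
            # A threshold below the old process would allow performance to go backwards.
            outcome.append((c.metric, False, "threshold set below baseline"))
            continue
        value = results.get(c.metric)
        if value is None:
            outcome.append((c.metric, False, "metric not reported"))
        elif value >= c.minimum:
            outcome.append((c.metric, True, "meets threshold"))
        else:
            outcome.append((c.metric, False, "below threshold"))
    return outcome


criteria = [
    AcceptanceCriterion("sensitivity", baseline=0.92, minimum=0.95),
    AcceptanceCriterion("specificity", baseline=0.90, minimum=0.90),
]
for row in evaluate(criteria, {"sensitivity": 0.96, "specificity": 0.88}):
    print(row)
```

The check on `minimum < baseline` is the part that depends on the old process having been benchmarked: without a baseline figure there is nothing to compare the threshold against.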

Keep test data clean and separate. Test data must be strictly separated from training and validation data, enforced by both technical and procedural controls. Staff who had access to test data cannot be involved in training the same model, unless paired with a colleague who did not. This four-eyes principle is borrowed from financial controls.
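
As an illustration of what a technical control might look like, the sketch below fingerprints each record and reports any test record that also appears in the training set. It is a simplified example with hypothetical field names; it only detects exact duplicates and would complement, not replace, access restrictions and procedural controls.

```python
import hashlib
import json


def record_fingerprint(record):
    """Hash a record in a canonical form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_leakage(training_records, test_records):
    """Return the test records that also occur in the training data."""
    train_hashes = {record_fingerprint(r) for r in training_records}
    return [r for r in test_records if record_fingerprint(r) in train_hashes]


# Hypothetical records for illustration only.
training = [{"batch": "A1", "temp": 37.0}, {"batch": "A2", "temp": 36.8}]
test = [{"temp": 37.0, "batch": "A1"}, {"batch": "B2", "temp": 36.5}]
print(find_leakage(training, test))
```

Storing the fingerprints alongside the validation report would also give reviewers evidence of the check without exposing the test data itself.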

The model must show its working. The system must record which features drove each classification or decision, using tools such as SHAP values, LIME or heat maps, and a review of those features must form part of the approval process. The point is not just that the model gets the right answer but that it is doing so for the right reasons.
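
To show what "recording which features drove a decision" can mean, here is a deliberately simple sketch for a linear scoring model, where each feature's contribution is its weight multiplied by its deviation from a reference input. For linear models with independent features this matches what SHAP would report; real deployments with non-linear models would more likely rely on a dedicated library. All feature names and numbers are hypothetical.

```python
def linear_attributions(weights, reference, sample):
    """Contribution of each feature to the score, relative to a reference input."""
    return {
        name: weights[name] * (sample[name] - reference[name])
        for name in weights
    }


def top_features(attributions, n=3):
    """Feature names ranked by the absolute size of their contribution."""
    ranked = sorted(attributions, key=lambda k: abs(attributions[k]), reverse=True)
    return ranked[:n]


# Hypothetical degradation-score model for a centrifuge.
weights = {"vibration_rms": 2.0, "bearing_temp": 0.5, "motor_current": -1.0}
reference = {"vibration_rms": 1.0, "bearing_temp": 60.0, "motor_current": 10.0}
sample = {"vibration_rms": 3.0, "bearing_temp": 62.0, "motor_current": 10.5}

attributions = linear_attributions(weights, reference, sample)
print(attributions)
print(top_features(attributions, n=2))
```

A reviewer looking at this output can ask the question that matters: is it plausible, from a process point of view, that vibration dominates this particular prediction?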

Change control applies to everything the model touches. Once deployed, any change to the model, the system it runs on or the physical objects it uses as input must be assessed for whether revalidation is needed. Confidence scores will need to be logged for each prediction where applicable and, when confidence is very low, it should be considered whether the model ought to return ‘undecided’ rather than force a classification it is not sure about.
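
The confidence-logging idea can be sketched in a few lines. The example below assumes a classifier that returns a probability per class and uses a hypothetical threshold of 0.8; the field names and the threshold are illustrative, not taken from the annex.

```python
import json
from datetime import datetime, timezone


def decide(probabilities, threshold=0.8):
    """Pick the most probable class, or 'undecided' when confidence is too low."""
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    decision = label if confidence >= threshold else "undecided"
    return decision, confidence


def build_log_entry(model_version, input_id, probabilities, threshold=0.8):
    """Serialise one prediction, with its confidence, as a JSON log line."""
    decision, confidence = decide(probabilities, threshold)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_id": input_id,
        "decision": decision,
        "confidence": confidence,
        "threshold": threshold,
    }
    return json.dumps(entry, sort_keys=True)


# Hypothetical visual-inspection outputs.
print(build_log_entry("vial-inspect-1.4", "img-0001", {"pass": 0.93, "reject": 0.07}))
print(build_log_entry("vial-inspect-1.4", "img-0002", {"pass": 0.55, "reject": 0.45}))
```

Logging the threshold with each entry means a later change to it is visible in the record, which is the sort of change that would itself need assessing.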

How will this change things in practice?

The honest answer, for most European pharma manufacturers, is: considerably.

Under the existing framework, an AI model in a manufacturing context was typically governed as validated software under Annex 11. That meant documenting functionality, testing expected outputs and maintaining change control. What Annex 11 never required was any systematic treatment of training data, test data independence, model explainability, confidence scoring or input drift monitoring. Annex 22 brings each of these into scope.

Expected implications for predictive maintenance

AI-assisted predictive maintenance is already widely deployed: models trained on vibration data, temperature profiles and process historian records identify degradation patterns and recommend intervention windows.

Under Annex 22, any such model used in a critical application with direct impact on product quality or data integrity is in scope, requiring a documented input sample space, predefined acceptance criteria, independent test data and ongoing performance monitoring. The shift in framing is significant: predictive maintenance in GMP is no longer simply an efficiency play or a way to reduce unplanned downtime. Under Annex 22, it becomes part of the quality and compliance system, subject to the same validation disciplines as any other GMP-critical process.

Equipment changes must be assessed against the model’s intended use. Replacing a centrifuge bearing can alter the input sample space and now requires formal evaluation. This creates a direct and previously non-existent link between asset lifecycle management and AI governance. Sensor replacements, calibration events, work order closures, configuration changes and the full maintenance history of every piece of equipment feeding data into an AI model all become part of the evidence base that must be maintained and made available for inspection.
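
The link between an equipment change and the models it touches can be pictured as a simple lookup. The sketch below uses a hypothetical register of model inputs; in practice this information would live in the asset management and quality systems, and the asset and model identifiers here are invented for illustration.

```python
# Hypothetical register: which physical assets feed which AI model.
MODEL_INPUTS = {
    "centrifuge_degradation_v3": {"CF-101/bearing", "CF-101/vibration_sensor"},
    "freeze_dryer_anomaly_v1": {"FD-200/shelf_temp_sensor"},
}


def models_affected_by(asset_id, model_inputs=MODEL_INPUTS):
    """Return the models whose inputs include the changed asset."""
    return sorted(
        model for model, assets in model_inputs.items() if asset_id in assets
    )


# A work order replaces the centrifuge bearing: which models need evaluation?
print(models_affected_by("CF-101/bearing"))
print(models_affected_by("HVAC-07/filter"))
```

An empty result is only meaningful if the register is complete, which is why the inventory of models and their inputs has to come before any change-impact automation.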

AI compliance, in other words, will not be determined solely by how well a model was trained and validated. It will depend equally on the completeness and traceability of the asset history behind every input. A best-of-breed, connected asset lifecycle platform is no longer merely an operational advantage for engineering teams; it becomes structural infrastructure for AI governance.

Information management: no dataset left unturned

The governance requirements of Annex 22 build on revised GMP Chapter 4, which mandates risk-based data governance, audit trails and lifecycle traceability for electronic records. For AI systems, datasets, logs and model configuration files become GMP documentation.
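
Treating datasets and configuration files as controlled documents implies being able to show they have not changed since approval. One common technique is a checksum manifest, sketched below in Python under the assumption that the files sit on a file system the script can read; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each file name to its digest; this is the record stored at approval."""
    return {Path(p).name: sha256_of_file(p) for p in paths}


def changed_files(approved_manifest, paths):
    """Return names of files whose content differs from the approved manifest."""
    current = build_manifest(paths)
    return sorted(
        name for name, digest in current.items()
        if approved_manifest.get(name) != digest
    )
```

A manifest like this does not provide an audit trail on its own, but it gives the document management system something verifiable to store against each model version.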

Most AI models in manufacturing have been developed by data science or automation teams, with data sitting in operational technology systems, engineering databases and vendor-controlled cloud platforms. Bringing all of that under a coherent GMP document management framework requires structural integration between data science operations and the quality management system that most sites do not yet have.

Legacy software and practices: reassess and remediate or retire

Annex 11 already requires that computerised systems be evaluated against current requirements when significant changes occur. On most regulatory interpretations, the introduction of Annex 22 would itself constitute such a change for any in-scope AI system, meaning grandfathering existing deployments is unlikely to be an option, though the final guidance may include its own transition provisions.

The practical consequence is a portfolio review: identify all AI and ML components, assess which fall within Annex 22 scope, document the gaps and remediate or retire. For large sites with multiple vendor systems containing embedded ML, this is a substantial effort.

It is also worth noting that Annex 22 places full responsibility for validation evidence on the regulated company, regardless of whether the model was built in-house or supplied externally. There is no such thing as a vendor black box.

Are you ready? A practical starting checklist

The gap between current practice and Annex 22 compliance can feel abstract until it is mapped concretely. The following checklist is a useful first-pass diagnostic. It does not replace a formal gap assessment, but it will surface the pressure points quickly.

Inventory your AI and ML use cases. Do you have a complete register of every AI or ML component in use, including models embedded in vendor-supplied equipment or process control systems? If not, the portfolio review must come first.
Map each model to the assets and sensors that feed it. For every in-scope AI model, can you identify every physical asset, sensor and data source that constitutes its input? Changes to any of these require formal evaluation under Annex 22.
Define and document intended use. Is the intended use of each model documented, including its input sample space, known edge cases and failure modes? This document is the foundation of the entire validation programme.
Verify test-data separation. Is your test data demonstrably separate from training and validation data, with both technical controls and procedural evidence? This is one of the most common gaps in existing deployments.
Check that confidence scores are being logged. Annex 22 calls for confidence scores to be logged for each prediction where applicable. Is this already happening and are very low-confidence outputs being flagged appropriately rather than forcing a classification?
Monitor for drift. Do you have a mechanism to detect when a model’s input distribution has shifted, whether from equipment changes, process modifications or seasonal variation, and to trigger revalidation when it does?
Audit your vendor AI. For every externally supplied system with embedded AI, do you have access to sufficient information about how the model was trained, validated and monitored to meet your own compliance obligations? Annex 22 places that burden on you, not the vendor.
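
For the drift item in the checklist above, one widely used measure is the Population Stability Index (PSI), which compares how a single input is distributed in current data against a reference sample. The sketch below is a minimal pure-Python version; the bin count, the small floor used for empty bins and any alert threshold are assumptions a site would need to justify for its own process.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference` for one feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical vibration readings: the second set is shifted upwards,
# as might happen after a sensor or bearing replacement.
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]
print(psi(reference, reference))
print(psi(reference, shifted))
```

A value of zero means the two samples fall into the bins identically; larger values mean a bigger shift. Where the line for triggering revalidation sits is a quality decision rather than a statistical one.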

The audit-ready backbone

What the checklist above makes clear is that AI governance under Annex 22 cannot be addressed by the data science team alone or resolved solely within the quality management system. It requires a connected view of asset history, maintenance evidence, sensor configuration, calibration records and GMP change control. That’s the kind of traceability that an enterprise asset management platform exists to provide.

This is the connection that has been largely absent from discussions of AI compliance in GMP environments: the link between the AI model and the physical world it is drawing inferences from. Annex 22 makes that link regulatory: every sensor swap, every bearing replacement, every calibration, every change order touching equipment whose data feeds an AI model is now, in effect, part of the AI compliance record. Maintaining that record requires an asset lifecycle platform that is not just operationally capable but audit-ready by design: one that connects engineering change control to AI governance, makes maintenance evidence traceable at the asset level and surfaces the configuration history an inspector will want to see.

The manufacturers who will navigate Annex 22 most effectively are those who start that integration now, before the guidance is finalised, and certainly before the first inspection that references it.

Clarity, and some recognition for doing it right

It would be easy to read Annex 22 as a burden and, in the short term for unprepared sites, it is. But the cleaner assessment is that it resolves a genuine regulatory ambiguity that has been causing its own costs.

For years, quality assurance teams have struggled to sign off on AI deployments because there was no clear regulatory template for what adequate validation looked like. Annex 22 provides an authoritative answer. Inspectors will have a shared reference point and companies will be able to design validation programmes with confidence that meeting the annex’s requirements will satisfy regulators.

The framework also creates a level playing field on quality. Sites that have invested in rigorous AI governance, with proper data management, explainability tooling and change control for models, will no longer be disadvantaged against competitors that deployed faster but with less discipline. The algorithm and the autoclave have been working alongside each other for years: Annex 22 finally sets the terms of that relationship.