Standard operating procedures (SOPs) are essential documents in the context of regulations and compliance. SOPs outline specific steps for various processes, making sure practices are consistent, efficient, and compliant with regulatory standards.
SOP documents typically include key sections such as the title, scope, purpose, responsibilities, procedures, documentation, citations (references), and a detailed approval and revision history. In FDA-regulated industries such as healthcare and life sciences, SOPs play a crucial role in defining manufacturing, clinical, laboratory, quality control, quality assurance, and regulatory compliance practices.
When a regulatory body like the US Food and Drug Administration (FDA) introduces changes to regulations, organizations are required to evaluate the changes against their internal SOPs. When necessary, they must update their SOPs to align with the regulation changes and maintain compliance.
In this post, we show different approaches using Amazon Bedrock to identify relationships between regulation changes and SOPs.
Challenge
In the healthcare and life sciences industry, regulatory authorities like the FDA and the European Medicines Agency (EMA) frequently update regulations across various areas, such as clinical trials, medical devices, drug development and approvals, quality risk management, systems and data management, and technology adoption. These regulatory updates often require organizations to correspondingly update their internal SOPs to align with the changes. This process is typically manual, requiring a team of subject matter experts to review the regulatory changes, screen the SOPs to identify relevance, determine the impact, and specify what needs to be updated. This manual approach adds significant overhead for companies and can result in review cycles lasting several days to months.
To address this challenge, we explore approaches that can help automate the identification of relationships between regulatory changes and SOPs. These approaches can also be extended to assess the impact of regulatory changes on an organization’s internal processes and documentation. By using automation, companies can streamline the SOP update process, reducing the time and resources required to maintain alignment with evolving regulatory requirements.
Sample Data
For this post, we used SOPs published by the FDA’s Center for Biologics Evaluation and Research. These publicly available SOPs are used by the FDA staff to guide their duties.
Specifically, we focused on the following SOPs related to biologics procedures. This narrow scope allowed us to dive deeper into a specific regulatory domain within the larger healthcare and life sciences industry.
In addition to the SOPs, we also used three of the FDA’s Biologics Guidance Documents to test the relationship between the regulatory documents and the SOPs.
These guidance documents describe the FDA’s policy interpretations on regulatory issues related to the biologics domain. They cover a wide range of topics, including processing, content, evaluation, approval, inspection, and enforcement of policies. The guidance documents also discuss specific products or issues relating to the design, production, labeling, promotion, manufacturing, and testing of regulated products.
We used the following specific FDA Biologics Guidance Documents for this analysis:
Approaches
A key step in assessing the impact of regulatory changes is to identify if a regulatory guidance is related to an organization’s SOPs. We used Amazon Bedrock along with Amazon Simple Storage Service (Amazon S3) to store the input dataset.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Our experiments used Anthropic’s Claude 3 Opus large language model (LLM) on Amazon Bedrock. However, you can use the broad selection of models available on Amazon Bedrock to experiment with alternative models and choose the one that best suits your specific requirements. Amazon Bedrock frequently releases updated versions of existing AI models that can be accessed and used by simply applying a configuration change, making it a highly flexible choice for deploying the latest AI capabilities.
We focused on the following approaches:
Full document match – Comparing the full text of the regulatory guidance and SOP documents
Text similarity – This approach consists of two options:
Vector embeddings – Measuring the semantic similarity between the guidance and SOP texts
Keyword Search – Identifying relevant keywords and their occurrences in the documents
Taxonomy topic match – Mapping the guidance and SOP content to a taxonomic structure to identify topical relationships
This post details the approaches we explored and the learnings from our experiments.
Full document match
The following diagram illustrates the full document match architecture.
In this approach, we compared each regulatory change to every SOP by passing the full contents of the SOP and the regulatory change to the model. The goal was to identify relationship between the regulatory change and the SOP.
The following is a sample prompt to check if an SOP is related to a regulation change:
When we ran the full document matching approach using Amazon Bedrock across all the SOPs and the regulatory guidance documents in the dataset, the results showed accurate identification of related SOPs. For example, SOPP 9151 was correctly identified as the only SOP related to the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation change, with others being identified as unrelated:
Similarly, SOPP 8005 was correctly identified as the only SOP related to the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation change, and the other SOPs were identified as unrelated.
Finally, SOP 8201 was also correctly identified as the only SOP related to the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation change.
These results demonstrate the effectiveness of the full document matching approach in accurately linking the relevant SOPs to their corresponding regulatory guidance documents.
Text similarity
The following diagram illustrates the text similarity match workflow.
In our second approach, we indexed the SOPs using either vector embeddings for semantic similarity or a keyword-based similarity approach. This allowed us to submit the contents of a regulatory change as a query and return the most similar SOP documents.
The steps involved in this text similarity approach are:
Index the SOPs:
For a vector embeddings approach, we generated vector representations of the SOP contents using an LLM to capture semantic similarities.
For a keyword-based approach, we identified the most relevant keywords in each SOP and built an index based on their occurrences.
Query the index:
For a given regulatory change, we submitted the text as a query to the SOP index.
The index then returned the most similar SOPs based on the chosen similarity metric (semantic or keyword-based).
Vector Search
For the text similarity approach, we used the open source in-memory database ChromaDB to generate the vector embeddings and perform the search.
We created a collection within ChromaDB containing all the SOP documents. We then independently queried each regulation guidance document text against this SOP collection. We used the default L2 distance algorithm, where a lower distance score indicates a closer match between the query and the indexed SOP documents.
Although the vector embedding-based text similarity approach identified the top matching SOP document in some cases, it also produced some inaccurate results.
For example, when querying with the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, SOPP 9151 was correctly identified as the top match. However, a few other unrelated SOP documents also had low distance scores, which could potentially lead to them being misidentified as relevant:
Similarly, when querying with the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation, the vector search incorrectly identified SOPP 8717 as the best match, whereas SOPP 8005, which is more directly related to formal dispute resolution, had a higher distance score:
Finally, for the regulation Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry, the vector search again identified SOPP 8717 as the top match, rather than the more relevant SOP 8201: