Building Policy Themes Using AI

How we built a system to create clinical taxonomies from healthcare policies using embeddings, clustering, and LLMs.

Introduction

At NOF1, we're always looking for ways to use AI in our workflows to cut down research and synthesis time. We work primarily with text data, which means we can't always rely on typical statistical methods for our analysis.

In this article, we walk you through how we built a system to create clinical taxonomies from healthcare policies using embeddings, clustering, and a bit of LLM magic.

What's the Problem?

Imagine you have a set of healthcare policies, and you need to group them in a way that makes sense, such as sorting them into themed categories. For instance:

Input Policies:

  • Policy 1: Hip surgery
  • Policy 2: Knee surgery
  • Policy 3: Ibuprofen

What You Want to See:

  • Theme 1: Musculoskeletal-related surgeries (based on Hip and Knee surgery)
  • Theme 2: Medications (based on Ibuprofen)

This grouping can come in handy in different healthcare-related scenarios, such as:

  • Spotting common threads among new policies for quick insights.
  • Organizing policies so customer workflows focus on specific areas.
  • Comparing similar policies side-by-side more easily.

… and in many workflows outside healthcare, too.

Our Approach

The obvious one: Just Ask the LLM

Initially, we thought, "Why not just ask an LLM to do this?" But, as we commonly find, that wasn't enough. Here's what we ran into:

  • Mixing Things Up: The model struggled to group the policies consistently the way we wanted. It would often create themes that did not make sense or put policies in the wrong bucket.
  • Too Many Policies: When the list got long, many models couldn't handle all the data in one go.
  • Token Savings: LLMs tend to economize on tokens and sometimes skip over policies when the input is too long, giving us incomplete results.

Next Up: Chunking

We then tried breaking the policies into smaller chunks. This helped a bit, but we ended up with themes that had to be reworked several times, which again led to loss of information and unpredictable results.
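
A minimal sketch of the chunking idea, assuming policies are plain strings and a fixed batch size. The function name `chunk_policies` and the chunk size are illustrative, not taken from our actual pipeline. It shows why the per-chunk themes needed rework: related policies can land in different chunks.

```python
from typing import Iterator, List


def chunk_policies(policies: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most chunk_size policies."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(policies), chunk_size):
        yield policies[start:start + chunk_size]


policies = [
    "Hip surgery",
    "Ibuprofen",
    "Knee surgery",
    "Physical therapy guidelines",
    "Cardiac arrest policies",
]

# Each chunk would be sent to the LLM on its own (hypothetical workflow).
chunks = list(chunk_policies(policies, chunk_size=2))
for number, chunk in enumerate(chunks, start=1):
    print(f"Chunk {number}: {chunk}")
```

Here "Hip surgery" and "Knee surgery" fall into different chunks, so any theme produced for one chunk has to be reconciled with the themes from the others.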

Our Winning Combo: Embeddings, Clustering, and LLM Magic

  1. Step 1: Embeddings

    Embeddings are an underrated gem of LLM technology: they essentially turn words into numbers. This step converts the text of your policies into numerical vectors that look similar for similar contexts, which opens up a lot of ways to find similar-looking policies.

    In the figure below, we show the distance matrix of the embeddings of 10 policies. The darker the color, the more similar the policies. The distances already reveal some interesting groupings. For example, the policies related to hip and knee surgeries are closer to each other than to the other policies, and the same holds for the mental-health-related policies.


    NxN distance matrix of embeddings

    If you want to read more about embeddings, see the OpenAI and Gemini documentation.
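
To make the distance matrix concrete, here is a small sketch that computes pairwise cosine distances in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings (which typically have hundreds or thousands of dimensions), and cosine distance is an assumption on our part for this example; other distance measures work too.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; values near 0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Build the NxN matrix of pairwise cosine distances."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


# Made-up vectors standing in for embeddings returned by an embedding model.
embeddings = {
    "Hip surgery": [0.9, 0.1, 0.0],
    "Knee surgery": [0.8, 0.2, 0.1],
    "Ibuprofen": [0.1, 0.9, 0.2],
}

names = list(embeddings)
matrix = distance_matrix([embeddings[name] for name in names])
for name, row in zip(names, matrix):
    print(f"{name:<14}" + "  ".join(f"{d:.2f}" for d in row))
```

With these toy vectors, the two surgery policies sit much closer to each other than either does to "Ibuprofen", which is the kind of structure the figure shows.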

  2. Step 2: Clustering

    Now that we have the numbers, we need to group them. We used K-clustering (a version of K-means) to do just that. It's like sorting items into a fixed number of buckets based on how similar they are.


    Clustering of 10 policies

    Why It Works:

    • It gives a predictable number of groups (themes).
    • It's straightforward and works well even with a lot of data.
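
The clustering step can be sketched with a bare-bones K-means in plain Python. This is an illustration under stated assumptions, not our production code: the vectors are made up, and the deterministic farthest-point initialization is chosen only so the example is reproducible. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label in 0..k-1 for each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic farthest-point initialization for
    # reproducibility; libraries usually use randomized seeding.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels


policies = ["Hip surgery", "Knee surgery", "Ibuprofen", "Ibuprofen dosage guidelines"]
vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.2, 0.8, 0.1]]

labels = kmeans(vectors, k=2)
for policy, label in zip(policies, labels):
    print(f"cluster {label}: {policy}")
```

Because `k` is fixed up front, the number of themes is known before the LLM is ever involved.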
  3. Step 3: LLM Magic

    The final step is where we bring in the LLM again, but this time with a much simpler ask. Instead of expecting it to group policies, we let it focus on summarizing each cluster. You just feed the model a group of policies and ask, "What's a good theme for these?" This makes the task much simpler for the LLM and leads to clearer, more accurate results.

    This cuts down on confusion and ensures the themes make sense based on the groupings.

    Here is an example of the output of this step:

    Theme: This policy cluster outlines non-invasive treatment protocols for managing pain and musculoskeletal conditions using both pharmaceutical (ibuprofen) and rehabilitative (physical and occupational therapy) approaches.
    Policies:
      1. Ibuprofen dosage guidelines
      2. Physical therapy guidelines
      3. PT/OT guidelines

    Theme: This cluster of policies focuses on emergency cardiovascular care, outlining guidelines and protocols for managing acute heart conditions such as heart attacks and cardiac arrests.
    Policies:
      1. Heart attack treatment guidelines
      2. Cardiac arrest policies

    Theme: This cluster of policies focuses on orthopedic joint surgeries, specifically addressing surgical interventions for hip and knee conditions.
    Policies:
      1. Hip surgery
      2. Knee surgery

    Theme: This cluster of policies focuses on comprehensive treatment guidelines and management strategies for chronic physical and behavioral/mental health conditions.
    Policies:
      1. Diabetes management guidelines
      2. Mental health treatment guidelines
      3. Behavioral health recommendations
      4. Personality disorders treatment guidelines
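
The per-cluster summarization can be sketched as follows. The prompt wording, the helper names, and the `ask_llm` callable are all hypothetical: `ask_llm` stands in for whichever model client you use, and `fake_llm` is a stub so the example runs without any API access.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def group_by_cluster(policies: List[str], labels: List[int]) -> Dict[int, List[str]]:
    """Collect policy names under their cluster label."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for policy, label in zip(policies, labels):
        clusters[label].append(policy)
    return dict(clusters)


def build_theme_prompt(cluster_policies: List[str]) -> str:
    """Ask only for a theme; the grouping has already been done."""
    numbered = "\n".join(
        f"{i}. {policy}" for i, policy in enumerate(cluster_policies, start=1)
    )
    return (
        "The following healthcare policies were grouped together.\n"
        f"{numbered}\n"
        "What's a good theme for these? Answer in one sentence."
    )


def name_themes(
    policies: List[str],
    labels: List[int],
    ask_llm: Callable[[str], str],
) -> List[Tuple[str, List[str]]]:
    """Return (theme, member policies) pairs, one per cluster."""
    clusters = group_by_cluster(policies, labels)
    return [
        (ask_llm(build_theme_prompt(members)), members)
        for _, members in sorted(clusters.items())
    ]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: it only reports how many
    # numbered policies appeared in the prompt.
    count = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
    return f"<theme for {count} policies>"


policies = [
    "Hip surgery",
    "Knee surgery",
    "Ibuprofen dosage guidelines",
    "Physical therapy guidelines",
    "PT/OT guidelines",
]
labels = [0, 0, 1, 1, 1]  # cluster labels from the clustering step

results = name_themes(policies, labels, fake_llm)
for theme, members in results:
    print(theme, members)
```

Each call sees only one cluster, so the prompt stays short no matter how many policies there are in total.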

Why We Like This Approach

  • Breaking the process into clear steps makes it easier to manage and explain.
  • You know exactly what goes into each step, so it's easier to show clients exactly how the system works.
  • It's generalizable: it works for any situation where you need to sort items into themes, not just healthcare policies.

About Us

NOF1 is a payer policy intelligence platform spanning clinical and reimbursement policies. Our goal is to transform the way payers, providers, and other stakeholders navigate the complex landscape of healthcare policy and, in turn, the way healthcare is delivered.

Our portfolio of products includes:

For payers:

A competitive intelligence platform with over 10K clinical policies across payers, UM vendors, and CMS. Designed for payers to assess their policy positioning and rapidly research alignment and differences relative to peers.

For providers:

An EMR platform that allows providers to understand clinical policy requirements at point of care, drastically improving documentation quality and compliance while reducing unnecessary denials.

For all stakeholders:

APIs that allow the retrieval of clinical policies and criteria in machine-readable form for integration into your enterprise software.

To learn more, please reach out to ahmed@nofone.io