Creating Groupings Based on Unique Combinations of MEM_ID, PROV, and ADM_DT in SQL

Creating Groupings Based on Criteria

In this article, we will explore how to create groupings based on specific criteria. We will use a real-world example from Stack Overflow and break down the process into manageable steps.

The Problem

We are given a dataset with MEM_ID, CLM_ID, ADM_DT, DCHG_DT, and PROV columns. Our goal is to create groupings based on unique combinations of MEM_ID, PROV, and ADM_DT. An additional event from the same MEM_ID and PROV should join an existing group if its ADM_DT is the same as the previous event's DCHG_DT or up to 1 day after it.

The Expected Output

Here is an example of what the expected output should look like:

MEM_ID  CLM_ID  ADM_DT      DCHG_DT     PROV  GROUP
1       111     01-01-2021  01-01-2021  1     1
1       112     01-01-2021  02-01-2021  1     1
1       113     01-01-2021  01-01-2021  1     1

The Solution

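To solve this problem, we can use the LAG window function together with the PARTITION BY and ORDER BY clauses in SQL. We first compute a flag column called ISSTART in a subquery. It is 0 when the current row continues the previous row's group, meaning it has the same PROV as the previous row for that MEM_ID and its ADM_DT equals the previous row's ADM_DT or DCHG_DT. Otherwise ISSTART is 1, which marks the start of a new group. A running sum of ISSTART then gives every row its group number.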

Here is the SQL code that achieves this:

SELECT
    MEM_ID,
    CLM_ID,
    ADM_DT,
    DCHG_DT,
    PROV,
    -- Running total of the start flags numbers the groups within each member.
    -- GROUP is a reserved word, so the alias is quoted.
    SUM(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT
        MEM_ID,
        PROV,
        CLM_ID,
        ADM_DT,
        DCHG_DT,
        -- 0 = continues the previous row's group, 1 = starts a new group
        CASE
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
                AND (
                    ADM_DT = LAG(ADM_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
                    OR ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
                )
            THEN 0
            ELSE 1
        END AS ISSTART
    FROM c1
) t
ORDER BY MEM_ID, PROV, "GROUP", ADM_DT;

This will give us the desired output.

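Note that ISSTART is calculated in a subquery because window functions such as LAG cannot be nested inside another window function. The outer query then turns the flags into group numbers with SUM(...) OVER. The running sum is partitioned by MEM_ID and should use the same ordering as the LAG calls, so that each start flag is counted at the row where it was computed.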
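To see the flag-and-running-sum pattern in action, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows, the ISO date strings, the function name assign_groups, and the use of a named WINDOW clause are assumptions made for illustration; they are not taken from the original question, and other databases may need small syntax changes.

```python
import sqlite3

# Hypothetical sample rows: (MEM_ID, CLM_ID, ADM_DT, DCHG_DT, PROV), ISO dates.
ROWS = [
    (1, 111, "2021-01-01", "2021-01-01", 1),
    (1, 112, "2021-01-01", "2021-01-02", 1),  # same admission date as 111
    (1, 113, "2021-01-02", "2021-01-03", 1),  # admitted on 112's discharge date
    (1, 114, "2021-02-10", "2021-02-12", 1),  # unrelated later stay
    (1, 115, "2021-02-12", "2021-02-13", 2),  # different provider
]

QUERY = """
SELECT MEM_ID, CLM_ID,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID
                          ORDER BY ADM_DT, DCHG_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT MEM_ID, CLM_ID, ADM_DT, DCHG_DT, PROV,
           CASE
               WHEN PROV = LAG(PROV) OVER w
                AND (ADM_DT = LAG(ADM_DT) OVER w
                     OR ADM_DT = LAG(DCHG_DT) OVER w)
               THEN 0   -- continues the previous row's group
               ELSE 1   -- starts a new group (also the first row per member)
           END AS ISSTART
    FROM c1
    WINDOW w AS (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
) t
ORDER BY MEM_ID, CLM_ID
"""


def assign_groups(rows):
    """Load rows into a throwaway table c1 and return (MEM_ID, CLM_ID, GROUP)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE c1 (MEM_ID INTEGER, CLM_ID INTEGER, "
        "ADM_DT TEXT, DCHG_DT TEXT, PROV INTEGER)"
    )
    conn.executemany("INSERT INTO c1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in assign_groups(ROWS):
        print(row)
```

With these rows, claims 111 to 113 share group 1, claim 114 opens group 2 because nothing links it to the earlier stay, and claim 115 opens group 3 because the provider changes.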

The Alternative Solution

If your database provides a DAYS_BETWEEN() function, you can also catch events admitted one day after the previous discharge. This variant compares each row with the previous row for the same MEM_ID and PROV:

SELECT
    MEM_ID,
    CLM_ID,
    ADM_DT,
    DCHG_DT,
    PROV,
    -- Running total of the start flags; numbering restarts for each MEM_ID and PROV pair.
    -- GROUP is a reserved word, so the alias is quoted.
    SUM(ISSTART) OVER (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT
        MEM_ID,
        PROV,
        CLM_ID,
        ADM_DT,
        DCHG_DT,
        -- 0 = continues the previous row's group, 1 = starts a new group
        CASE
            WHEN ADM_DT = LAG(ADM_DT) OVER (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)
                -- The argument order of DAYS_BETWEEN (and so the sign of the result) differs between databases; check yours.
                OR DAYS_BETWEEN(ADM_DT, LAG(DCHG_DT) OVER (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)) IN (0, 1)
            THEN 0
            ELSE 1
        END AS ISSTART
    FROM c1
) t
ORDER BY MEM_ID, PROV, "GROUP", ADM_DT;

This will also give us the desired output.
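Where DAYS_BETWEEN() is not available, the day gap can usually be computed with whatever date arithmetic the database offers. As a rough sketch, the version below uses SQLite's julianday() function from Python. The sample rows, the function name assign_groups_by_gap, and the named WINDOW clause are hypothetical choices for this illustration only.

```python
import sqlite3

# Hypothetical sample rows: (MEM_ID, CLM_ID, ADM_DT, DCHG_DT, PROV), ISO dates.
ROWS = [
    (1, 201, "2021-03-01", "2021-03-03", 1),
    (1, 202, "2021-03-04", "2021-03-06", 1),  # one day after 201's discharge
    (1, 203, "2021-03-09", "2021-03-10", 1),  # three-day gap: new group
    (1, 204, "2021-03-04", "2021-03-05", 2),  # other provider, own numbering
    (2, 205, "2021-03-01", "2021-03-01", 1),  # other member
]

QUERY = """
SELECT MEM_ID, PROV, CLM_ID,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID, PROV
                          ORDER BY ADM_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT MEM_ID, PROV, CLM_ID, ADM_DT,
           CASE
               WHEN ADM_DT = LAG(ADM_DT) OVER w
                 -- days from the previous discharge to this admission
                 OR (julianday(ADM_DT) - julianday(LAG(DCHG_DT) OVER w))
                    BETWEEN 0 AND 1
               THEN 0   -- continues the previous row's group
               ELSE 1   -- starts a new group
           END AS ISSTART
    FROM c1
    WINDOW w AS (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)
) t
ORDER BY MEM_ID, PROV, CLM_ID
"""


def assign_groups_by_gap(rows):
    """Return (MEM_ID, PROV, CLM_ID, GROUP) using a 0-to-1-day admission gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE c1 (MEM_ID INTEGER, CLM_ID INTEGER, "
        "ADM_DT TEXT, DCHG_DT TEXT, PROV INTEGER)"
    )
    conn.executemany("INSERT INTO c1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in assign_groups_by_gap(ROWS):
        print(row)
```

Because the flags and the running sum are both partitioned by MEM_ID and PROV, a group here is identified by the combination of MEM_ID, PROV, and GROUP rather than by the group number alone.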

Conclusion

In this article, we explored how to create groupings based on specific criteria. We flagged the first row of each group with LAG and a CASE expression, then turned those flags into group numbers with a running SUM. The first query relies only on window functions, while the alternative uses DAYS_BETWEEN() to also group events admitted one day after the previous discharge.


Last modified on 2023-12-06