Creating Groupings Based on Criteria
In this article, we will explore how to create groupings based on specific criteria. We will use a real-world example from Stack Overflow and break down the process into manageable steps.
The Problem
We are given a dataset with the columns MEM_ID, CLM_ID, ADM_DT, DCHG_DT, and PROV. Our goal is to assign a group number to each unique combination of MEM_ID, PROV, and ADM_DT. A later event for the same MEM_ID and PROV should join an existing group when its ADM_DT falls on the previous event's DCHG_DT or on the day after it.
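To make the rule concrete before turning to SQL, here is a minimal procedural sketch of one reading of it, written in Python. The function name assign_groups is hypothetical, the dates in the sample table are assumed to be DD-MM-YYYY, and claims 114 to 116 are invented for illustration; only claims 111 to 113 come from the sample data.

```python
from datetime import date

def assign_groups(rows):
    """Map CLM_ID to a group number; rows are (mem_id, clm_id, adm_dt, dchg_dt, prov)."""
    groups = {}
    last_group = {}  # mem_id -> highest group number used so far
    # Sort so each member's events for one provider are contiguous and chronological.
    ordered = sorted(rows, key=lambda r: (r[0], r[4], r[2], r[1]))
    prev = None
    for row in ordered:
        mem_id, clm_id, adm_dt, dchg_dt, prov = row
        continues_group = (
            prev is not None
            and prev[0] == mem_id
            and prev[4] == prov
            # Same admission date, or admitted 0 or 1 days after the previous discharge.
            and (adm_dt == prev[2] or 0 <= (adm_dt - prev[3]).days <= 1)
        )
        if not continues_group:
            last_group[mem_id] = last_group.get(mem_id, 0) + 1
        groups[clm_id] = last_group[mem_id]
        prev = row
    return groups

rows = [
    (1, 111, date(2021, 1, 1), date(2021, 1, 1), 1),
    (1, 112, date(2021, 1, 1), date(2021, 1, 2), 1),
    (1, 113, date(2021, 1, 1), date(2021, 1, 1), 1),
    # Hypothetical rows, not from the original data:
    (1, 114, date(2021, 1, 2), date(2021, 1, 4), 1),    # 1 day after 113's discharge
    (1, 115, date(2021, 1, 10), date(2021, 1, 11), 1),  # 6 days after 114's discharge
    (1, 116, date(2021, 1, 1), date(2021, 1, 1), 2),    # different provider
]
groups = assign_groups(rows)
print(groups)
```

Each event is compared only with the event immediately before it, which is the same behavior the LAG function gives in SQL.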
The Expected Output
Here is an example of what the expected output should look like:
| MEM_ID | CLM_ID | ADM_DT | DCHG_DT | PROV | GROUP |
|---|---|---|---|---|---|
| 1 | 111 | 01-01-2021 | 01-01-2021 | 1 | 1 |
| 1 | 112 | 01-01-2021 | 02-01-2021 | 1 | 1 |
| 1 | 113 | 01-01-2021 | 01-01-2021 | 1 | 1 |
| … | … | … | … | … | … |
The Solution
To solve this problem, we can use SQL window functions: LAG and a running SUM, both with PARTITION BY and ORDER BY. We first create a helper column called ISSTART that flags whether a row starts a new group. ISSTART is 0 when the row has the same PROV as the previous row for the same MEM_ID and its ADM_DT equals that previous row's DCHG_DT; otherwise it is 1. A running sum of ISSTART then numbers the groups.
Here is the SQL code that achieves this:
SELECT DISTINCT
    MEM_ID,
    CLM_ID,
    ADM_DT,
    DCHG_DT,
    PROV,
    -- Running total of the start flags: it goes up by 1 at every new group.
    SUM(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT DISTINCT
        MEM_ID,
        PROV,
        CLM_ID,
        ADM_DT,
        DCHG_DT,
        CASE
            -- Same provider as the previous row, admitted on that row's discharge date.
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
             AND ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
            THEN 0
            ELSE 1  -- first row of a member, or a break in the chain
        END AS ISSTART
    FROM c1
) t
ORDER BY MEM_ID, PROV, "GROUP", ADM_DT;
This gives the desired output for the sample rows.
Note that the subquery only calculates ISSTART; the outer query turns it into a group number with a running SUM. Window functions cannot be nested inside one another, which is why the LAG comparison and the SUM sit at different query levels. Both window specifications partition by MEM_ID and use the same ordering (ADM_DT, DCHG_DT, CLM_ID), so the running sum increases exactly where a new group starts. GROUP is a reserved word, so the alias is quoted.
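The two-step pattern (flag the group starts, then take a running sum) can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions rather than the original author's setup: it needs SQLite 3.25 or later for window functions, stores dates as ISO YYYY-MM-DD text, uses the alias GRP to sidestep the reserved word GROUP, and adds a hypothetical claim 114 to show a second group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c1 (MEM_ID INT, CLM_ID INT, ADM_DT TEXT, DCHG_DT TEXT, PROV INT)"
)
conn.executemany(
    "INSERT INTO c1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 111, "2021-01-01", "2021-01-01", 1),
        (1, 112, "2021-01-01", "2021-01-02", 1),
        (1, 113, "2021-01-01", "2021-01-01", 1),
        (1, 114, "2021-01-10", "2021-01-12", 1),  # hypothetical row, not in the source data
    ],
)

query = """
SELECT CLM_ID,
       ISSTART,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID) AS GRP
FROM (
    SELECT MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
           CASE
               WHEN PROV = LAG(PROV) OVER w
                AND ADM_DT = LAG(DCHG_DT) OVER w
               THEN 0
               ELSE 1
           END AS ISSTART
    FROM c1
    WINDOW w AS (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT, CLM_ID)
) t
ORDER BY CLM_ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # (CLM_ID, ISSTART, GRP)
```

Claims 111 to 113 end up in group 1 and the hypothetical claim 114 opens group 2, because its ADM_DT does not match the DCHG_DT of the row before it.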
The Alternative Solution
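If your database provides a DAYS_BETWEEN() function (Db2 and SAP HANA do, for example), you can extend the condition so that an event also joins the group when it shares the previous event's ADM_DT, or when it is admitted on the day of, or the day after, the previous event's DCHG_DT: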
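SELECT DISTINCT
    MEM_ID,
    CLM_ID,
    ADM_DT,
    DCHG_DT,
    PROV,
    -- Ordering by PROV first keeps each provider's rows together, so the
    -- running sum gives every group a distinct number within the member.
    SUM(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY PROV, ADM_DT, CLM_ID) AS "GROUP"
FROM (
    SELECT DISTINCT
        MEM_ID,
        PROV,
        CLM_ID,
        ADM_DT,
        DCHG_DT,
        CASE
            -- Same admission date as the previous event for this member and provider,
            -- or admitted 0 or 1 days after that event's discharge.
            -- The argument order and sign of DAYS_BETWEEN vary by database; check yours.
            WHEN ADM_DT = LAG(ADM_DT) OVER (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)
              OR DAYS_BETWEEN(ADM_DT, LAG(DCHG_DT) OVER (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)) IN (0, 1)
            THEN 0
            ELSE 1
        END AS ISSTART
    FROM c1
) t
ORDER BY MEM_ID, PROV, "GROUP", ADM_DT;
This will also give us the desired output. Because ISSTART is now calculated per MEM_ID and PROV, the running sum orders by PROV before ADM_DT so that events from different providers are never merged into one group number.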
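Where DAYS_BETWEEN() is not available, the same one-day tolerance can be written with whatever date arithmetic the database offers. The sketch below swaps in SQLite's julianday() and runs through Python's sqlite3 module. As assumptions, dates are stored as ISO YYYY-MM-DD text, the alias is GRP rather than the reserved word GROUP, and claims 114 to 116 are hypothetical rows added to exercise the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c1 (MEM_ID INT, CLM_ID INT, ADM_DT TEXT, DCHG_DT TEXT, PROV INT)"
)
conn.executemany(
    "INSERT INTO c1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 111, "2021-01-01", "2021-01-01", 1),
        (1, 112, "2021-01-01", "2021-01-02", 1),
        (1, 113, "2021-01-01", "2021-01-01", 1),
        # Hypothetical rows, not from the original data:
        (1, 114, "2021-01-02", "2021-01-04", 1),  # 1 day after 113's discharge
        (1, 115, "2021-01-10", "2021-01-11", 1),  # 6 days after 114's discharge
        (1, 116, "2021-01-01", "2021-01-01", 2),  # different provider
    ],
)

query = """
SELECT CLM_ID,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY PROV, ADM_DT, CLM_ID) AS GRP
FROM (
    SELECT MEM_ID, PROV, CLM_ID, ADM_DT,
           CASE
               WHEN ADM_DT = LAG(ADM_DT) OVER w
                 OR (julianday(ADM_DT) - julianday(LAG(DCHG_DT) OVER w)) BETWEEN 0 AND 1
               THEN 0
               ELSE 1
           END AS ISSTART
    FROM c1
    WINDOW w AS (PARTITION BY MEM_ID, PROV ORDER BY ADM_DT, CLM_ID)
) t
ORDER BY CLM_ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # (CLM_ID, GRP)
```

BETWEEN 0 AND 1 is used instead of IN (0, 1) because julianday() returns a floating-point value; for date-only inputs the two conditions select the same rows.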
Conclusion
In this article, we explored how to create groupings based on specific criteria. We flagged the first row of each group with LAG and turned those flags into group numbers with a running SUM. We also covered a variant that uses DAYS_BETWEEN() so that events admitted up to one day after the previous discharge stay in the same group.
Last modified on 2023-12-06