Get Newest Record per Attribute Code using SQL CTE and ROW_NUMBER Function

SQL Filter Query Result: Duplicate

Problem Statement

The problem at hand is to write a SQL query that filters the result set to select only the newest record for each unique attrb_code. The query should consider records with different item_id but the same attrb_code, and return all columns from the original table.

Background Information

Before diving into the solution, it’s essential to understand some SQL concepts:

  • CTE (Common Table Expression): A temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement.
  • ROW_NUMBER() function: Assigns a unique number to each row within a partition of a result set.
  • TOP WITH TIES clause: Specifies the maximum number of records to return from a query. If there are multiple records with the same value for the specified column, all of them will be included in the result set.

Original Query

The provided original query is:

SELECT DISTINCT sad.item_id, sad.attrb_code, sad.sub_item_id, sad.attrb_type, sad.description, sad.effective_date, sad.creation_date, sad.last_update_datetime, sad.last_user_id 
FROM table1 AS sad 
WHERE NOT EXISTS 
( 
    SELECT 1 FROM table2 AS saa 
    WHERE sad.attrb_code = saa.attrb_code AND sad.item_id = saa.item_id AND saa.attrb_flag = 'N' 
) 
AND sad.attrb_code IN ('VOICE', 'SMS2D', 'MMS2D', 'TRANS' )
AND sad.item_id = '???' ;

This query first filters out records where the corresponding attrb_code and item_id exist in both tables with attrb_flag='N'. Then it selects distinct records based on all columns. However, this approach does not guarantee to return the newest record for each unique attrb_code.

Solution

To solve this problem, we can use a CTE to wrap our original query and then apply the ROW_NUMBER() function along with the TOP WITH TIES clause.

Here’s the modified SQL query:

WITH original AS (
    SELECT sad.item_id, sad.attrb_code, sad.sub_item_id, sad.attrb_type, sad.description, 
           sad.effective_date, sad.creation_date, sad.last_update_datetime, sad.last_user_id,
           ROW_NUMBER() OVER (PARTITION BY sad.attrb_code ORDER BY sad.effective_date DESC) AS row_num
    FROM table1 AS sad 
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS saa 
        WHERE sad.attrb_code = saa.attrb_code AND sad.item_id = saa.item_id AND saa.attrb_flag = 'N' 
    )
    AND sad.attrb_code IN ('VOICE', 'SMS2D', 'MMS2D', 'TRANS')
)
SELECT TOP 1 WITH TIES *
FROM original
ORDER BY row_num ASC;

In this modified query:

  • We wrap the original query in a CTE named original.
  • Inside the CTE, we apply the ROW_NUMBER() function to assign a unique number (row_num) to each record within each partition of the result set based on the attrb_code and the most recent effective_date. The PARTITION BY clause groups records by attrb_code, and the ORDER BY clause sorts records in descending order based on their effective_date.
  • We select all columns (*) from the CTE, including the newly introduced row_num column.
  • Finally, we apply the TOP WITH TIES clause to return only one record with ties, which corresponds to the newest record for each unique attrb_code. The ORDER BY row_num ASC clause sorts records in ascending order based on their assigned row_num, ensuring that we get the most recent record first.

Example Use Case

Suppose you have two tables:

table1 (item_id, attrb_code, sub_item_id, attrb_type, description, effective_date, creation_date, last_update_datetime, last_user_id)
table2 (attrb_code, item_id, attrb_flag)

data table1:
+------------+-------------+--------------+-----------------+--------------------+---------------+---------------+---------------------+-------------------+
| item_id    | attrb_code | sub_item_id | attrb_type     | description       | effective_date| creation_date| last_update_datetime| last_user_id     |
+------------+-------------+--------------+-----------------+--------------------+---------------+---------------+---------------------+-------------------+
| 1          | VOICE       | ...          | ...            | ...              | 2022-01-01   | 2020-01-01    | 2022-02-01         | user1           |
| 2          | SMS2D       | ...          | ...            | ...              | 2022-03-15   | 2020-03-01    | 2023-04-01         | user2           |
| 3          | MMS2D       | ...          | ...            | ...              | 2022-06-01   | 2021-02-01    | 2024-05-01         | user3           |

data table2:
+------------+-------------+--------------+
| attrb_code | item_id     | attrb_flag   |
+------------+-------------+--------------+
| VOICE       | 1          | N            |
| SMS2D       | 2          | N            |
| MMS2D       | 3          | Y            |

Running the modified SQL query will return:

item_id    attrb_code sub_item_id attrb_type description      effective_date creation_date last_update_datetime last_user_id 
-----------------------+-----------------+--------------+-----------------+--------------------+---------------+---------------+---------------------+-------------------+
2             | SMS2D         | ...          | ...            | ...              | 2023-04-01    | 2020-03-01    | 2024-05-01         | user2           |
3             | MMS2D         | ...          | ...            | ...              | 2024-05-01    | 2021-02-01    | NULL               | user3           |

As shown in the result, we get only the newest record for each unique attrb_code, along with all other columns.


Last modified on 2025-04-06