Thursday, October 28, 2010

Using Ranking Functions to Deduplicate Data




The introduction of ranking functions in SQL Server 2005 allowed the generation of listings with generated numbers based on sort orders providing keys such as row numbers and rank. These can be used for the deduplication of data in a simple way.
 


Consider the following simple example it belows:
--Step1-- --select table data ----
SELECT ENo, Ename
, Email,[Role]
FROM Duplicate










--Step2-----added Row_number function for my table-----
SELECT ROW_NUMBER() OVER (ORDER BY ENo) AS RowNumber
, ENo
, EName
FROM Duplicate


--Step3-----added Rank and Row_number function for my table-----
SELECT RANK() OVER(ORDER BY ENO) AS Rank,
ROW_NUMBER()OVER (ORDER BY Eno) AS RowNumber
, Eno
, Ename
FROM Duplicate ;


--Step4-----writing function for the temp table -------------
WITH RankCollection(Rank, RowNumber, Eno, Ename)
AS
(
SELECT RANK() OVER( ORDER BY ENO) AS Rank,

ROW_NUMBER() OVER ( ORDER BY Eno) AS RowNumber
, Eno
, Ename
FROM Duplicate
)
SELECT Eno
, Ename
FROM RankCollection
WHERE RANK = RowNumber




If you want more explanation refer this link