Using Common Table Expressions (CTE) to Remove Duplicates

January 17, 2013 — Leave a comment

A Common Table Expressions (CTE) is:

Similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

The following query uses a CTE to remove duplicate records. Keeping the latest of the duplicates based on their insertion Id. To accomplish this, a PARTITION is used to group duplicates together and they are ordered most recent to oldest. Using ROW_NUMBER we number the duplicates and delete anything that has a row number greater than 1.

WITH CTE AS 
( 
SELECT ROW_NUMBER() OVER 
(PARTITION BY [Email] 
 Order BY [Email], [Id] DESC) 
 AS RowNumber, [Email], [FirstName], [LastName] 
FROM Customers) 
DELETE FROM CTE Where RowNumber > 1

Common Table Expressions (CTE) can be useful in many situations

  • Creating recursive queries
  • As a building blocks for more complexes queries
  • Using the resulting table multiple times within the same statement
  • As Substitute for a view
  • When grouping by a column that the result of a function
  • When you need a dataset to reference itself

A CTE is made up of an expression name representing the CTE, an optional column list, and a query defining the CTE. After a CTE is defined, it can be referenced like a table or view can in a SELECT, INSERT, UPDATE, or DELETE statement. A CTE can also be used in a CREATE VIEW statement as part of its defining SELECT statement.

Beware of the following when using a CTE

  • A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all the CTE columns
  • Specifying more than one WITH clause in a CTE is not allowed
  • The following clauses cannot be used : ORDER BY (except when a TOP clause is specified) , INTO, OPTION, FOR XML, FOR BROWSE
  • When a statement that is part of a batch, the statement before it must be followed by a semicolon
  • Any hints that reference a CTE may conflict with other hints that are discovered when the CTE accesses its underlying tables
  • The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member
  • An incorrectly composed recursive CTE may cause an infinite loop
  • The following clauses cannot be used if the CTE is recursive : ORDER BY , SELECT DISTINCT, PIVOT, HAVING, TOP, LEFT, RIGHT, OUTER JOIN
  • For recursive queries a WHILE loop may perform much better than the CTE

More Information

No Comments

Be the first to start the conversation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.