
How to Use CTE to Remove Duplicates

Removing duplicate records from a table is a common task in database management. One efficient and readable way to achieve this is by using Common Table Expressions (CTEs) in T-SQL. This article demonstrates how to use a CTE to identify and delete duplicate rows from a table.

Step 1: Create the Table

First, let’s create a table called Employee and populate it with some data, including duplicate rows:

CREATE TABLE Employee (name NVARCHAR(100), age INT, salary INT);

INSERT INTO Employee
VALUES 
    ('Mark', 42, 120000),
    ('Susan', 29, 125000),
    ('Mark', 42, 120000);

At this point, if you query the table using:

SELECT * FROM Employee;

You’ll see that the table contains duplicate rows:

name   age  salary
Mark   42   120000
Susan  29   125000
Mark   42   120000

Step 2: Use a CTE to Identify Duplicate Rows

We can use a CTE to assign a row number to each record. By partitioning the data by name, age, and salary, identical rows land in the same partition and are numbered 1, 2, 3, and so on. Because a CTE exists only for the single statement that follows it, the example below ends with a SELECT so you can preview the numbering:

WITH CTE_Employee(name, age, salary, rownumber) AS (
    SELECT
        name,
        age,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY name, age, salary
            ORDER BY name, age, salary
        ) AS rownumber
    FROM Employee
)
SELECT * FROM CTE_Employee;

In this CTE:

  • PARTITION BY groups rows with the same name, age, and salary.
  • ROW_NUMBER() assigns a unique number to each row within a partition.

For duplicate records, all rows except the first one will have a rownumber greater than 1.
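
For the sample data above, the CTE should return something like the following (rows within a partition may appear in either order, since the ORDER BY columns are identical):

name   age  salary  rownumber
Mark   42   120000  1
Mark   42   120000  2
Susan  29   125000  1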

Step 3: Delete Duplicate Rows

Once the duplicates are identified, we can delete every row whose rownumber is greater than 1, keeping one copy of each record. Because the CTE exists only for the statement that immediately follows it, the DELETE must be attached to the same CTE definition:

WITH CTE_Employee(name, age, salary, rownumber) AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name, age, salary ORDER BY name, age, salary) AS rownumber
    FROM Employee
)
DELETE FROM CTE_Employee
WHERE rownumber <> 1;

This query removes all duplicate rows while retaining one instance of each unique record.

Step 4: Verify the Results

After running the deletion query, you can verify that the duplicates have been removed by querying the table again:

SELECT * FROM Employee;

You should now see only unique records in the Employee table.
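
With the sample data from Step 1, the output should now contain just the two unique rows:

name   age  salary
Mark   42   120000
Susan  29   125000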

Using CTEs to manage duplicate records in T-SQL is both powerful and readable. The ROW_NUMBER() function, combined with PARTITION BY, makes it easy to identify and handle duplicate rows efficiently. Try applying this approach to your datasets to simplify data cleansing tasks.
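
As a starting point for your own data, here is a minimal sketch of the same pattern applied to a hypothetical Customer table; Customer, CustomerName, and Email are placeholder names, and the PARTITION BY list is what defines which rows count as duplicates:

-- Hypothetical example: keep one row per (CustomerName, Email) pair.
-- Replace the table and column names with your own.
WITH CTE_Duplicates AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerName, Email   -- columns that define a duplicate
            ORDER BY CustomerName, Email       -- tie-break order within each group
        ) AS rownumber
    FROM Customer
)
DELETE FROM CTE_Duplicates
WHERE rownumber <> 1;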
