SQL Server Delete Duplicate Rows
Rows in a table can be duplicated in two ways:
1. The entire row is duplicated because the table has no primary key or unique key.
2. Only the primary key or unique key value differs; all remaining values are the same.
Scenario 1: Delete duplicate rows without primary key or unique key.
Let us create the following example.
create table customers1 (CustId Int, CustName Varchar(20), CustCity Varchar(20), Passport_Number Varchar(20))
go
Insert into customers1 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers1 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers1 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers1 Values(1, 'John', 'Paris', 'P123X78')
go
select * from customers1
go

We want to remove one of the duplicate records of John.
The following summary query shows which records are duplicated.
select * from customers1 Group by Custid,CustName, CustCity, Passport_Number Having count(*) > 1
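To also see how many copies each duplicated row has, the same grouping can return a count. This is a small sketch; the alias Copies is an assumption of this example and does not appear in the original query.

```sql
-- List each duplicated row once, together with the number of copies found.
SELECT CustId, CustName, CustCity, Passport_Number,
       COUNT(*) AS Copies          -- Copies is an illustrative alias
FROM customers1
GROUP BY CustId, CustName, CustCity, Passport_Number
HAVING COUNT(*) > 1;
```

With the sample data above, this returns John's row once with Copies = 2.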

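Now we will copy this row into a local temporary table. The first statement creates an empty table with the same structure as customers1 (the condition 1 = 2 is never true, so no rows are copied); the second inserts one copy of each duplicated row.
Select * into #Temp_customers1 from customers1 where 1 = 2
Insert into #Temp_customers1 select * from customers1 Group by Custid,CustName, CustCity, Passport_Number Having count(*) > 1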
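One copy of the duplicate row is now in the local temporary table. All we need to do now is delete the records from the main table customers1 whose custid matches a custid in the local temporary table. Note that matching on custid alone assumes that rows sharing a custid are otherwise identical, as they are in this example.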
Delete from customers1 where custid in (select Custid from #Temp_customers1)
Will the above query work? Not entirely: it removes every copy of the duplicated row, including the one we wanted to keep. Let us look at the table again.
select * from customers1
go
To keep one record of John, we use the local temporary table again and insert its row back into the customers1 table.
Insert into Customers1 select * from #Temp_customers1
go
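The delete and the re-insert are two separate statements, so a failure between them would leave customers1 without any copy of the duplicated rows. One way to guard against that is to run both steps in a single transaction. The following is only a sketch: it assumes #Temp_customers1 has already been populated as described above, and it uses THROW, which requires SQL Server 2012 or later.

```sql
-- Assumes #Temp_customers1 already holds one copy of each duplicated row.
BEGIN TRY
    BEGIN TRANSACTION;

    -- Remove every copy of the duplicated rows from the main table.
    DELETE FROM customers1
    WHERE CustId IN (SELECT CustId FROM #Temp_customers1);

    -- Put a single copy of each row back.
    INSERT INTO customers1
    SELECT * FROM #Temp_customers1;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo both steps if either one failed, then re-raise the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```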
We now have a single record of John. Let us confirm by looking at the customers1 table.
select * from customers1
go
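For comparison, a different technique from the temporary-table steps in this scenario is to number the identical copies with ROW_NUMBER() and delete every copy after the first, all in one statement. This is a sketch rather than the method this article walks through; the names Numbered and RowNo are assumptions of the example. It is meant to be run against the table while it still contains the duplicates.

```sql
-- Number the copies inside each group of fully identical rows.
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY CustId, CustName, CustCity, Passport_Number
               ORDER BY (SELECT NULL)   -- copies are identical, so their order does not matter
           ) AS RowNo
    FROM customers1
)
-- Deleting through the CTE removes the underlying rows of customers1.
DELETE FROM Numbered
WHERE RowNo > 1;
```

Because the delete keeps the row numbered 1 in every group, no re-insert step is needed.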

Once done, we can drop the local temporary table.
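The cleanup can be sketched as follows. The existence check is an addition of this example, so that the script can be re-run without an error if the table is already gone.

```sql
-- Drop the local temporary table if it still exists in this session.
IF OBJECT_ID('tempdb..#Temp_customers1') IS NOT NULL
    DROP TABLE #Temp_customers1;
```

A local temporary table is also removed automatically when the session that created it ends.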
Scenario 2: Delete duplicate rows where primary key or unique key value is different but remaining values are same.
Let us create the following example.
create table customers2 (CustId Int Primary Key, CustName Varchar(20), CustCity Varchar(20), Passport_Number Varchar(20))
go
Insert into customers2 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers2 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers2 Values(4, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(5, 'John', 'Paris', 'P123X78')
select * from customers2
go
This is the same customer data, but this time John's record has been added three times, with different customer ids but the same passport number!

Scenario 2.a: Delete Duplicate rows but keep one using CTE
We first use a self join to find the duplicate records: rows that have a different custid but the same name, city and passport number.
select distinct a.* from customers2 a join customers2 b on a.custid <> b.custid and a.CustName = b.CustName and a.CustCity = b.CustCity and a.Passport_Number = b.Passport_Number
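The same duplicates can also be summarised per group instead of row by row. The sketch below groups on the non-key columns; the aliases Copies, FirstId and LastId are assumptions of this example.

```sql
-- One line per group of duplicates, ignoring the primary key.
SELECT CustName, CustCity, Passport_Number,
       COUNT(*)    AS Copies,    -- how many rows share these values
       MIN(CustId) AS FirstId,   -- lowest custid in the group
       MAX(CustId) AS LastId     -- highest custid in the group
FROM customers2
GROUP BY CustName, CustCity, Passport_Number
HAVING COUNT(*) > 1;
```

With the sample data, this returns a single line for John with Copies = 3, FirstId = 1 and LastId = 5.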

Now we know that custid 1, 4 and 5 are duplicates. Combining the self join with a delete statement gives the desired result: the last duplicate record is kept and all earlier duplicates are removed. We put the self join query in a Common Table Expression (CTE) and delete from it. Note that max(Customer_ID) is taken over all duplicates at once, so this form is only correct when the table contains a single group of duplicates, as it does here.
With Duplicates as (select distinct a.custid as Customer_ID from customers2 a join customers2 b on a.custid <> b.custid and a.CustName = b.CustName and a.CustCity = b.CustCity and a.Passport_Number = b.Passport_Number ) Delete from Customers2 where custid in (select Customer_ID from Duplicates) and custid <> (select max(Customer_ID) from Duplicates)
Let’s check which rows remain.
select * from customers2
go
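If a table could contain more than one group of duplicates, comparing against a single max over all duplicates would keep only one row overall. A variant that treats each group separately can be sketched with ROW_NUMBER(). This is a different technique from the self join used in this scenario, and the names Numbered and RowNo are assumptions of the example.

```sql
-- Number the rows of each duplicate group, highest custid first.
WITH Numbered AS (
    SELECT CustId,
           ROW_NUMBER() OVER (
               PARTITION BY CustName, CustCity, Passport_Number
               ORDER BY CustId DESC      -- RowNo 1 is the last record of the group
           ) AS RowNo
    FROM customers2
)
-- Keep RowNo 1 in every group and delete the rest.
DELETE FROM Numbered
WHERE RowNo > 1;
```

Run against the five sample rows, this keeps custid 2, 3 and 5, the same outcome as the query in this scenario.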

Scenario 2.b: Delete all duplicate records but keep the first original one
Let’s first truncate the customers2 table and add the same rows again.
Truncate Table customers2
go
Insert into customers2 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers2 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers2 Values(4, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(5, 'John', 'Paris', 'P123X78')
go
The only change is in the subquery, where we use min(Customer_ID) instead of max(Customer_ID).
So the query is as follows.
With Duplicates as (select distinct a.custid as Customer_ID from customers2 a join customers2 b on a.custid <> b.custid and a.CustName = b.CustName and a.CustCity = b.CustCity and a.Passport_Number = b.Passport_Number ) Delete from Customers2 where custid in (select Customer_ID from Duplicates) and custid <> (select min(Customer_ID) from Duplicates)
Let us confirm this in the customers2 table.
select * from customers2
go
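Keeping the first record can also be expressed without a CTE: delete every row for which an otherwise identical row with a lower custid exists. This sketch is an alternative formulation, not the query used in this scenario; the aliases c and earlier are assumptions of the example.

```sql
-- Delete a row when an identical row with a smaller custid exists.
DELETE c
FROM customers2 AS c
WHERE EXISTS (
    SELECT 1
    FROM customers2 AS earlier
    WHERE earlier.CustName        = c.CustName
      AND earlier.CustCity        = c.CustCity
      AND earlier.Passport_Number = c.Passport_Number
      AND earlier.CustId          < c.CustId   -- an earlier copy survives instead
);
```

Run against the five sample rows, this keeps custid 1, 2 and 3. Because each row is compared only with rows carrying the same values, it also works when several groups of duplicates are present.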

And that’s how we can delete duplicate records in SQL Server, both from tables without a primary key and from tables with one, while keeping a single original row.
Original link: http://www.codesec.net/view/449563.html