Finding duplicate rows in SQL Server

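I have a SQL Server database of organizations, and it contains many duplicate rows. I want to run a select statement that grabs all of them along with the number of dupes, but also returns the ID associated with each organization.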

A statement like:

    SELECT orgName, COUNT(*) AS dupes
    FROM organizations
    GROUP BY orgName
    HAVING (COUNT(*) > 1)

will return something like

    orgName        | dupes
    ABC Corp       | 7
    Foo Federation | 5
    Widget Company | 2

But I'd also like to grab their IDs. Is there any way to do this? Maybe something like

    orgName        | dupeCount | id
    ABC Corp       | 1         | 34
    ABC Corp       | 2         | 5
    ...
    Widget Company | 1         | 10
    Widget Company | 2         | 2

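The reason is that there are also separate users linked to these organizations, and I would like to unify them (that is, remove the dupes so that users link to the same organization instead of to dupe orgs). I would like to do part of this manually so that I don't screw anything up, but I still need a statement that returns the IDs of all the dupe orgs so I can go through the list of users.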

    select o.orgName, oc.dupeCount, o.id
    from organizations o
    inner join (
        SELECT orgName, COUNT(*) AS dupeCount
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) oc on o.orgName = oc.orgName

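You can run the following query, use max(id) to find the duplicates, and delete those rows.

    SELECT orgName, COUNT(*) AS dupes, MAX(id) AS maxId
    FROM organizations
    GROUP BY orgName
    HAVING (COUNT(*) > 1)

But you will have to run this query a few times, because each run returns only one ID (the largest) per duplicated name.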

You can do it like this:

    SELECT o.id, o.orgName, d.intCount
    FROM (
        SELECT orgName, COUNT(*) as intCount
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return only the records that can be deleted (leaving one of each), you can use:

    SELECT id, orgName
    FROM (
        SELECT orgName, id,
               ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
        FROM organizations
    ) AS d
    WHERE intRow != 1

Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

    SELECT o.id, o.orgName, d.intCount
    FROM (
        SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
    WHERE d.minId != o.id

The solution marked as correct didn't work for me, but I found this answer, which worked well: Get list of duplicate rows in MySql

    SELECT n1.*
    FROM myTable n1
    INNER JOIN myTable n2 ON n2.repeatedCol = n1.repeatedCol
    WHERE n1.id <> n2.id
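One thing to keep in mind with the self-join above: when a value occurs three or more times, each row matches several partners and is returned more than once. A minimal sketch of a variant that lists each duplicated row exactly once is shown below. The table variable `@myTable` and its sample rows are made up for illustration, and the multi-row `VALUES` syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data: 'a' occurs three times, 'b' once.
DECLARE @myTable TABLE (id INT PRIMARY KEY, repeatedCol NVARCHAR(50));

INSERT INTO @myTable (id, repeatedCol)
VALUES (1, 'a'), (2, 'a'), (3, 'a'), (4, 'b');

-- EXISTS only checks whether at least one other row shares the value,
-- so each duplicated row appears once no matter how many partners it has.
SELECT n1.*
FROM @myTable AS n1
WHERE EXISTS (
    SELECT 1
    FROM @myTable AS n2
    WHERE n2.repeatedCol = n1.repeatedCol
      AND n2.id <> n1.id
)
ORDER BY n1.repeatedCol, n1.id;
```

With this sample data the plain self-join would return six rows (each of ids 1, 2 and 3 twice), while the EXISTS form returns ids 1, 2 and 3 once each.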

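You can try this; it should work well for you:

    WITH CTE AS (
        -- Ordering by the partition column leaves the order inside each group
        -- arbitrary; order by id instead if you need a deterministic result.
        SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC)
        FROM organizations
    )
    select * from CTE where RN > 1
    go

    Select *
    from (
        Select orgName, id,
               ROW_NUMBER() OVER (Partition By OrgName ORDER by id DESC) Rownum
        From organizations
    ) tbl
    Where Rownum > 1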

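So the records with Rownum > 1 are the duplicate records in the table. The query first partitions the rows into groups by orgName, then numbers the rows within each group sequentially. Every row with Rownum > 1 is therefore a duplicate that can be deleted.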
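To make the numbering concrete, here is a small self-contained sketch. The table variable `@organizations` and its rows are illustrative only (the names and ids are borrowed from the question's example), and it orders by id ascending, so the lowest id in each group gets row number 1.

```sql
-- Hypothetical sample data modelled on the question's example output.
DECLARE @organizations TABLE (id INT PRIMARY KEY, orgName NVARCHAR(100));

INSERT INTO @organizations (id, orgName)
VALUES (34, 'ABC Corp'), (5, 'ABC Corp'),
       (10, 'Widget Company'), (2, 'Widget Company'),
       (7, 'Solo Inc');

-- Numbering restarts at 1 for every distinct orgName.
SELECT orgName,
       id,
       ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS Rownum
FROM @organizations
ORDER BY orgName, id;
```

Here ids 5, 7 and 2 receive Rownum 1 (the rows to keep), while ids 34 and 10 receive Rownum 2 and would be picked up by a `Rownum > 1` filter.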

    select column_name, count(column_name)
    from table_name
    group by column_name
    having count(column_name) > 1;

Source: https://stackoverflow.com/a/59242/1465252

    select a.orgName, b.duplicate, a.id
    from organizations a
    inner join (
        SELECT orgName, COUNT(*) AS duplicate
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) b on a.orgName = b.orgName
    group by a.orgName, b.duplicate, a.id

    -- Note: because this groups by id as well, count(*) is 1 for every row
    -- when id is unique; the query lists the ids of the duplicated names
    -- but not the per-name count.
    select orgname, count(*) as dupes, id
    from organizations
    where orgname in (
        select orgname
        from organizations
        group by orgname
        having (count(*) > 1)
    )
    group by orgname, id

There are several ways to select duplicate rows.

For my solutions, first consider this table:

    CREATE TABLE #Employee
    (
        ID INT,
        FIRST_NAME NVARCHAR(100),
        LAST_NAME NVARCHAR(300)
    )

    INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
    INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
    INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
    INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
    INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
    INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution:

    SELECT DISTINCT * FROM #Employee;

    WITH #DeleteEmployee AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS RNUM
        FROM #Employee
    )
    SELECT * FROM #DeleteEmployee WHERE RNUM > 1

    SELECT DISTINCT * FROM #Employee

Second solution: use an identity field

    SELECT DISTINCT * FROM #Employee;

    ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

    SELECT *
    FROM #Employee
    WHERE UNIQ_ID < (
        SELECT MAX(UNIQ_ID)
        FROM #Employee a2
        WHERE #Employee.ID = a2.ID
          AND #Employee.FIRST_NAME = a2.FIRST_NAME
          AND #Employee.LAST_NAME = a2.LAST_NAME
    )

    ALTER TABLE #Employee DROP COLUMN UNIQ_ID

    SELECT DISTINCT * FROM #Employee

At the end of either solution, clean up with this command:

 DROP TABLE #Employee 

If you want to delete the duplicates:

    WITH CTE AS (
        SELECT orgName, id,
               RN = ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY Id)
        FROM organizations
    )
    DELETE FROM CTE WHERE RN > 1
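Since the question mentions users linked to the duplicate organizations, those users would need to be repointed before the duplicates are deleted. The following is only a sketch under stated assumptions: the `users` table and its `orgId` column are hypothetical (the question never shows that schema), the organization with the lowest id is arbitrarily chosen as the one to keep, and table variables with made-up rows stand in for the real tables so the script can be run safely on its own.

```sql
-- Hypothetical schema: users.orgId refers to organizations.id.
DECLARE @organizations TABLE (id INT PRIMARY KEY, orgName NVARCHAR(100));
DECLARE @users TABLE (id INT PRIMARY KEY, orgId INT);

INSERT INTO @organizations (id, orgName)
VALUES (2, 'Widget Company'), (10, 'Widget Company'),
       (5, 'ABC Corp'), (34, 'ABC Corp'),
       (7, 'Solo Inc');

INSERT INTO @users (id, orgId)
VALUES (1, 10), (2, 2), (3, 34), (4, 7);

-- Step 1: map every organization to the lowest id that shares its name
-- and repoint users that currently reference a duplicate.
WITH keepers AS (
    SELECT id, MIN(id) OVER (PARTITION BY orgName) AS keepId
    FROM @organizations
)
UPDATE u
SET orgId = k.keepId
FROM @users AS u
INNER JOIN keepers AS k ON k.id = u.orgId
WHERE u.orgId <> k.keepId;

-- Step 2: delete every organization except the lowest id per name.
WITH ranked AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS rn
    FROM @organizations
)
DELETE FROM ranked WHERE rn > 1;

-- Inspect the result.
SELECT id, orgId FROM @users ORDER BY id;
SELECT id, orgName FROM @organizations ORDER BY id;
```

With this sample data, users 1 and 3 end up pointing at organizations 2 and 5, and only ids 2, 5 and 7 remain in the organizations table. Against real tables it would be prudent to run the SELECT versions of both steps first and review the affected rows, which matches the question's wish to do part of the work manually.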

I think I know what you need. I had to mix the answers together, and I think I got the solution he wanted:

    select o.id, o.orgName, oc.dupeCount, oc.id, oc.orgName
    from organizations o
    inner join (
        SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) oc on o.orgName = oc.orgName

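Taking the max id gives you both the id of the duplicate and the id of the original, which is what he asked for:

    id, org name, duplicate count (left out in this case)
    duplicate id, duplicate org name, duplicate count (left out again because it does not help in this case)

The only sad thing is that you get it out in this form:

    id, name, dupid, name

Hope it still helps.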

 select * from [Employees] 

Finding duplicate records

1) Using a CTE

    with mycte as (
        select Name, EmailId,
               ROW_NUMBER() over (partition by Name, EmailId order by id) as Duplicate
        from [Employees]
    )
    select * from mycte
    where Duplicate > 1

2) Using GROUP BY

    select Name, EmailId, COUNT(name) as Duplicate
    from [Employees]
    group by Name, EmailId
    having COUNT(name) > 1

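Try

    -- Note: this only returns rows when the same (orgName, id) pair occurs
    -- more than once; if id is unique, every group has a count of 1.
    SELECT orgName, id, count(*) as dupes
    FROM organizations
    GROUP BY orgName, id
    HAVING count(*) > 1;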