Delete duplicate records from a SQL table without a primary key

I have the following records in the table below

    create table employee (
        EmpId   number,
        EmpName varchar2(10),
        EmpSSN  varchar2(11)
    );
    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (2, 'Joe', '555-56-5555');
    insert into employee values (3, 'Fred', '555-57-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');
    insert into employee values (1, 'Jack', '555-55-5555');
    insert into employee values (4, 'Mike', '555-58-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');
    insert into employee values (5, 'Cathy', '555-59-5555');
    insert into employee values (6, 'Lisa', '555-70-5555');

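I don't have any primary key on this table, but the records above are already in it. I want to delete the duplicate records that have the same values in the EmpId and EmpSSN fields.

For example: Emp ID 5

Can anyone help me frame a query to delete these duplicate records?

Thanks in advance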

Add a primary key (code below)

Run the correct delete (code below)

Consider why you wouldn't want to keep that primary key.


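Assuming MSSQL or compatible:

    ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;
    -- start a new batch so the statements below can see the EmployeeID column
    GO

    WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
    BEGIN
        -- each pass removes the lowest EmployeeID from every group that still has duplicates
        DELETE FROM Employee
        WHERE EmployeeID IN (
            SELECT MIN(EmployeeID) AS [DeleteID]
            FROM Employee
            GROUP BY EmpID, EmpSSN
            HAVING COUNT(*) > 1
        );
    END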

It's very simple. I tried this in SQL Server 2008:

    DELETE SUB
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY EmpId, EmpName, EmpSSN ORDER BY EmpId) cnt
        FROM Employee
    ) SUB
    WHERE SUB.cnt > 1

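Use a row identifier to tell duplicate records apart. Keep the first one for each EmpID / EmpSSN and delete the rest. ROW_NUMBER() cannot be called without an OVER clause or used directly in a WHERE clause, so on Oracle (which the varchar2 columns in the question suggest) the ROWID pseudocolumn can play that role:

    DELETE FROM Employee a
    WHERE a.ROWID <> (
        SELECT MIN(b.ROWID)
        FROM Employee b
        WHERE a.EmpID = b.EmpID
          AND a.EmpSSN = b.EmpSSN
    );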

    WITH duplicates AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EmpID, EmpSSN ORDER BY EmpID, EmpSSN) AS Duplicate
        FROM Employee
    )
    DELETE FROM duplicates
    WHERE Duplicate > 1;

This will update the table and remove all the duplicates from it!
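Before running the delete, it can be reassuring to look at exactly which rows it would remove. A minimal sketch, assuming SQL Server and the employee table from the question: the same CTE, with the DELETE swapped for a SELECT.

```sql
-- Preview only: nothing is deleted here.
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmpID, EmpSSN ORDER BY EmpID, EmpSSN) AS Duplicate
    FROM Employee
)
SELECT *
FROM duplicates
WHERE Duplicate > 1;   -- these are the rows the DELETE version would remove
```

With the sample data from the question this should list six rows: one extra copy each for EmpId 1 and 4, and two extra copies each for EmpId 5 and 6.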

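You can create a temporary table #tempemployee containing a select distinct of the employee table. Then delete from employee. Then insert into employee select from #tempemployee.

As Josh said, even if you know which records are duplicates, deleting them will be impossible, because you cannot actually refer to a specific record if it is an exact duplicate of another record.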
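The three steps above could be sketched roughly as follows, assuming SQL Server and the employee table from the question. The transaction wrapper is an addition for safety, not part of the original suggestion.

```sql
BEGIN TRANSACTION;

-- 1. Copy one instance of every distinct row into a temp table.
SELECT DISTINCT EmpId, EmpName, EmpSSN
INTO #tempemployee
FROM employee;

-- 2. Empty the original table.
DELETE FROM employee;

-- 3. Put the de-duplicated rows back.
INSERT INTO employee (EmpId, EmpName, EmpSSN)
SELECT EmpId, EmpName, EmpSSN
FROM #tempemployee;

COMMIT TRANSACTION;

DROP TABLE #tempemployee;
```

Note that DISTINCT compares all three columns, so two rows with the same EmpId and EmpSSN but different EmpName values would both survive. With the sample data that does not happen, and the table should end up with six rows.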

 select distinct * into newtablename from oldtablename 

Now the new table, newtablename, will not contain any duplicate records.

Then simply rename the table (newtablename) by pressing F2 in SQL Server's Object Explorer.
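If you would rather script the rename than use Object Explorer, something along these lines may work. This is only a sketch, assuming SQL Server; employee_dedup is a hypothetical name for the intermediate table.

```sql
-- Copy the distinct rows into a new table (employee_dedup is a made-up name).
SELECT DISTINCT *
INTO employee_dedup
FROM employee;

-- Remove the original table, then give the copy its name.
DROP TABLE employee;
EXEC sp_rename 'employee_dedup', 'employee';
```

Keep in mind that SELECT ... INTO creates a bare table: indexes, constraints and permissions on the original are not carried over and would have to be recreated.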

If you don't want to create a new primary key, you can use the TOP command in SQL Server:

    declare @ID int
    while EXISTS (select count(*) from Employee group by EmpId having count(*) > 1)
    begin
        select top 1 @ID = EmpId from Employee group by EmpId having count(*) > 1
        DELETE TOP(1) FROM Employee WHERE EmpId = @ID
    end

    DELETE DUP
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY Clientid ORDER BY Clientid) AS Val
        FROM ClientMaster
    ) DUP
    WHERE DUP.Val > 1

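Explanation

Use an inner query to construct a view over the table that includes a field based on Row_Number(), partitioned by the columns you want to be unique.

Delete from the results of this inner query, selecting anything whose row number is not 1; i.e. the duplicates, not the original.

Valid syntax requires an ORDER BY clause in the row_number window function; you can put any column name here. If you want to change which of the results is treated as a duplicate (e.g. keep the earliest or the most recent, etc.), then the column used here does matter; i.e. you want to specify the order such that the record you wish to keep comes first in the result.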
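To illustrate the point about ORDER BY, here is a sketch that keeps the most recent row per Clientid. It assumes SQL Server and a ModifiedDate column on ClientMaster; that column is hypothetical and does not appear in the answer above.

```sql
-- ModifiedDate is an assumed column, used here only for illustration.
DELETE DUP
FROM (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Clientid
               ORDER BY ModifiedDate DESC   -- newest row in each group gets Val = 1
           ) AS Val
    FROM ClientMaster
) DUP
WHERE DUP.Val > 1;   -- removes everything except the newest row per Clientid
```

Switching DESC to ASC would keep the oldest row instead.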

It's easy to do with the query below:

    WITH Dups AS (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY (SELECT 0)) AS rn
        FROM mytable
    )
    DELETE FROM Dups WHERE rn > 1

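I'm not an SQL expert, so bear with me. I'm sure you'll get a better answer soon enough. Here's how you can find the duplicate records.

    select t1.empid, t1.empssn, count(*)
    from employee as t1
    inner join employee as t2
        on (t1.empid = t2.empid and t1.empssn = t2.empssn)
    group by t1.empid, t1.empssn
    having count(*) > 1

Deleting them will be trickier, because there is nothing in the data that a delete statement could use to tell the duplicates apart. I suspect the answer will involve row_number() or adding an identity column.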
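For finding the duplicates, the self-join is not strictly needed. A simpler sketch, assuming the employee table from the question, groups directly on the two columns. With the self-join, a group of n copies reports a count of n squared, whereas here the count is the actual number of copies.

```sql
SELECT EmpId, EmpSSN, COUNT(*) AS copies
FROM employee
GROUP BY EmpId, EmpSSN
HAVING COUNT(*) > 1
ORDER BY EmpId;

-- With the sample data from the question:
--   1  555-55-5555  2
--   4  555-58-5555  2
--   5  555-59-5555  3
--   6  555-70-5555  3
```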

    create unique clustered index Employee_idx
    on Employee (EmpId, EmpSSN)
    with ignore_dup_key

You can drop the index afterwards if you don't need it.
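One caveat: in SQL Server, IGNORE_DUP_KEY only affects rows inserted after the index exists, and creating a unique index on a table that already contains duplicates fails. A sketch of how the idea could still be applied is to build the index on an empty copy and insert into that. The table name employee_dedup, the index name and the column types are assumptions chosen to mirror the question's table.

```sql
-- Hypothetical empty copy of the employee table.
CREATE TABLE employee_dedup (
    EmpId   int,
    EmpName varchar(10),
    EmpSSN  varchar(11)
);

-- Rows that would violate the unique key are discarded with a warning
-- instead of failing the whole INSERT.
CREATE UNIQUE CLUSTERED INDEX Employee_dedup_idx
    ON employee_dedup (EmpId, EmpSSN)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO employee_dedup (EmpId, EmpName, EmpSSN)
SELECT EmpId, EmpName, EmpSSN
FROM employee;
```

After the insert, employee_dedup should hold one row per EmpId / EmpSSN pair and can replace the original table.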

No ID, no rowcount() and no temp table needed....

    WHILE (
        SELECT COUNT(*)
        FROM TBLEMP
        WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)
    ) > 1
        DELETE TOP(1)
        FROM TBLEMP
        WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)

If the table has two columns, ID and Name, where names are repeated under different IDs, you can use this query:

 DELETE FROM dbo.tbl1 WHERE id NOT IN ( Select MIN(Id) AS namecount FROM tbl1 GROUP BY Name ) 

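Having a database table without a primary key is really, I would say, extremely bad practice... so, after you add one (ALTER TABLE),

run this until you no longer see any duplicate records (that is the purpose of the HAVING COUNT):

    DELETE FROM [TABLE_NAME]
    WHERE [Id] IN (
        SELECT MAX([Id])
        FROM [TABLE_NAME]
        GROUP BY [TARGET_COLUMN]
        HAVING COUNT(*) > 1
    );

    -- lists the groups that still contain duplicates
    SELECT MAX([Id]), [TARGET_COLUMN], COUNT(*) AS dupeCount
    FROM [TABLE_NAME]
    GROUP BY [TARGET_COLUMN]
    HAVING COUNT(*) > 1;

MAX([Id]) causes the latest records (the ones added after the first was created) to be deleted. If you want the opposite, i.e. to delete the first records and keep the last one, use MIN([Id]) instead.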

    select t1.empid, t1.empname, t1.salary
    from employee t1, employee t2
    where t1.empid = t2.empid
      and t1.empname = t2.empname
      and t1.salary = t2.salary
    group by t1.empid, t1.empname, t1.salary
    having count(*) > 1

    DELETE FROM `test`
    USING `test`, `test` AS vtable
    WHERE test.id > vtable.id
      AND test.common_column = vtable.common_column

Using this (MySQL syntax) we can delete the duplicate records.

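    ALTER IGNORE TABLE `testing`
        ADD UNIQUE INDEX `test` (`b`);

Here `b` is the name of the column that must be unique, and `test` is the index name. This is MySQL syntax; the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so it only works on older versions.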