Delete duplicate records from a SQL table without a primary key
I have the following records in the table below:
create table employee ( EmpId number, EmpName varchar2(10), EmpSSN varchar2(11) );
insert into employee values (1, 'Jack', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (1, 'Jack', '555-55-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
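This table has no primary key, and it already contains the records above. I want to delete the duplicate records that have the same values in the EmpId and EmpSSN fields.
For example: EmpId 5
Can anyone help me frame a query to delete these duplicate records?
Thanks in advance.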
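Add a primary key (code below).
Run the appropriate delete (code below).
Think about why you wouldn't want to keep that primary key.
Assuming MSSQL or compatible: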
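ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;
-- Batch separator (SSMS/sqlcmd): the new column must exist before the next batch is compiled
GO

-- Each pass removes one row (the lowest EmployeeID) from every group that still has duplicates
WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
    DELETE FROM Employee
    WHERE EmployeeID IN (
        SELECT MIN(EmployeeID) AS [DeleteID]
        FROM Employee
        GROUP BY EmpID, EmpSSN
        HAVING COUNT(*) > 1
    )
END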
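If you want to try the loop without a SQL Server instance, here is a minimal sketch in Python against an in-memory SQLite database. It is an illustration under assumptions, not the T-SQL above: SQLite's implicit `rowid` stands in for the identity column, and the function name is made up for this example.

```python
import sqlite3

# The rows from the question
ROWS = [
    (1, 'Jack', '555-55-5555'), (2, 'Joe', '555-56-5555'), (3, 'Fred', '555-57-5555'),
    (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
    (1, 'Jack', '555-55-5555'), (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'),
    (6, 'Lisa', '555-70-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
]

def dedupe_with_surrogate_key(conn):
    """Delete the lowest-keyed row of each duplicated group until none remain."""
    # Assumption: rowid plays the role of the EmployeeID identity column
    victims = ("SELECT MIN(rowid) FROM employee "
               "GROUP BY EmpId, EmpSSN HAVING COUNT(*) > 1")
    passes = 0
    while conn.execute(victims).fetchone() is not None:
        conn.execute(f"DELETE FROM employee WHERE rowid IN ({victims})")
        passes += 1
    return passes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
passes = dedupe_with_surrogate_key(conn)
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Each pass removes only one row per duplicated group, so a group with three copies needs two passes. That is why the delete sits inside a loop.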
This is simple. I tried it in SQL Server 2008:
DELETE SUB
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY EmpId, EmpName, EmpSSN ORDER BY EmpId) cnt
    FROM Employee
) SUB
WHERE SUB.cnt > 1
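Use a row identifier to distinguish the duplicate records. Keep the first row for each EmpID / EmpSSN and delete the rest. ROW_NUMBER() cannot be used bare in a WHERE clause, so this version assumes Oracle, where every row has a ROWID pseudocolumn:
DELETE FROM Employee a
WHERE a.ROWID <> (
    SELECT MIN(b.ROWID)
    FROM Employee b
    WHERE a.EmpID = b.EmpID
      AND a.EmpSSN = b.EmpSSN
)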
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmpID, EmpSSN ORDER BY EmpID, EmpSSN) AS Duplicate
    FROM Employee
)
DELETE FROM duplicates
WHERE Duplicate > 1;
This updates the table and removes all of its duplicates!
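The ROW_NUMBER() idea can be tried outside SQL Server too. The sketch below uses Python with an in-memory SQLite database (window functions need SQLite 3.25 or later). SQLite cannot delete through a CTE or derived table, so as an assumption of this example the numbered rows are mapped back to the table through `rowid`. The function name is hypothetical.

```python
import sqlite3

# The rows from the question
ROWS = [
    (1, 'Jack', '555-55-5555'), (2, 'Joe', '555-56-5555'), (3, 'Fred', '555-57-5555'),
    (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
    (1, 'Jack', '555-55-5555'), (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'),
    (6, 'Lisa', '555-70-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
]

def delete_numbered_duplicates(conn):
    """Number the rows inside each (EmpId, EmpSSN) group and delete every row after the first."""
    cur = conn.execute("""
        DELETE FROM employee
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY EmpId, EmpSSN ORDER BY rowid) AS rn
                FROM employee
            )
            WHERE rn > 1
        )
    """)
    return cur.rowcount  # number of rows deleted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
deleted = delete_numbered_duplicates(conn)
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Unlike the loop-based answers, this removes every extra copy in a single statement, however many copies a group has.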
You can create a temporary table #tempemployee containing a select distinct of the employee table. Then delete from employee. Then insert into employee select from #tempemployee.
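The three steps can be sketched as follows, in Python against an in-memory SQLite database. This is an illustration under assumptions: SQLite has no `#temp` naming, so a `CREATE TEMP TABLE` named `tempemployee` stands in for `#tempemployee`, and the function name is invented for the example.

```python
import sqlite3

# The rows from the question
ROWS = [
    (1, 'Jack', '555-55-5555'), (2, 'Joe', '555-56-5555'), (3, 'Fred', '555-57-5555'),
    (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
    (1, 'Jack', '555-55-5555'), (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'),
    (6, 'Lisa', '555-70-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
]

def rebuild_from_distinct(conn):
    """Copy the distinct rows aside, empty the table, then copy them back."""
    with conn:  # commit on success, roll back if any step raises
        conn.execute("CREATE TEMP TABLE tempemployee AS SELECT DISTINCT * FROM employee")
        conn.execute("DELETE FROM employee")
        conn.execute("INSERT INTO employee SELECT * FROM tempemployee")
        conn.execute("DROP TABLE tempemployee")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
rebuild_from_distinct(conn)
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Keep in mind that `SELECT DISTINCT *` only collapses rows that match in every column. Two rows with the same EmpId and EmpSSN but a different EmpName would both survive.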
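As Josh said, even if you know which records are duplicated, deleting them directly is not possible: when a record is an exact duplicate of another, nothing lets you refer to that one specific record.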
select distinct * into newtablename from oldtablename
Now the new table, newtablename, will not contain any duplicate records.
To change the table name (newtablename), just press F2 in SQL Server's Object Explorer.
If you don't want to create a new primary key, you can use the TOP clause in SQL Server:
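declare @ID int
-- Loop while any EmpId still occurs more than once
while EXISTS (select count(*) from Employee group by EmpId having count(*) > 1)
begin
    -- Pick one duplicated EmpId and delete a single row that carries it
    select top 1 @ID = EmpId from Employee group by EmpId having count(*) > 1
    DELETE TOP(1) FROM Employee WHERE EmpId = @ID
end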
Code
DELETE DUP
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY Clientid ORDER BY Clientid) AS Val
    FROM ClientMaster
) DUP
WHERE DUP.Val > 1
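Explanation
Use an inner query to build a view that contains a field based on Row_Number(), partitioned by the columns you want to be unique.
Delete from the result of this inner query, selecting anything whose row number is not 1; i.e. the duplicates, not the originals.
Valid syntax requires an order by clause for the row_number window function; you can put any column name there. The column matters if you want to control which rows are treated as the duplicates (e.g. to keep the earliest or the most recent): specify an order that puts the record you want to keep first in its group.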
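To make the point about the order by clause concrete, here is a small sketch in Python with an in-memory SQLite database (SQLite 3.25 or later for window functions). Everything beyond the idea itself is an assumption for illustration: the `UpdatedAt` column, the sample rows, and the function names do not come from the question, and `rowid` is used because SQLite cannot delete through a derived table.

```python
import sqlite3

def make_table():
    conn = sqlite3.connect(":memory:")
    # UpdatedAt is a hypothetical column, added only to have something to order by
    conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpSSN TEXT, UpdatedAt TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
        (5, '555-59-5555', '2020-01-01'),
        (5, '555-59-5555', '2021-06-15'),
        (5, '555-59-5555', '2019-03-10'),
        (6, '555-70-5555', '2018-12-31'),
    ])
    return conn

def keep_first_by(conn, order_by):
    """Keep the row that sorts first within each (EmpId, EmpSSN) group."""
    # order_by is interpolated into the SQL text: pass trusted literals only
    conn.execute(f"""
        DELETE FROM employee
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY EmpId, EmpSSN
                                          ORDER BY {order_by}) AS rn
                FROM employee
            )
            WHERE rn > 1
        )
    """)
    return conn.execute("SELECT EmpId, UpdatedAt FROM employee ORDER BY EmpId").fetchall()

oldest_kept = keep_first_by(make_table(), "UpdatedAt ASC")
newest_kept = keep_first_by(make_table(), "UpdatedAt DESC")
```

With ascending order the 2019 row for EmpId 5 survives; with descending order the 2021 row does. The rows that are exact duplicates in every column are interchangeable, so there the ordering column makes no practical difference.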
It's easy with the query below:
WITH Dups AS (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY (SELECT 0)) AS rn
    FROM mytable
)
DELETE FROM Dups
WHERE rn > 1
I'm not a SQL expert, so bear with me. I'm sure you'll get a better answer soon. Here is how to find the duplicate records:
select t1.empid, t1.empssn, count(*)
from employee as t1
inner join employee as t2
    on (t1.empid = t2.empid and t1.empssn = t2.empssn)
group by t1.empid, t1.empssn
having count(*) > 1
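Deleting them will be trickier, because nothing in the data can be used in a delete statement to tell the duplicates apart. I suspect the answer will involve row_number() or adding an identity column.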
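For finding the duplicates, a plain GROUP BY without the self-join is enough, and it reports the actual number of copies (the self-join counts every pairing, so a group of n rows shows up as n squared). A minimal sketch in Python with an in-memory SQLite database; the function name is made up for the example:

```python
import sqlite3

# The rows from the question
ROWS = [
    (1, 'Jack', '555-55-5555'), (2, 'Joe', '555-56-5555'), (3, 'Fred', '555-57-5555'),
    (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
    (1, 'Jack', '555-55-5555'), (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'),
    (6, 'Lisa', '555-70-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
]

def find_duplicates(conn):
    """Return (EmpId, EmpSSN, copies) for every key that occurs more than once."""
    return conn.execute("""
        SELECT EmpId, EmpSSN, COUNT(*) AS copies
        FROM employee
        GROUP BY EmpId, EmpSSN
        HAVING COUNT(*) > 1
        ORDER BY EmpId
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
duplicates = find_duplicates(conn)
```

Running this check before and after a delete is a cheap way to confirm that the cleanup did what you expected.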
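CREATE UNIQUE CLUSTERED INDEX Employee_idx ON Employee (EmpId, EmpSSN) WITH (IGNORE_DUP_KEY = ON)
You can drop the index afterwards if you don't need it. Note that IGNORE_DUP_KEY only discards duplicates on later inserts; SQL Server will not build a unique index over rows that are already duplicated, so create the index on an empty copy of the table and insert the rows into that.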
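The same "let a unique index swallow the duplicates" idea can be sketched in Python against an in-memory SQLite database. This swaps in a different mechanism and should be read that way: SQLite has no IGNORE_DUP_KEY option, so `INSERT OR IGNORE` into a copy of the table plays that role. The table name `employee_clean` and the function names are hypothetical.

```python
import sqlite3

# The rows from the question
ROWS = [
    (1, 'Jack', '555-55-5555'), (2, 'Joe', '555-56-5555'), (3, 'Fred', '555-57-5555'),
    (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
    (1, 'Jack', '555-55-5555'), (4, 'Mike', '555-58-5555'), (5, 'Cathy', '555-59-5555'),
    (6, 'Lisa', '555-70-5555'), (5, 'Cathy', '555-59-5555'), (6, 'Lisa', '555-70-5555'),
]

def unique_index_rejected(conn):
    """A unique index cannot be built over rows that are already duplicated."""
    try:
        conn.execute("CREATE UNIQUE INDEX employee_idx ON employee (EmpId, EmpSSN)")
    except sqlite3.DatabaseError:
        return True
    return False

def copy_ignoring_duplicates(conn):
    """Load the rows into an indexed copy; rows that violate the index are skipped."""
    conn.execute("CREATE TABLE employee_clean (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
    conn.execute("CREATE UNIQUE INDEX employee_clean_idx ON employee_clean (EmpId, EmpSSN)")
    conn.execute("INSERT OR IGNORE INTO employee_clean SELECT * FROM employee")
    return conn.execute("SELECT COUNT(*) FROM employee_clean").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
rejected = unique_index_rejected(conn)
kept = copy_ignoring_duplicates(conn)
```

A side benefit of this route is that the index keeps new duplicates out afterwards, as long as you leave it in place.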
No ID, no rowcount(), and no temp table needed:
-- Loop while any empno still has more than one row, deleting one such row per pass
WHILE (
    SELECT COUNT(*)
    FROM TBLEMP
    WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)
) > 1
    DELETE TOP(1) FROM TBLEMP
    WHERE EMPNO IN (SELECT empno FROM tblemp GROUP BY empno HAVING COUNT(empno) > 1)
If the table has two columns, id and name, and the same name is repeated under different ids, you can use this query:
DELETE FROM dbo.tbl1
WHERE id NOT IN (
    SELECT MIN(Id) AS namecount
    FROM tbl1
    GROUP BY Name
)
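Here is a small runnable sketch of that query in Python with an in-memory SQLite database. The sample rows, the `keep` parameter, and the function names are assumptions added for illustration; switching `MIN` to `MAX` keeps the highest id per name instead of the lowest.

```python
import sqlite3

def make_tbl1():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (id INTEGER, name TEXT)")
    # Hypothetical sample data: 'Ann' appears under three different ids
    conn.executemany("INSERT INTO tbl1 VALUES (?, ?)",
                     [(1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann'), (5, 'Cy')])
    return conn

def keep_one_id_per_name(conn, keep="MIN"):
    """Delete every row whose id is not the chosen (MIN or MAX) id for its name."""
    if keep not in ("MIN", "MAX"):
        raise ValueError("keep must be 'MIN' or 'MAX'")
    conn.execute(
        f"DELETE FROM tbl1 WHERE id NOT IN (SELECT {keep}(id) FROM tbl1 GROUP BY name)"
    )
    return conn.execute("SELECT id, name FROM tbl1 ORDER BY id").fetchall()

lowest_kept = keep_one_id_per_name(make_tbl1(), "MIN")
highest_kept = keep_one_id_per_name(make_tbl1(), "MAX")
```

Because every name keeps exactly one id, this finishes in one statement and needs no loop. It does rely on id being unique, which is exactly what the employee table in the question lacks.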
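Having a database table without a primary key is, I would say, very bad practice... so add one (ALTER TABLE).
Run this until you no longer see any duplicate records (that is the purpose of HAVING COUNT):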
DELETE FROM [TABLE_NAME]
WHERE [Id] IN (
    SELECT MAX([Id])
    FROM [TABLE_NAME]
    GROUP BY [TARGET_COLUMN]
    HAVING COUNT(*) > 1
)

-- Check for groups that still contain duplicates
SELECT MAX([Id]), [TARGET_COLUMN], COUNT(*) AS dupeCount
FROM [TABLE_NAME]
GROUP BY [TARGET_COLUMN]
HAVING COUNT(*) > 1
MAX([Id]) deletes the newest records (the ones added after the first was created). If you want to delete the first record and keep the last one instead, use MIN([Id]).
select t1.*
from employee t1, employee t2
where t1.empid = t2.empid
  and t1.empname = t2.empname
  and t1.salary = t2.salary
group by t1.empid, t1.empname, t1.salary
having count(*) > 1
DELETE FROM `test`
USING `test`, `test` AS vtable
WHERE `test`.id > vtable.id
  AND `test`.common_column = vtable.common_column
Using this, we can delete the duplicate records (MySQL syntax).
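ALTER IGNORE TABLE `test` ADD UNIQUE INDEX `test` (`b`);
Here `b` is the name of the column that should be unique, and `test` is the index name. ALTER IGNORE TABLE is MySQL-specific and was removed in MySQL 5.7.4, so this only works on older versions.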