Finding duplicate records in a table using SQL Server
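I am validating a table that holds transaction-level data for an e-commerce site, and I want to find the exact errors.
I would like your help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is: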
OrderNo  shoppername  amountpayed  city  Item
1        Sam          10           A     Iphone
1        Sam          10           A     Iphone      --->> Duplication to be detected
1        Sam          5            A     Ipod
2        John         20           B     Macbook
3        John         25           B     Macbookair
4        Jack         5            A     Ipod
Suppose I use the query below:
SELECT shoppername, COUNT(*) AS cnt
FROM dbo.sales
GROUP BY shoppername
HAVING COUNT(*) > 1
It returns:
Sam   2
John  2
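But I don't want to find duplicates on just 1 or 2 columns. I want to find rows that are duplicated across all columns in my data. The result I want is: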
1 Sam 10 A Iphone
WITH x AS (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY OrderNo, item ORDER BY OrderNo)
    FROM #temp1
)
SELECT * FROM x WHERE rn > 1
You can delete the duplicates by replacing the select statement with:
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;

JOB       COUNT(JOB)
--------- ----------
ANALYST            2
CLERK              4
MANAGER            3
PRESIDENT          1
SALESMAN           4
Just add all the fields to the query, and remember to add them to the GROUP BY as well.
SELECT shoppername, a, b, amountpayed, item, COUNT(*) AS cnt
FROM dbo.sales
GROUP BY shoppername, a, b, amountpayed, item
HAVING COUNT(*) > 1
To get the list of records that occur more than once, use the following command:
SELECT field1, field2, field3, COUNT(*)
FROM table_name
GROUP BY field1, field2, field3
HAVING COUNT(*) > 1
Try this:
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
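Read up on the CHECKSUM function first, because collisions are possible: different rows can produce the same checksum and would then be reported as duplicates.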
WITH x AS (
    -- the aggregate needs an alias, and GROUP BY must come before HAVING
    SELECT shoppername, COUNT(shoppername) AS cnt
    FROM sales
    GROUP BY shoppername
    HAVING COUNT(shoppername) > 1
)
SELECT t.*
FROM x, win_gp_pin1510 t
WHERE x.shoppername = t.shoppername
ORDER BY t.shoppername
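First of all, I suspect the result shown in the question is inaccurate. There appear to be three "Sam" rows in the original table. But that does not matter for the question.
So let's get to the question itself. Based on your table, the best way to show the duplicated values is to use count(*) together with a GROUP BY clause. The query looks like this: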
SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
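The reason is that all the columns in the table together identify each record, which means a record counts as a duplicate only when the values in every column are exactly the same. You also want to display every field of the duplicated records, so the GROUP BY must not leave out any column. If it did, you could not display that column, because you can only select columns that take part in the GROUP BY clause.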
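To check this reasoning against the sample rows from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so the table is named sales rather than dbo.sales, and the helper name find_full_row_duplicates is hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 10, "A", "Iphone"),  # the duplicate to detect
    (1, "Sam", 5, "A", "Ipod"),
    (2, "John", 20, "B", "Macbook"),
    (3, "John", 25, "B", "Macbookair"),
    (4, "Jack", 5, "A", "Ipod"),
]


def find_full_row_duplicates(rows):
    """Return one row per group of fully identical records, plus its count."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        "amountPayed INTEGER, city TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    # Every column appears in GROUP BY, so only fully identical rows collapse.
    result = conn.execute(
        """
        SELECT OrderNo, shoppername, amountPayed, city, item,
               COUNT(*) AS RepeatTimes
        FROM sales
        GROUP BY OrderNo, shoppername, amountPayed, city, item
        HAVING COUNT(*) > 1
        """
    ).fetchall()
    conn.close()
    return result


print(find_full_row_duplicates(ROWS))  # [(1, 'Sam', 10, 'A', 'Iphone', 2)]
```

Only the Sam/Iphone row comes back, because the Sam/Ipod row differs in amountPayed and item and therefore falls into its own group.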
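Now I want to give you an example of With...Row_Number()Over(...), which uses a table expression together with the Row_Number function.
Suppose you have almost the same table, but with one extra column called Shipping Date, whose value may differ even when the rest are the same. Here it is: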
OrderNo  shoppername  amountpayed  city  Item        Shipping Date
1        Sam          10           A     Iphone      2016-01-01
1        Sam          10           A     Iphone      2016-02-02
1        Sam          5            A     Ipod        2016-03-03
2        John         20           B     Macbook     2016-04-04
3        John         25           B     Macbookair  2016-05-05
4        Jack         5            A     Ipod        2016-06-06
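Notice that row 2 is not a duplicate if you still treat all the columns as a unit. But what if you want to treat these rows as duplicates in this case? Then you should use With...Row_Number()Over(...), and the query looks like this: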
WITH TABLEEXPRESSION AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]
           ) AS Identifier -- if you consider the one with the later shipping date to be the duplicate
    FROM dbo.sales
)
SELECT * FROM TABLEEXPRESSION WHERE Identifier != 1 -- or use '> 1'
The query above gives a result that includes the shipping date, for example:
OrderNo  shoppername  amountpayed  city  Item    Shipping Date  Identifier
1        Sam          10           A     Iphone  2016-02-02     2
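Note that this is not the 2016-01-01 row. The 2016-02-02 row is the one picked out because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: within each partition the rows are numbered by shipping date, so the earliest one gets 1 and is kept. Shipping Date is not one of the columns that decide whether a record is a duplicate, which means the 2016-02-02 row can still be a perfectly good result for your question.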
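The numbering behaviour described above can be tried out with a small sketch. It again assumes Python's sqlite3 module in place of SQL Server (window functions need SQLite 3.25 or newer), quotes the column as "Shipping Date" instead of using brackets, and uses the hypothetical helper name later_duplicates.

```python
import sqlite3

# Sample rows from the answer, including the extra Shipping Date column.
ROWS = [
    (1, "Sam", 10, "A", "Iphone", "2016-01-01"),
    (1, "Sam", 10, "A", "Iphone", "2016-02-02"),
    (1, "Sam", 5, "A", "Ipod", "2016-03-03"),
    (2, "John", 20, "B", "Macbook", "2016-04-04"),
    (3, "John", 25, "B", "Macbookair", "2016-05-05"),
    (4, "Jack", 5, "A", "Ipod", "2016-06-06"),
]


def later_duplicates(rows):
    """Return rows that repeat an earlier row in every column but Shipping Date."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        'amountPayed INTEGER, city TEXT, item TEXT, "Shipping Date" TEXT)'
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Rows sharing the PARTITION BY columns are numbered by shipping date;
    # anything numbered 2 or higher is treated as a duplicate.
    result = conn.execute(
        """
        WITH TableExpression AS (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY OrderNo, shoppername, amountPayed, city, item
                       ORDER BY "Shipping Date"
                   ) AS Identifier
            FROM sales
        )
        SELECT * FROM TableExpression WHERE Identifier > 1
        """
    ).fetchall()
    conn.close()
    return result


print(later_duplicates(ROWS))
# [(1, 'Sam', 10, 'A', 'Iphone', '2016-02-02', 2)]
```

Changing the ORDER BY to descending order would flag the 2016-01-01 row instead, which is the choice the comment in the original query refers to.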
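To sum up, count(*) with a GROUP BY clause is the best choice when you only need to display the columns listed in the GROUP BY clause; otherwise you will miss the columns that do not take part in the GROUP BY.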
With...Row_Number()Over(...), on the other hand, works in every case where you want to find duplicate records, but the query is a little more complicated to write.
If your goal is to delete the duplicate records from the table, you have to use the latter, WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE.
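As a rough illustration of the delete step, the sketch below keeps the first copy of each fully identical row and removes the rest. It is not the SQL Server form: SQLite (used here through Python's sqlite3 module) cannot delete through a CTE, so this swaps in SQLite's implicit rowid to target the rows numbered above 1. The helper name delete_duplicates is hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 10, "A", "Iphone"),  # duplicate that should be removed
    (1, "Sam", 5, "A", "Ipod"),
    (2, "John", 20, "B", "Macbook"),
    (3, "John", 25, "B", "Macbookair"),
    (4, "Jack", 5, "A", "Ipod"),
]


def delete_duplicates(rows):
    """Delete all but the first copy of each identical row; return what is left."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        "amountPayed INTEGER, city TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    # Number the rows inside each group of identical values, then delete
    # every row whose number is greater than 1 (assumption: rowid exists,
    # which holds for ordinary SQLite tables).
    conn.execute(
        """
        DELETE FROM sales
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY OrderNo, shoppername, amountPayed, city, item
                           ORDER BY rowid
                       ) AS rn
                FROM sales
            )
            WHERE rn > 1
        )
        """
    )
    remaining = conn.execute("SELECT * FROM sales ORDER BY rowid").fetchall()
    conn.close()
    return remaining


print(len(delete_duplicates(ROWS)))  # 5
```

In SQL Server itself the CTE can be the delete target directly, as the WITH...DELETE pattern in this thread shows, so the rowid workaround is only needed for the stand-in database.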
Hope this helps!
Try this:
WITH T1 AS (
    SELECT LASTNAME, COUNT(1) AS 'COUNT'
    FROM Employees
    GROUP BY LastName
    HAVING COUNT(1) > 1
)
SELECT E.*, T1.[COUNT]
FROM Employees E
INNER JOIN T1 ON T1.LastName = E.LastName
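The query above first finds the repeated key values and then joins back to the table, so every row in a duplicated group is listed rather than one row per group. A minimal sketch of that pattern follows, assuming Python's sqlite3 module in place of SQL Server; the Employees columns, the three sample rows, the DupCount alias, and the helper name rows_with_shared_lastname are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data, not taken from the thread.
EMPLOYEES = [
    (1, "Ann", "Smith"),
    (2, "Bob", "Smith"),
    (3, "Cara", "Jones"),
]


def rows_with_shared_lastname(rows):
    """Return every employee row whose LastName occurs more than once."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute(
        "CREATE TABLE Employees (Id INTEGER, FirstName TEXT, LastName TEXT)"
    )
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", rows)
    # T1 holds one row per repeated LastName; the join brings back
    # all the original rows that belong to those groups.
    result = conn.execute(
        """
        WITH T1 AS (
            SELECT LastName, COUNT(1) AS DupCount
            FROM Employees
            GROUP BY LastName
            HAVING COUNT(1) > 1
        )
        SELECT E.*, T1.DupCount
        FROM Employees E
        INNER JOIN T1 ON T1.LastName = E.LastName
        ORDER BY E.Id
        """
    ).fetchall()
    conn.close()
    return result


print(rows_with_shared_lastname(EMPLOYEES))
# [(1, 'Ann', 'Smith', 2), (2, 'Bob', 'Smith', 2)]
```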
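-- SELECT * is not allowed with GROUP BY shoppername, so select the grouped column and the count
SELECT shoppername, COUNT(Item) AS cnt
FROM dbo.sales
GROUP BY shoppername
HAVING COUNT(Item) > 1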
SELECT EventID, COUNT(*) AS cnt
FROM dbo.EventInstances
GROUP BY EventID
HAVING COUNT(*) > 1
Here is working code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING COUNT(abnno) > 1