SQL Server: How to Join to first row

I'll use a concrete, but hypothetical, example.

Each order normally has only one line item:

Orders:

    OrderGUID   OrderNumber
    =========   ============
    {FFB2...}   STL-7442-1
    {3EC6...}   MPT-9931-8A

LineItems:

    LineItemGUID   Order ID   Quantity   Description
    ============   ========   ========   =================================
    {098FBE3...}   1          7          prefabulated amulite
    {1609B09...}   2          32         spurving bearing

But occasionally there will be an order with two line items:

    LineItemID   Order ID    Quantity   Description
    ==========   =========   ========   =================================
    {A58A1...}   6,784,329   5          pentametric fan
    {0E9BC...}   6,784,329   5          differential girdlespring
Normally when showing the orders to the user:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID

I want to show the single item on the order. But with this occasional order containing two (or more) items, the order appears to be duplicated:

    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         differential girdlespring
    KSG-0619-81   5          panametric fan
    KSG-0619-81   5          differential girdlespring

What I really want is to have SQL Server just pick one, as that will be good enough:

    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         differential girdlespring
    KSG-0619-81   5          panametric fan

If I get adventurous, I might show the user an ellipsis to indicate that there is more than one:

    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         differential girdlespring
    KSG-0619-81   5          panametric fan, ...
So the question is how to either

- eliminate "duplicate" rows, or
- only join to one of the rows, to avoid the duplication.
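First attempt

My first naive attempt was to join only to the "TOP 1" line item:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN (
            SELECT TOP 1 LineItems.Quantity, LineItems.Description
            FROM LineItems
            WHERE LineItems.OrderID = Orders.OrderID) LineItems2
        ON 1=1

But that gives the error:

    The column or prefix 'Orders' does not match with a table name or alias name used in the query.

Presumably because the inner select doesn't see the outer table.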
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    JOIN LineItems
    ON LineItems.LineItemGUID =
        (
        SELECT TOP 1 LineItemGUID
        FROM LineItems
        WHERE OrderID = Orders.OrderID
        )

In SQL Server 2005 and above, you can just replace INNER JOIN with CROSS APPLY:

    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    CROSS APPLY
        (
        -- No ORDER BY here, so which row TOP 1 returns is not guaranteed
        SELECT TOP 1 LineItems.Quantity, LineItems.Description
        FROM LineItems
        WHERE LineItems.OrderID = Orders.OrderID
        ) LineItems2
I know this question has already been answered, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query runs only once, instead of once for each row returned.

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN (
            SELECT Orders.OrderNumber, MAX(LineItems.LineItemID) AS LineItemID
            FROM Orders
                INNER JOIN LineItems
                -- joined on OrderID, the only key LineItems has in the question
                ON Orders.OrderID = LineItems.OrderID
            GROUP BY Orders.OrderNumber
        ) AS Items ON Orders.OrderNumber = Items.OrderNumber
        INNER JOIN LineItems
        ON Items.LineItemID = LineItems.LineItemID
You could do:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    WHERE LineItems.LineItemID = (
        SELECT MIN(LineItemID)
        FROM LineItems
        WHERE OrderID = Orders.OrderID
    )

This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID, or it will be slow.
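For anyone who wants to try the MIN(LineItemID) approach without a SQL Server instance at hand, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, the integer keys and the index name IX_LineItems_OrderID are made up for illustration, and the sketch checks the result only, not the speed.

```python
import sqlite3

# Hypothetical sample data modelled on the question; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (
        LineItemID INTEGER PRIMARY KEY,   -- the primary key serves as the LineItemID index
        OrderID INTEGER, Quantity INTEGER, Description TEXT);
    CREATE INDEX IX_LineItems_OrderID ON LineItems (OrderID);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (10, 1, 7, 'prefabulated amulite'),
        (20, 2, 32, 'spurving bearing'),
        (30, 3, 5, 'pentametric fan'),
        (31, 3, 5, 'differential girdlespring');
""")

# Keep only the line item with the smallest LineItemID for each order.
first_items = conn.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    WHERE LineItems.LineItemID = (
        SELECT MIN(li.LineItemID) FROM LineItems AS li
        WHERE li.OrderID = Orders.OrderID)
    ORDER BY Orders.OrderID
""").fetchall()

for row in first_items:
    print(row)
```

The order with two line items comes back once, with the lower LineItemID row.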
@Quassnoi's answer is good. In some cases (especially if the outer table is big), a query using window functions might be more efficient, like this:

    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    LEFT JOIN
        (
        SELECT LineItems.Quantity, LineItems.Description, OrderId,
               ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
        FROM LineItems
        ) LineItems2 ON LineItems2.OrderId = Orders.OrderID AND RowNum = 1

Sometimes you just need to test which query gives better performance.
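Here is a small sketch of the ROW_NUMBER idea that can be run locally with Python's sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived). The data is hypothetical, SQLite is only standing in for SQL Server, and I order by LineItemID instead of (SELECT NULL) so the picked row is predictable. The extra order without line items is there to show what the LEFT JOIN buys you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    -- Order 4 is a made-up order with no line items at all.
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (3, 'KSG-0619-81'), (4, 'EMPTY-0000-0');
    INSERT INTO LineItems VALUES
        (10, 1, 7, 'prefabulated amulite'),
        (30, 3, 5, 'pentametric fan'),
        (31, 3, 5, 'differential girdlespring');
""")

# Number the line items within each order, then join only to number 1.
numbered = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    LEFT JOIN (
        SELECT OrderID, Quantity, Description,
               ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY LineItemID) AS RowNum
        FROM LineItems
    ) AS li ON li.OrderID = o.OrderID AND li.RowNum = 1
    ORDER BY o.OrderID
""").fetchall()

for row in numbered:
    print(row)
```

Because of the LEFT JOIN, the order without line items still appears, with NULLs (None in Python) for the line item columns.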
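Correlated subqueries are subqueries that depend on the outer query. It's like a for loop in SQL. The subquery runs once for each row in the outer query:

    select *
    from users
    join widgets on widgets.id = (
        select id
        from widgets
        where widgets.user_id = users.id
        order by created_at desc
        limit 1   -- PostgreSQL/MySQL syntax; in SQL Server use SELECT TOP 1 instead
    )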
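To make the "for loop" comparison concrete, here is a sketch that runs the correlated form and then spells out the same logic as an explicit Python loop. It uses Python's sqlite3 module (where LIMIT is valid), and the users and widgets rows are invented for the example. The real engine is free to optimize the subquery differently; the loop only mirrors the logical meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE widgets (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO widgets VALUES
        (1, 1, '2020-01-01'), (2, 1, '2020-02-01'), (3, 2, '2020-01-15');
""")

# The correlated subquery: the inner SELECT refers to users.id from the outer row.
sql_result = conn.execute("""
    SELECT users.id, widgets.id
    FROM users
    JOIN widgets ON widgets.id = (
        SELECT w.id FROM widgets AS w
        WHERE w.user_id = users.id
        ORDER BY w.created_at DESC
        LIMIT 1)
    ORDER BY users.id
""").fetchall()

# The same logic as a loop: one inner lookup per outer row.
loop_result = []
for (user_id,) in conn.execute("SELECT id FROM users ORDER BY id").fetchall():
    newest = conn.execute(
        "SELECT id FROM widgets WHERE user_id = ? ORDER BY created_at DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    if newest is not None:  # inner join semantics: users without widgets are skipped
        loop_result.append((user_id, newest[0]))

print(sql_result)
print(loop_result)
```

Both lists pair each user with the id of their most recently created widget.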
EDIT: never mind, Quassnoi has a better answer.
For SQL2K, something like this:

    SELECT
        Orders.OrderNumber
      , LineItems.Quantity
      , LineItems.Description
    FROM (
        SELECT
            Orders.OrderID
          , Orders.OrderNumber
          , FirstLineItemID = (
                SELECT TOP 1 LineItemID
                FROM LineItems
                WHERE LineItems.OrderID = Orders.OrderID
                ORDER BY LineItemID -- or whatever else
            )
        FROM Orders
    ) Orders
    JOIN LineItems
      ON LineItems.OrderID = Orders.OrderID
     AND LineItems.LineItemID = Orders.FirstLineItemID
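I tried the CROSS APPLY; it works nicely, but takes slightly longer. Adjusting the line item columns to use MAX and adding a GROUP BY kept the speed and dropped the extra record.

Here's the adjusted query:

    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber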
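One thing to watch with the MAX approach: each MAX is computed on its own, so the quantity and the description can come from different line items. A quick sketch with Python's sqlite3 module shows it (SQLite as a stand-in for SQL Server; the quantities 2 and 9 are made up so the effect is visible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (3, 'KSG-0619-81');
    -- Hypothetical quantities chosen so the two MAX() calls pick different rows.
    INSERT INTO LineItems VALUES
        (30, 3, 2, 'pentametric fan'),
        (31, 3, 9, 'differential girdlespring');
""")

# One row per order, but each column is aggregated independently.
grouped = conn.execute("""
    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
    INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
""").fetchone()

# The actual (Quantity, Description) pairs stored in the table.
real_rows = conn.execute("SELECT Quantity, Description FROM LineItems").fetchall()

print(grouped)
print((grouped[1], grouped[2]) in real_rows)
```

The grouped row pairs the larger quantity with the alphabetically later description, a combination that is not an actual line item. If the row only needs to be "good enough" for display that may be acceptable, but it is not one of the rows.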
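I solve a similar problem by using LEFT JOIN and GROUP BY Orders.OrderNumber. Is there a reason not to do it this way?

    -- Note: SQL Server rejects this as written, because Quantity and Description
    -- are neither aggregated nor listed in the GROUP BY clause
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        LEFT JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber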
I'll answer the question in your answer with the example from your own question:

    Orders             LineItems
    +-------------+    +---------+----------+---------------+
    | OrderNumber |    | OrderID | Quantity | Description   |
    +-------------+    +---------+----------+---------------+
    | 22586       |    | 22586   | 17       | Trunion       |
    +-------------+    | 22586   | 3        | Girdle Spring |
                       +---------+----------+---------------+

Joining the two together on OrderNumber gives:

    OrderNumber  Quantity  Description
    -----------  --------  -------------
    22586        17        Trunion
    22586        3         Girdle Spring

    2 row(s) affected

Where we wanted it to return only one row:

    OrderNumber  Quantity  Description
    -----------  --------  -------------
    22586        17        Trunion

    1 row(s) affected

That is why I use GROUP BY Orders.OrderNumber, which returns only one row per OrderNumber.
My favorite way to run this query is with a NOT EXISTS clause. I believe this is the most efficient way to run this sort of query:

    select o.OrderNumber,
           li.Quantity,
           li.Description
    from Orders as o
    inner join LineItems as li
        on li.OrderID = o.OrderID
    where not exists (
        select 1
        from LineItems as li_later
        where li_later.OrderID = o.OrderID
          and li_later.LineItemGUID > li.LineItemGUID
    )

But I have not tested this method against the other methods suggested here.
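A quick way to see what the NOT EXISTS filter keeps is to run it on a toy data set. This sketch uses Python's sqlite3 module as a stand-in for SQL Server, with a made-up integer LineItemID in place of LineItemGUID. It says nothing about performance; it only shows which row survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (10, 1, 7, 'prefabulated amulite'),
        (20, 2, 32, 'spurving bearing'),
        (30, 3, 5, 'pentametric fan'),
        (31, 3, 5, 'differential girdlespring');
""")

# A line item is kept only if no other item of the same order has a larger key.
last_items = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    INNER JOIN LineItems AS li ON li.OrderID = o.OrderID
    WHERE NOT EXISTS (
        SELECT 1 FROM LineItems AS li_later
        WHERE li_later.OrderID = o.OrderID
          AND li_later.LineItemID > li.LineItemID)
    ORDER BY o.OrderID
""").fetchall()

for row in last_items:
    print(row)
```

Note that with the > comparison the surviving row is the one with the largest key per order; flipping it to < would keep the smallest instead.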
Another approach, using a common table expression:

    with firstOnly as (
        select Orders.OrderNumber, LineItems.Quantity, LineItems.Description,
               ROW_NUMBER() over (partition by Orders.OrderID order by Orders.OrderID) lp
        FROM Orders
            join LineItems on Orders.OrderID = LineItems.OrderID
    )
    select *
    from firstOnly
    where lp = 1
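Or, in the end, maybe you would like to show all the rows joined together?

Comma-separated version here:

    select *
    from Orders o
    cross apply (
        select CAST((select l.Description + ','
                     from LineItems l
                     where l.OrderID = o.OrderID   -- was s.OrderID; no alias s is defined
                     for xml path('')) as nvarchar(max)) l
    ) lines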
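If you just want to see the shape of the comma-separated result, here is a sketch with Python's sqlite3 module. Note the swap: SQLite has no FOR XML PATH, so this uses its group_concat aggregate instead (on SQL Server 2017 and later, STRING_AGG plays a similar role). The sample rows are hypothetical, and the order of the items inside the list is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (10, 1, 7, 'prefabulated amulite'),
        (30, 3, 5, 'pentametric fan'),
        (31, 3, 5, 'differential girdlespring');
""")

# One row per order, with all of its line item descriptions joined by ', '.
joined = conn.execute("""
    SELECT o.OrderNumber,
           (SELECT group_concat(l.Description, ', ')
            FROM LineItems AS l
            WHERE l.OrderID = o.OrderID) AS Descriptions
    FROM Orders AS o
    ORDER BY o.OrderID
""").fetchall()

for order_number, descriptions in joined:
    print(order_number, "->", descriptions)
```

Unlike the FOR XML PATH version above, this leaves no trailing comma to trim.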