SQL Server: How to join to the first row

I'll use a concrete, but hypothetical, example.

Each order normally has only one line item:

Orders:

    OrderGUID   OrderNumber
    =========   ===========
    {FFB2...}   STL-7442-1
    {3EC6...}   MPT-9931-8A

LineItems:

    LineItemGUID   OrderID   Quantity   Description
    ============   =======   ========   ====================
    {098FBE3...}   1         7          prefabulated amulite
    {1609B09...}   2         32         spurving bearing

But occasionally there will be an order with two line items:

    LineItemGUID   OrderID     Quantity   Description
    ============   =========   ========   =========================
    {A58A1...}     6,784,329   5          pentametric fan
    {0E9BC...}     6,784,329   5          differential girdlespring

Normally, when showing the orders to the user:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems
        ON Orders.OrderID = LineItems.OrderID

I want to show the single item on the order. But with this occasional order containing two (or more) items, the order appears duplicated:

    OrderNumber   Quantity   Description
    ===========   ========   =========================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan
    KSG-0619-81   5          differential girdlespring
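The following is a minimal sketch of where the duplicates come from. It assumes Python's built-in sqlite3 module, with an in-memory SQLite database standing in for SQL Server. The tables are hypothetical simplifications of the ones above and use integer LineItemID keys instead of GUIDs. A plain inner join emits one row per matching line item, so the two-item order comes back twice:

```python
import sqlite3

# Hypothetical sample data modelled on the question; integer LineItemID assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

# The plain join from the question: one output row per matching line item.
rows = conn.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    ORDER BY LineItems.LineItemID
""").fetchall()

for row in rows:
    print(row)  # KSG-0619-81 is printed twice, once per line item
```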

What I really want is for SQL Server to just pick one, as that will be good enough:

    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan

If I get adventurous, I might show the user an ellipsis to indicate that there's more than one:

    OrderNumber   Quantity   Description
    ===========   ========   ====================
    STL-7442-1    7          prefabulated amulite
    MPT-9931-8A   32         spurving bearing
    KSG-0619-81   5          pentametric fan, ...

So the question is how to either:

  • eliminate the "duplicate" rows, or
  • only join to one of the rows, to avoid duplication.

First attempt

My first naive attempt was to only join to the "TOP 1" line items:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN (
            SELECT TOP 1 LineItems.Quantity, LineItems.Description
            FROM LineItems
            WHERE LineItems.OrderID = Orders.OrderID) LineItems2
        ON 1=1

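But that gives the error:

The column or prefix 'Orders' does not match with a table name or alias name used in the query.

Presumably this is because the inner select doesn't see the outer table.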
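That explanation can be checked with a small sketch. It again assumes SQLite, via Python's sqlite3, as a stand-in for SQL Server, and uses LIMIT 1 in place of TOP 1. The outer select list is also pointed at the LineItems2 alias, so the only unresolved name is Orders.OrderID. A derived table in the FROM clause is not correlated with its sibling tables, so that reference fails to resolve. SQLite reports a "no such column" error rather than SQL Server's message:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
""")

# The "first attempt", with LIMIT 1 standing in for TOP 1. The derived table
# (LineItems2) is resolved on its own, so Orders is not in scope inside it.
first_attempt = """
    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    INNER JOIN (
        SELECT LineItems.Quantity, LineItems.Description
        FROM LineItems
        WHERE LineItems.OrderID = Orders.OrderID
        LIMIT 1
    ) AS LineItems2 ON 1 = 1
"""

try:
    conn.execute(first_attempt)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # name-resolution error raised while preparing the query

print(error)
```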

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    JOIN LineItems
        ON LineItems.LineItemGUID = (
            SELECT TOP 1 LineItemGUID
            FROM LineItems
            WHERE OrderID = Orders.OrderID
        )

In SQL Server 2005 and above, you can just replace INNER JOIN with CROSS APPLY:

    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    CROSS APPLY (
        SELECT TOP 1 LineItems.Quantity, LineItems.Description
        FROM LineItems
        WHERE LineItems.OrderID = Orders.OrderID
    ) LineItems2
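The correlated join on TOP 1 (the first query in this answer) can be sketched in SQLite from Python, assuming SQLite is an acceptable stand-in. SQLite has neither TOP nor CROSS APPLY, so ORDER BY ... LIMIT 1 takes the place of TOP 1. The ORDER BY li.LineItemID is an added assumption that defines "first" as the lowest id; without an ORDER BY, TOP 1 returns an arbitrary matching row. The data and integer ids are hypothetical:

```python
import sqlite3

# Hypothetical sample data; integer LineItemID assumed instead of GUIDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    JOIN LineItems
      ON LineItems.LineItemID = (
           -- Correlated scalar subquery: one line item id per Orders row.
           SELECT li.LineItemID
           FROM LineItems AS li
           WHERE li.OrderID = Orders.OrderID
           ORDER BY li.LineItemID
           LIMIT 1)
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```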

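I know this question has already been answered, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query runs only once, instead of once for each row returned.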

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN (
            SELECT Orders.OrderNumber, MAX(LineItems.LineItemID) AS LineItemID
            FROM Orders
                INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
            GROUP BY Orders.OrderNumber
        ) AS Items ON Orders.OrderNumber = Items.OrderNumber
        INNER JOIN LineItems ON Items.LineItemID = LineItems.LineItemID
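To see the "group once, then join back" shape in action, here is a sketch in SQLite through Python's sqlite3. SQLite stands in for SQL Server, and the data is hypothetical, with integer LineItemID keys. Because the grouping picks MAX(LineItemID), the two-item order keeps its last line item rather than its first:

```python
import sqlite3

# Hypothetical sample data; integer LineItemID assumed instead of GUIDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    INNER JOIN (
        -- Uncorrelated: evaluated as one grouped pass over all line items.
        SELECT Orders.OrderNumber, MAX(LineItems.LineItemID) AS LineItemID
        FROM Orders
        INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
        GROUP BY Orders.OrderNumber
    ) AS Items ON Orders.OrderNumber = Items.OrderNumber
    INNER JOIN LineItems ON Items.LineItemID = LineItems.LineItemID
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```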

You could do:

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    WHERE LineItems.LineItemID = (
        SELECT MIN(LineItemID)
        FROM LineItems
        WHERE OrderID = Orders.OrderID
    )

This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID, or it will be slow.
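Here is a sketch of the MIN(LineItemID) variant with the suggested indexing, assuming SQLite via Python's sqlite3 as a stand-in. In this hypothetical schema LineItemID is already the INTEGER PRIMARY KEY, so only the OrderID index is created; its name, IX_LineItems_OrderID, is made up:

```python
import sqlite3

# Hypothetical sample data; LineItemID doubles as the primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

# Index supporting the "WHERE OrderID = ..." lookup inside the subquery.
conn.execute("CREATE INDEX IX_LineItems_OrderID ON LineItems (OrderID)")

rows = conn.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    WHERE LineItems.LineItemID = (
        -- Correlated: the lowest line item id for this row's order.
        SELECT MIN(li.LineItemID)
        FROM LineItems AS li
        WHERE li.OrderID = Orders.OrderID)
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```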

@Quassnoi's answer is good, but in some cases (especially if the outer table is big) a more efficient query might use windowed functions, like this:

    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    LEFT JOIN (
        SELECT LineItems.Quantity, LineItems.Description, OrderId,
            ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
        FROM LineItems
    ) LineItems2 ON LineItems2.OrderId = Orders.OrderID AND RowNum = 1

Sometimes you just need to test which query gives better performance.
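Below is a sketch of the window-function version in SQLite, run from Python's sqlite3 with hypothetical data. SQLite 3.25 or later is assumed, since that is when window functions arrived. There are two deliberate changes from the query above. First, ORDER BY LineItemID replaces ORDER BY (SELECT NULL), so the chosen row is deterministic. Second, an extra order with no line items (the made-up NO-ITEMS-0000) shows that the LEFT JOIN keeps such orders, with NULL item columns:

```python
import sqlite3

# Hypothetical data; requires SQLite 3.25+ for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81'),
        (9999999, 'NO-ITEMS-0000');  -- hypothetical order without line items
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    LEFT JOIN (
        SELECT Quantity, Description, OrderID,
               -- Number items within each order; RowNum = 1 is the "first".
               ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY LineItemID) AS RowNum
        FROM LineItems
    ) AS LineItems2
      ON LineItems2.OrderID = Orders.OrderID AND LineItems2.RowNum = 1
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)  # the last row is ('NO-ITEMS-0000', None, None)
```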

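A correlated subquery is a subquery that depends on the outer query. It's like a for loop in SQL: the subquery runs once for each row in the outer query. (This example uses LIMIT 1, as in MySQL or PostgreSQL; in SQL Server the equivalent is SELECT TOP 1.)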

    select * from users join widgets on widgets.id = (
        select id from widgets
        where widgets.user_id = users.id
        order by created_at desc
        limit 1
    )

Edit: never mind, Quassnoi has a better answer.
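The "for loop" description of the correlated subquery above can be spelled out in plain Python. This is only a conceptual model using hypothetical in-memory rows for the users and widgets tables. Real query optimizers may rewrite (decorrelate) such a subquery rather than literally re-running it per row:

```python
# Hypothetical rows standing in for the users and widgets tables.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
widgets = [
    {"id": 10, "user_id": 1, "created_at": "2020-01-01"},
    {"id": 11, "user_id": 1, "created_at": "2020-03-01"},
    {"id": 12, "user_id": 2, "created_at": "2020-02-01"},
]


def latest_widget_id(user_id):
    """The inner query, parameterised by the current outer row's users.id."""
    candidates = [w for w in widgets if w["user_id"] == user_id]
    # ORDER BY created_at DESC (ISO date strings sort chronologically).
    candidates.sort(key=lambda w: w["created_at"], reverse=True)
    # LIMIT 1 (None when the user has no widgets).
    return candidates[0]["id"] if candidates else None


result = []
for user in users:                      # outer query: one pass per users row
    wid = latest_widget_id(user["id"])  # correlated subquery, evaluated per row
    for widget in widgets:              # join widgets ON widgets.id = (subquery)
        if widget["id"] == wid:
            result.append((user["name"], widget["id"]))

print(result)  # [('ann', 11), ('bob', 12)]
```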

For SQL Server 2000, something like this:

    SELECT Orders.OrderNumber
         , LineItems.Quantity
         , LineItems.Description
    FROM (
        SELECT Orders.OrderID
             , Orders.OrderNumber
             , FirstLineItemID = (
                   SELECT TOP 1 LineItemID
                   FROM LineItems
                   WHERE LineItems.OrderID = Orders.OrderID
                   ORDER BY LineItemID -- or whatever else
               )
        FROM Orders
    ) Orders
    JOIN LineItems
        ON LineItems.OrderID = Orders.OrderID
       AND LineItems.LineItemID = Orders.FirstLineItemID
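The same derived-table shape can be sketched in SQLite from Python, with SQLite as a stand-in and hypothetical data. LIMIT 1 replaces TOP 1. The derived table also gets the distinct, made-up alias OrdersWithFirst instead of reusing Orders, purely for readability:

```python
import sqlite3

# Hypothetical sample data; integer LineItemID assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT OrdersWithFirst.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM (
        -- Step 1: attach the first line item id to every order.
        SELECT o.OrderID, o.OrderNumber,
               (SELECT li.LineItemID
                FROM LineItems AS li
                WHERE li.OrderID = o.OrderID
                ORDER BY li.LineItemID
                LIMIT 1) AS FirstLineItemID
        FROM Orders AS o
    ) AS OrdersWithFirst
    -- Step 2: join back to fetch only that one line item.
    JOIN LineItems
      ON LineItems.OrderID = OrdersWithFirst.OrderID
     AND LineItems.LineItemID = OrdersWithFirst.FirstLineItemID
    ORDER BY OrdersWithFirst.OrderID
""").fetchall()

for row in rows:
    print(row)
```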

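I tried the CROSS APPLY; it works nicely, but takes slightly longer. I adjusted the line-item columns to use MAX and added a GROUP BY, which kept the speed and dropped the extra record.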

Here's the adjusted query:

    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
        INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
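Check one caveat before relying on this: MAX is computed independently for each column, so Quantity and Description may come from different line items. The sketch below uses SQLite via Python's sqlite3 with hypothetical data (a made-up order ORD-1) in which the two maxima belong to different rows:

```python
import sqlite3

# Hypothetical data chosen so the per-column maxima come from different rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'ORD-1');
    INSERT INTO LineItems VALUES
        (1, 1, 17, 'Girdle Spring'),
        (2, 1, 3, 'Trunion');
""")

rows = conn.execute("""
    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
    INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
""").fetchall()

# The (Quantity, Description) pairs that actually exist in the table.
real_pairs = set(conn.execute("SELECT Quantity, Description FROM LineItems"))

print(rows)  # [('ORD-1', 17, 'Trunion')], a combination no single row holds
```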

I solved a similar problem by using a LEFT JOIN and GROUP BY Orders.OrderNumber. Is there a reason not to do it this way?

    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        LEFT JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber

I'll answer the question about my answer using the scenario from your own question:

    Orders               LineItems
    +-------------+      +---------+----------+---------------+
    | OrderNumber |      | OrderID | Quantity | Description   |
    +-------------+      +---------+----------+---------------+
    | 22586       |      | 22586   | 17       | Trunion       |
    +-------------+      | 22586   | 3        | Girdle Spring |
                         +---------+----------+---------------+

Joining the two on OrderNumber gives:

    OrderNumber  Quantity  Description
    -----------  --------  -------------
    22586        17        Trunion
    22586        3         Girdle Spring

    2 row(s) affected

We want it to return only one row:

    OrderNumber  Quantity  Description
    -----------  --------  -------------
    22586        17        Trunion

    1 row(s) affected

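That's why I use GROUP BY Orders.OrderNumber: it returns only one row per OrderNumber. (Note that SQL Server rejects the GROUP BY query above as written, because Quantity and Description are neither grouped nor aggregated; they would need an aggregate such as MAX.)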

My favorite way to run this query is with a NOT EXISTS clause. I believe it is the most efficient way to run this sort of query:

    select o.OrderNumber,
           li.Quantity,
           li.Description
    from Orders as o
        inner join LineItems as li
        on li.OrderID = o.OrderID
    where not exists (
        select 1
        from LineItems as li_later
        where li_later.OrderID = o.OrderID
          and li_later.LineItemGUID > li.LineItemGUID
    )

But I have not tested this against the other methods suggested here.
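Here is a sketch of the NOT EXISTS pattern in SQLite via Python's sqlite3. It assumes integer LineItemID keys in place of the GUIDs and uses hypothetical data. A line item survives only if no later line item exists for the same order, so each order keeps its highest-id item:

```python
import sqlite3

# Hypothetical sample data; integer LineItemID assumed instead of GUIDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    INNER JOIN LineItems AS li ON li.OrderID = o.OrderID
    WHERE NOT EXISTS (
        -- Discard li if the same order has a line item with a larger id.
        SELECT 1
        FROM LineItems AS li_later
        WHERE li_later.OrderID = o.OrderID
          AND li_later.LineItemID > li.LineItemID)
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```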

Another approach, using a common table expression:

    with firstOnly as (
        select Orders.OrderNumber, LineItems.Quantity, LineItems.Description,
            ROW_NUMBER() over (partition by Orders.OrderID order by Orders.OrderID) lp
        FROM Orders
            join LineItems on Orders.OrderID = LineItems.OrderID
    )
    select * from firstOnly
    where lp = 1
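Here is a sketch of the CTE in SQLite 3.25+ (assumed, for window-function support), run from Python with hypothetical data. In the query above, ordering by the partition key leaves every row in a partition tied, so which item gets lp = 1 is arbitrary. This sketch orders by LineItemID instead so the result is repeatable:

```python
import sqlite3

# Hypothetical data; requires SQLite 3.25+ for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    WITH firstOnly AS (
        SELECT Orders.OrderID, Orders.OrderNumber,
               LineItems.Quantity, LineItems.Description,
               -- Deterministic tie-break: lowest LineItemID gets lp = 1.
               ROW_NUMBER() OVER (PARTITION BY Orders.OrderID
                                  ORDER BY LineItems.LineItemID) AS lp
        FROM Orders
        JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    )
    SELECT OrderNumber, Quantity, Description
    FROM firstOnly
    WHERE lp = 1
    ORDER BY OrderID
""").fetchall()

for row in rows:
    print(row)
```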

Or, finally, maybe you want to show all the joined rows?

A comma-separated version:

    select *
    from Orders o
    cross apply (
        select CAST((
            select l.Description + ','
            from LineItems l
            where l.OrderID = o.OrderID
            for xml path('')
        ) as nvarchar(max)) l
    ) lines
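SQLite has no CROSS APPLY or FOR XML PATH, so this sketch (Python's sqlite3, hypothetical data) swaps in SQLite's group_concat aggregate inside a correlated subquery to build a comma-separated list per order. On SQL Server 2017 and later, STRING_AGG is the built-in counterpart. group_concat is used here without an ordering clause, so the order of items within each list should not be relied on:

```python
import sqlite3

# Hypothetical sample data modelled on the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES
        (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (6784329, 'KSG-0619-81');
    INSERT INTO LineItems VALUES
        (1, 1, 7, 'prefabulated amulite'),
        (2, 2, 32, 'spurving bearing'),
        (3, 6784329, 5, 'pentametric fan'),
        (4, 6784329, 5, 'differential girdlespring');
""")

rows = conn.execute("""
    SELECT o.OrderNumber,
           -- Correlated aggregate: all of this order's descriptions in one string.
           (SELECT group_concat(l.Description, ', ')
            FROM LineItems AS l
            WHERE l.OrderID = o.OrderID) AS Lines
    FROM Orders AS o
    ORDER BY o.OrderID
""").fetchall()

for order_number, lines in rows:
    print(order_number, lines)
```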