MongoDB: group values by multiple fields
For example, I have these documents:
{ addr: 'address1', book: 'book1' },
{ addr: 'address2', book: 'book1' },
{ addr: 'address1', book: 'book5' },
{ addr: 'address3', book: 'book9' },
{ addr: 'address2', book: 'book5' },
{ addr: 'address2', book: 'book1' },
{ addr: 'address1', book: 'book1' },
{ addr: 'address15', book: 'book1' },
{ addr: 'address9', book: 'book99' },
{ addr: 'address90', book: 'book33' },
{ addr: 'address4', book: 'book3' },
{ addr: 'address5', book: 'book1' },
{ addr: 'address77', book: 'book11' },
{ addr: 'address1', book: 'book1' }
and so on.
How can I write a query that returns the top N addresses and the top M books for each address?
Example of the expected result:
address1 | book_1: 5
         | book_2: 10
         | book_3: 50
         | Total: 65
______________________
address2 | book_1: 10
         | book_2: 10
         | …
         | book_M: 10
         | Total: M * 10
…
______________________
addressN | book_1: 20
         | book_2: 20
         | …
         | book_M: 20
         | Total: M * 20
TL;DR Summary
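In modern MongoDB releases you can brute-force this with $slice applied to the basic aggregation result. For "large" results, run parallel queries for each grouping instead, or wait for SERVER-9377 to be resolved, which would allow a "limit" on the number of items that $push adds to an array.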
db.books.aggregate([
    { "$group": {
        "_id": { "addr": "$addr", "book": "$book" },
        "bookCount": { "$sum": 1 }
    }},
    // Order by count first so the $slice below keeps the most frequent books
    { "$sort": { "bookCount": -1 } },
    { "$group": {
        "_id": "$_id.addr",
        "books": { "$push": { "book": "$_id.book", "count": "$bookCount" } },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$project": {
        "books": { "$slice": [ "$books", 2 ] },
        "count": 1
    }}
])
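The "parallel queries" option mentioned above can be sketched with the official Node.js driver, assuming that collection.aggregate(pipeline) returns a cursor whose toArray() resolves to the result documents. This is a minimal sketch, not the answer's own code: topBooksPerAddress is a hypothetical helper name, and the per-address pipeline is one possible shape. It runs one query for the top N addresses, then N concurrent queries, each with its own $limit.

```javascript
// Sketch of the "parallel queries" approach. Assumes the official Node.js
// driver: collection.aggregate(pipeline) returns a cursor and
// cursor.toArray() resolves to the result documents.
// topBooksPerAddress is a hypothetical helper, not a driver API.
async function topBooksPerAddress(collection, n, m) {
  // Query 1: the top N addresses by number of documents.
  const topAddrs = await collection
    .aggregate([
      { $group: { _id: "$addr", count: { $sum: 1 } } },
      { $sort: { count: -1 } },
      { $limit: n },
    ])
    .toArray();

  // Queries 2..N+1: one per address, all started together and awaited as a
  // group, each limited independently to the top M books.
  const perAddr = await Promise.all(
    topAddrs.map((a) =>
      collection
        .aggregate([
          { $match: { addr: a._id } },
          { $group: { _id: "$book", count: { $sum: 1 } } },
          { $sort: { count: -1 } },
          { $limit: m },
        ])
        .toArray()
    )
  );

  // Attach each address's top books to its total count.
  return topAddrs.map((a, i) => ({
    _id: a._id,
    count: a.count,
    books: perAddr[i].map((b) => ({ book: b._id, count: b.count })),
  }));
}
```

With a connected MongoClient this might be called as topBooksPerAddress(client.db("test").collection("books"), 2, 2), where the database name "test" is an assumption. The trade-off is N + 1 round trips in exchange for a hard per-address limit that does not depend on $push.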
MongoDB 3.6 Preview
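SERVER-9377 is still not resolved, but in this release $lookup accepts a new form that takes a "pipeline" expression as an argument instead of the "localField" and "foreignField" options. This in turn allows a "self-join" with another pipeline expression, in which we can apply $limit to return the "top-n" results.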
db.books.aggregate([
    { "$group": { "_id": "$addr", "count": { "$sum": 1 } }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$lookup": {
        "from": "books",
        "let": { "addr": "$_id" },
        "pipeline": [
            { "$match": { "$expr": { "$eq": [ "$addr", "$$addr" ] } }},
            { "$group": { "_id": "$book", "count": { "$sum": 1 } }},
            { "$sort": { "count": -1 } },
            { "$limit": 2 }
        ],
        "as": "books"
    }}
])
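The other addition, of course, is the ability to select the matching items in the "join" by using $match with $expr to interpolate variables, but the general premise is a "pipeline within a pipeline", where the inner content can be filtered by matches from the parent. Since both are "pipelines" in their own right, we can $limit each result separately.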
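To make the "pipeline within a pipeline" idea concrete, here is an in-memory emulation in plain JavaScript (no database involved) of what the $lookup example computes on the sample documents from the question. It is only an illustration: countTop and topAddressesWithSubPipeline are hypothetical names, and ties in the counts are left in whatever order the sort produces.

```javascript
// In-memory emulation of the MongoDB 3.6 $lookup "pipeline" example, in plain
// JavaScript with no database. countTop and topAddressesWithSubPipeline are
// hypothetical names used for illustration only.

// Emulates $group with { $sum: 1 }, then $sort by count descending, then $limit.
function countTop(docs, keyFn, limit) {
  const counts = new Map();
  for (const d of docs) {
    const key = keyFn(d);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

function topAddressesWithSubPipeline(docs, n, m) {
  // Outer pipeline: the top n addresses by document count.
  return countTop(docs, (d) => d.addr, n).map((parent) => ({
    ...parent,
    // Inner pipeline: only documents matching the parent's addr (the job of
    // "let" plus "$expr"), grouped by book and limited on their own to m.
    books: countTop(docs.filter((d) => d.addr === parent._id), (d) => d.book, m),
  }));
}

// The sample documents from the question.
const sampleBooks = [
  ["address1", "book1"], ["address2", "book1"], ["address1", "book5"],
  ["address3", "book9"], ["address2", "book5"], ["address2", "book1"],
  ["address1", "book1"], ["address15", "book1"], ["address9", "book99"],
  ["address90", "book33"], ["address4", "book3"], ["address5", "book1"],
  ["address77", "book11"], ["address1", "book1"],
].map(([addr, book]) => ({ addr, book }));

console.log(JSON.stringify(topAddressesWithSubPipeline(sampleBooks, 2, 2)));
```

On the sample data this returns address1 (count 4, with book1: 3 and book5: 1) followed by address2 (count 3, with book1: 2 and book5: 1). Each inner list is limited independently, which is the property the "limit to $push" request is really after.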
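This would be the next best option to running parallel queries, and it would actually be better if the $match were allowed to use an index in the "sub-pipeline" processing. So although it does not use the "limit to $push" that the referenced issue asks for, it actually delivers something that should work better.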
Original Content
You seem to have stumbled upon the "top N" problem. In a way your problem is fairly easy to solve, though not with the exact limiting you ask for:
db.books.aggregate([
    { "$group": {
        "_id": { "addr": "$addr", "book": "$book" },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": { "$push": { "book": "$_id.book", "count": "$bookCount" } },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 }
])
This will give you a result like the following:
{ "result" : [ { "_id" : "address1", "books" : [ { "book" : "book4", "count" : 1 }, { "book" : "book5", "count" : 1 }, { "book" : "book1", "count" : 3 } ], "count" : 5 }, { "_id" : "address2", "books" : [ { "book" : "book5", "count" : 1 }, { "book" : "book1", "count" : 2 } ], "count" : 3 } ], "ok" : 1 }
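So this differs from what you are asking for: although we do get the top results for the address values, the underlying "books" selection is not limited to the required number of results.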
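This turns out to be very difficult to do. It can be done, but the complexity grows with the number of items you need to match. To keep it simple, we can stay at 2 matches at most: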
db.books.aggregate([
    { "$group": { "_id": { "addr": "$addr", "book": "$book" }, "bookCount": { "$sum": 1 } }},
    { "$group": { "_id": "$_id.addr", "books": { "$push": { "book": "$_id.book", "count": "$bookCount" } }, "count": { "$sum": "$bookCount" } }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$unwind": "$books" },
    { "$sort": { "count": 1, "books.count": -1 } },
    { "$group": { "_id": "$_id", "books": { "$push": "$books" }, "count": { "$first": "$count" } }},
    { "$project": { "_id": { "_id": "$_id", "books": "$books", "count": "$count" }, "newBooks": "$books" }},
    { "$unwind": "$newBooks" },
    { "$group": { "_id": "$_id", "num1": { "$first": "$newBooks" } }},
    { "$project": { "_id": "$_id", "newBooks": "$_id.books", "num1": 1 }},
    { "$unwind": "$newBooks" },
    { "$project": { "_id": "$_id", "num1": 1, "newBooks": 1, "seen": { "$eq": [ "$num1", "$newBooks" ] } }},
    { "$match": { "seen": false } },
    { "$group": { "_id": "$_id._id", "num1": { "$first": "$num1" }, "num2": { "$first": "$newBooks" }, "count": { "$first": "$_id.count" } }},
    { "$project": { "num1": 1, "num2": 1, "count": 1, "type": { "$cond": [ 1, [ true, false ], 0 ] } }},
    { "$unwind": "$type" },
    { "$project": { "books": { "$cond": [ "$type", "$num1", "$num2" ] }, "count": 1 }},
    { "$group": { "_id": "$_id", "count": { "$first": "$count" }, "books": { "$push": "$books" } }},
    { "$sort": { "count": -1 } }
])
So this will actually give you the top two "books" from the top two "address" entries.
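The idea behind that long pipeline can be sketched in memory with plain JavaScript: sort the books by count, take the $first one, drop it (the "seen": false $match), then take $first again. This is an illustrative sketch, not a translation of every stage: takeTopByPeeling is a hypothetical name, and unlike the pipeline it keeps an address that has fewer books than requested. The loop shows why complexity grows with the number of items: every extra book is one more round, and in the pipeline each round is another set of $project/$unwind/$match/$group stages.

```javascript
// In-memory sketch of the "peeling" idea behind the long pipeline: sort the
// books by count, take the first one, drop it, then take the first again.
// Each extra book needs one more round; in the aggregation pipeline each
// round means another set of $project/$unwind/$match/$group stages.
// takeTopByPeeling is a hypothetical name for illustration only.
function takeTopByPeeling(books, k) {
  // Like the $unwind + $sort on "books.count": -1 before regrouping.
  let remaining = [...books].sort((a, b) => b.count - a.count);
  const picked = [];
  for (let round = 0; round < k && remaining.length > 0; round++) {
    const first = remaining[0]; // the "$first" of this round
    picked.push(first);
    // Book names are unique per address after the first $group, so comparing
    // names matches the pipeline's whole-document $eq ("seen") comparison.
    remaining = remaining.filter((b) => b.book !== first.book);
  }
  return picked;
}

// The "books" array for address1 from the sample result above.
const address1Books = [
  { book: "book4", count: 1 },
  { book: "book5", count: 1 },
  { book: "book1", count: 3 },
];
console.log(takeTopByPeeling(address1Books, 2));
```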
But for my money, stay with the first form and then simply "slice" the elements of each returned array to take the first "N" elements.
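A minimal client-side sketch of that advice in plain JavaScript, assuming the aggregation result has already been fetched as an array of documents shaped like the sample output above. The first form has no $sort before its $push, so the pushed books come back in no particular order. For that reason this sketch sorts each array by count before slicing. sliceTopBooks is a hypothetical helper name.

```javascript
// Client-side version of "keep the first form and slice". The first form has
// no $sort before its $push, so each books array is sorted by count here
// before keeping its first m entries. sliceTopBooks is a hypothetical name.
function sliceTopBooks(docs, m) {
  return docs.map((doc) => ({
    ...doc,
    // Copy before sorting so the original result documents are not mutated.
    books: [...doc.books].sort((a, b) => b.count - a.count).slice(0, m),
  }));
}

// The "result" array from the sample output of the first form.
const firstFormResult = [
  {
    _id: "address1",
    books: [
      { book: "book4", count: 1 },
      { book: "book5", count: 1 },
      { book: "book1", count: 3 },
    ],
    count: 5,
  },
  {
    _id: "address2",
    books: [
      { book: "book5", count: 1 },
      { book: "book1", count: 2 },
    ],
    count: 3,
  },
];
console.log(JSON.stringify(sliceTopBooks(firstFormResult, 2)));
```

The same trimming can be done on the server with a $sort before the $push and a $slice in a final $project, as in the TL;DR pipeline.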
Use an aggregation pipeline like the following:
[
    { $group: { _id: { book: '$book', address: '$addr' }, total: { $sum: 1 } } },
    { $project: { book: '$_id.book', address: '$_id.address', total: '$total', _id: 0 } }
]
It will give you results like these:
{ "total" : 1, "book" : "book33", "address" : "address90" }, { "total" : 1, "book" : "book5", "address" : "address1" }, { "total" : 1, "book" : "book99", "address" : "address9" }, { "total" : 1, "book" : "book1", "address" : "address5" }, { "total" : 1, "book" : "book5", "address" : "address2" }, { "total" : 1, "book" : "book3", "address" : "address4" }, { "total" : 1, "book" : "book11", "address" : "address77" }, { "total" : 1, "book" : "book9", "address" : "address3" }, { "total" : 1, "book" : "book1", "address" : "address15" }, { "total" : 2, "book" : "book1", "address" : "address2" }, { "total" : 3, "book" : "book1", "address" : "address1" }
I didn't fully understand your expected result format, so feel free to modify this to whatever you need.
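A sketch in plain JavaScript of one way to turn those flat { total, book, address } rows into the per-address layout from the question, done on the client after the aggregation returns. formatTopBooks is a hypothetical helper name. The Total line is an assumption: it sums all of an address's books, not only the M that are displayed.

```javascript
// Client-side sketch: turns the flat { total, book, address } rows from the
// $group/$project above into the per-address layout from the question.
// formatTopBooks is a hypothetical name; Total counts all of an address's
// books, not only the m that are shown.
function formatTopBooks(rows, n, m) {
  const byAddr = new Map();
  for (const r of rows) {
    if (!byAddr.has(r.address)) {
      byAddr.set(r.address, { address: r.address, total: 0, books: [] });
    }
    const entry = byAddr.get(r.address);
    entry.total += r.total;
    entry.books.push({ book: r.book, total: r.total });
  }
  // Keep the n addresses with the highest totals.
  const top = [...byAddr.values()]
    .sort((a, b) => b.total - a.total)
    .slice(0, n);
  return top
    .map((a) => {
      const pad = " ".repeat(a.address.length);
      // The m most frequent books; only the first line shows the address.
      const lines = [...a.books]
        .sort((x, y) => y.total - x.total)
        .slice(0, m)
        .map((b, i) => `${i === 0 ? a.address : pad} | ${b.book}: ${b.total}`);
      lines.push(`${pad} | Total: ${a.total}`);
      return lines.join("\n");
    })
    .join("\n" + "_".repeat(22) + "\n");
}

// The sample output rows shown above.
const sampleRows = [
  { total: 1, book: "book33", address: "address90" },
  { total: 1, book: "book5", address: "address1" },
  { total: 1, book: "book99", address: "address9" },
  { total: 1, book: "book1", address: "address5" },
  { total: 1, book: "book5", address: "address2" },
  { total: 1, book: "book3", address: "address4" },
  { total: 1, book: "book11", address: "address77" },
  { total: 1, book: "book9", address: "address3" },
  { total: 1, book: "book1", address: "address15" },
  { total: 2, book: "book1", address: "address2" },
  { total: 3, book: "book1", address: "address1" },
];
console.log(formatTopBooks(sampleRows, 2, 2));
```

For N = 2 and M = 2 on the sample rows, this prints address1 (book1: 3, book5: 1, Total: 4), a separator line, and then address2 (book1: 2, book5: 1, Total: 3).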