I'm new to MongoDB and I need to write an aggregation that seems quite difficult to me. A document looks like this:
{
"_id" : ObjectId("568192aef8bd6b0cd0f649c6"),
"conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing",
"prism:aggregationType" : "Conference Proceeding",
"children-id" : [
"SCOPUS_ID:84948148564",
"SCOPUS_ID:84927603733",
"SCOPUS_ID:84943521758",
"SCOPUS_ID:84905234683",
"SCOPUS_ID:84876113709"
],
"dc:identifier" : "SCOPUS_ID:84867598678"
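}
The example contains only the fields I need for the aggregation. prism:aggregationType can have 5 different values (Conference Proceeding, Book, Journal, etc.). children-id means that this document is referenced by a set of other documents (SCOPUS_ID is each document's unique ID). What I want to do is first group by prism:aggregationType and then by conference: for each conference, I want to know how many referencing documents there are of each type (only counts > 0).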
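For example, let's say there are 100 documents with the conference from above, and these 100 documents are referenced by 250 documents. Out of those 250 documents, I want to know how many have "prism:aggregationType" : "Conference Proceeding", how many have "prism:aggregationType" : "Journal", and so on. The output could look like this: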
{
"conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing",
"aggregationTypes" : [{"Conference Proceeding" : 50} , {"Journal" : 200}]
}
It doesn't matter whether this is done with the aggregation pipeline or with map-reduce.
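To pin down what the requested numbers mean, here is a minimal plain-JavaScript sketch that computes the same report over an in-memory array of documents instead of going through MongoDB. It assumes that every citing document is stored in the same set with its own dc:identifier and prism:aggregationType, and that a citing document is counted once per conference even if it cites several of that conference's papers. The function name citingTypesByConference is hypothetical.

```javascript
// Hypothetical helper: in-memory equivalent of the requested report (no database involved).
// Assumption: citing documents are in `docs` with their own dc:identifier and prism:aggregationType.
function citingTypesByConference(docs) {
  // Index every document by its unique SCOPUS ID so citing documents can be looked up.
  var byId = new Map(docs.map(function (d) { return [d["dc:identifier"], d]; }));

  // conference -> Set of citing document IDs (a Set, so each citer counts once per conference)
  var citersByConference = new Map();
  docs.forEach(function (doc) {
    if (doc.conference == null) return; // same filter as {conference: {$ne: null}}
    if (!citersByConference.has(doc.conference)) citersByConference.set(doc.conference, new Set());
    var citers = citersByConference.get(doc.conference);
    (doc["children-id"] || []).forEach(function (id) { citers.add(id); });
  });

  var result = [];
  citersByConference.forEach(function (citers, conference) {
    var counts = {};
    citers.forEach(function (id) {
      var citing = byId.get(id);
      if (!citing) return; // citing document is not in the data set
      var type = citing["prism:aggregationType"];
      counts[type] = (counts[type] || 0) + 1;
    });
    // Types with zero citers never get a key, so only counts > 0 are reported.
    result.push({
      conference: conference,
      aggregationTypes: Object.keys(counts).map(function (type) {
        var pair = {};
        pair[type] = counts[type];
        return pair;
      })
    });
  });
  return result;
}
```

The Set-based deduplication is a design choice: if every citation should be counted (including the same citing document citing two papers of one conference), an array would replace the Set.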
EDIT
Is there any way to combine these two into one aggregation:
db.articles.aggregate([
{ $match:{
conference : {$ne : null}
}},
{$unwind:'$children-id'},
{$group: {
_id: {conference: '$conference'},
'cited-by':{$push:{'dc:identifier':"$children-id"}}
}}
]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );
In that query, I want to replace the array inside $in with the array created by $push.
Posted on 2016-03-01 21:46:35
The code I wrote in the EDIT section is also the final solution I arrived at (slightly modified):
db.articles.aggregate([
{ $match:{
conference : {$ne : null}
}},
{$unwind:'$children-id'},
{$group: {
_id: '$conference',
'cited-by':{$push:"$children-id"}
}}
]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );
The result for each conference looks like this:
{
"_id" : "Annual Conference on Privacy, Security and Trust",
"cited-by" : [
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84928151617",
"SCOPUS_ID:84939229259",
"SCOPUS_ID:84946407175",
"SCOPUS_ID:84933039513",
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84942607254",
"SCOPUS_ID:84948165954",
"SCOPUS_ID:84926379258",
"SCOPUS_ID:84946771354",
"SCOPUS_ID:84944223683",
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84939169499",
"SCOPUS_ID:84947104346",
"SCOPUS_ID:84948764343",
"SCOPUS_ID:84938075139",
"SCOPUS_ID:84946196118",
"SCOPUS_ID:84930820238",
"SCOPUS_ID:84947785321",
"SCOPUS_ID:84933496680",
"SCOPUS_ID:84942789431"
]
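}
I iterate over all the documents I get back (about 250) and use each cited-by array in the $in query. I have an index on dc:identifier, so it runs instantly. $lookup might be an alternative way to do the whole job inside the aggregation pipeline, but the R package doesn't support versions above 2.6. Anyway, thanks for your time :)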
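A minimal sketch of the two-step approach just described, written for the mongo shell (the author actually ran it from R): step 1 is the aggregation above, and step 2 issues one $in query per conference and tallies prism:aggregationType on the client. The function name citingTypeReport is hypothetical, and the code sticks to ES5 syntax so that older shells can run it.

```javascript
// Hypothetical helper; in the mongo shell, call it with the global db object: citingTypeReport(db)
function citingTypeReport(db) {
  // Step 1: one document per conference holding the IDs of all citing documents
  var conferences = db.articles.aggregate([
    { $match: { conference: { $ne: null } } },
    { $unwind: "$children-id" },
    { $group: { _id: "$conference", "cited-by": { $push: "$children-id" } } }
  ]).toArray();

  // Step 2: one $in query per conference (served by the index on dc:identifier)
  return conferences.map(function (conf) {
    var counts = {};
    // $in returns each matching document once, even if its ID repeats in cited-by
    db.articles
      .find({ "dc:identifier": { $in: conf["cited-by"] } }, { "prism:aggregationType": 1 })
      .forEach(function (doc) {
        var type = doc["prism:aggregationType"];
        counts[type] = (counts[type] || 0) + 1;
      });
    // Reshape into the [{type: count}, ...] form asked for in the question
    var aggregationTypes = Object.keys(counts).map(function (type) {
      var pair = {};
      pair[type] = counts[type];
      return pair;
    });
    return { conference: conf._id, aggregationTypes: aggregationTypes };
  });
}
```

This costs one extra round trip per conference, which stays cheap mainly because dc:identifier is indexed.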
Posted on 2016-03-01 15:00:13
Try this with the aggregation framework:
db.articles
    .aggregate([
        // 1. get the size of the `children-id` array through $project
        //    ($ifNull avoids a $size error on documents without `children-id`)
        {$project: {
            conference: 1,
            'prism:aggregationType': 1,
            'children-id': {$size: {$ifNull: ['$children-id', []]}}
        }},
        // 2. group by `conference` and `prism:aggregationType` and sum the sizes of `children-id`
        {$group: {
            _id: {
                conference: '$conference',
                aggregationType: '$prism:aggregationType'
            },
            ids: {$sum: '$children-id'}
        }},
        // 3. group by `conference` and push one {type, count} pair per aggregation type
        //    ($cond is not a $group accumulator, so it cannot pick between two $push calls here)
        {$group: {
            _id: '$_id.conference',
            aggregationTypes: {$push: {type: '$_id.aggregationType', count: '$ids'}}
        }}
    ]);

Posted on 2016-03-01 16:20:13
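As we discussed in chat:
unfortunately, using $lookup in the aggregation pipeline is tied to MongoDB 3.2, which is not an option in this case, because the R driver works with Mongo 2.6, and the source documents are contained in multiple collections.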
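For readers on MongoDB 3.2 or later, where $lookup is available, the two steps could be merged into a single pipeline along these lines. This is a hedged sketch that was not run against the 2.6 setup above (which cannot execute it). It assumes all documents live in the one articles collection used in the question's queries; with several source collections, each would need its own $lookup. The variable name citingTypesPipeline is hypothetical.

```javascript
// Hypothetical single-pipeline version for MongoDB 3.2+ (needs $lookup, so NOT usable on 2.6).
var citingTypesPipeline = [
  { $match: { conference: { $ne: null } } },
  { $unwind: "$children-id" },
  // One document per conference; $addToSet drops duplicate citing IDs
  { $group: { _id: "$conference", citedBy: { $addToSet: "$children-id" } } },
  { $unwind: "$citedBy" },
  // Join each citing ID to the article that carries it as dc:identifier
  { $lookup: { from: "articles", localField: "citedBy", foreignField: "dc:identifier", as: "citing" } },
  // An empty "citing" array (no matching article) makes $unwind drop that row
  { $unwind: "$citing" },
  // Count citing documents per (conference, type)
  { $group: { _id: { conference: "$_id", type: "$citing.prism:aggregationType" }, count: { $sum: 1 } } },
  // Collect the per-type counts back into one document per conference
  { $group: { _id: "$_id.conference", aggregationTypes: { $push: { type: "$_id.type", count: "$count" } } } }
];
// Usage in the mongo shell: db.articles.aggregate(citingTypesPipeline)
```

The output uses {type, count} pairs because producing {"Journal": 200}-style dynamic keys inside the pipeline requires $arrayToObject, which only arrived in a later release (3.4.4).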
https://stackoverflow.com/questions/35724817