Using $push together with $group

Stack Overflow user

Asked 2016-11-29 14:37:14

2 answers, 568 views, 0 followers, 0 votes

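Goal

Fix my make_pipeline() function so that it uses an aggregation query to count the number of tweets for each user, collects the tweet texts into an array, and returns the 5 users with the most tweets.

Exercise

Using an aggregation query, count the number of tweets for each user. In the same $group stage, use $push to accumulate all the tweet texts for each user.

Limit your output to the 5 users with the most tweets.

Your result documents should include only the following fields:

"_id" (screen name of the user),
"count" (number of tweets found for the user),
"tweet_texts" (list of the tweet texts found for the user).

Background

To achieve the goal above, I am testing the following code: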

def make_pipeline():
    # complete the aggregation pipeline
    pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort" : {"count" : -1}},
        {"$limit": 5}
    ]
    return pipeline

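Logic

First, I group all the tweets by username. Then, in the same stage, I push each tweet's text into tweet_texts and count the occurrences in each group. I believe this will give me the users with the most tweets.

Then I apply a projection to select only the three fields I want:

_id
count
tweet_texts

Finally, I sort the results and limit their number.

Problem

I pass the test, but not the submission. What am I doing wrong? My mistake must be in the first stage (group), but for the love of God I cannot find what I am doing wrong.

Data sample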

{
    "_id" : ObjectId("5304e2e3cc9e684aa98bef97"),
    "text" : "First week of school is over :P",
    "in_reply_to_status_id" : null,
    "retweet_count" : null,
    "contributors" : null,
    "created_at" : "Thu Sep 02 18:11:25 +0000 2010",
    "geo" : null,
    "source" : "web",
    "coordinates" : null,
    "in_reply_to_screen_name" : null,
    "truncated" : false,
    "entities" : {
        "user_mentions" : [ ],
        "urls" : [ ],
        "hashtags" : [ ]
    },
    "retweeted" : false,
    "place" : null,
    "user" : {
        "friends_count" : 145,
        "profile_sidebar_fill_color" : "E5507E",
        "location" : "Ireland :)",
        "verified" : false,
        "follow_request_sent" : null,
        "favourites_count" : 1,
        "profile_sidebar_border_color" : "CC3366",
        "profile_image_url" : "http://a1.twimg.com/profile_images/1107778717/phpkHoxzmAM_normal.jpg",
        "geo_enabled" : false,
        "created_at" : "Sun May 03 19:51:04 +0000 2009",
        "description" : "",
        "time_zone" : null,
        "url" : null,
        "screen_name" : "Catherinemull",
        "notifications" : null,
        "profile_background_color" : "FF6699",
        "listed_count" : 77,
        "lang" : "en",
        "profile_background_image_url" : "http://a3.twimg.com/profile_background_images/138228501/149174881-8cd806890274b828ed56598091c84e71_4c6fd4d8-full.jpg",
        "statuses_count" : 2475,
        "following" : null,
        "profile_text_color" : "362720",
        "protected" : false,
        "show_all_inline_media" : false,
        "profile_background_tile" : true,
        "name" : "Catherine Mullane",
        "contributors_enabled" : false,
        "profile_link_color" : "B40B43",
        "followers_count" : 169,
        "id" : 37486277,
        "profile_use_background_image" : true,
        "utc_offset" : null
    },
    "favorited" : false,
    "in_reply_to_user_id" : null,
    "id" : NumberLong("22819398300")
}

Please help!

python

mongodb

mongodb-query

pymongo

2 Answers

Stack Overflow user

Accepted answer

Posted 2016-12-05 09:27:01

Reading the comments

After reading the comments, I found that

pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort" : {"count" : -1}},
        {"$limit": 5}
    ]

should actually be:

pipeline = [ 
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}}, 
        {"$sort" : {"count" : -1}}, 
        {"$limit": 5}
    ]
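
As a rough illustration of how this corrected pipeline might be run from Python, here is a minimal sketch using pymongo. The connection URI, the database name `twitter`, the collection name `tweets`, and the helper `top_tweeters` are assumptions made for the example; they do not appear in the original exercise.

```python
def make_pipeline():
    """Return the corrected aggregation pipeline: group, then sort, then limit."""
    return [
        {"$group": {
            "_id": "$user.screen_name",         # one output document per screen name
            "tweet_texts": {"$push": "$text"},  # collect every tweet text of that user
            "count": {"$sum": 1},               # add 1 for each tweet in the group
        }},
        {"$sort": {"count": -1}},               # most prolific users first
        {"$limit": 5},
    ]


def top_tweeters(uri="mongodb://localhost:27017", db_name="twitter", coll_name="tweets"):
    """Run the pipeline. The URI, database and collection names are assumptions."""
    # Imported here so make_pipeline() can be inspected without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return list(client[db_name][coll_name].aggregate(make_pipeline()))
    finally:
        client.close()
```

Calling `top_tweeters()` needs a reachable MongoDB server that holds the tweet data; `make_pipeline()` on its own only builds the list of stages.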

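Why?

The full answer and explanation can be found in the answer to:

What is the difference between these two MongoDB queries?

The conclusion of the story is that I was using the $project stage incorrectly. Not only was it unnecessary in the first place, but if it is included it should be idempotent: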

{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},

I also strongly recommend his answer:

https://stackoverflow.com/a/40970901/1337392

Special thanks

The following user deserves kudos++:

https://stackoverflow.com/users/122005/chridam

for setting me on the right path!

Votes: 1

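Stack Overflow user

Posted 2016-12-05 09:24:22

The $project step is redundant because the $group stage already produces exactly these three fields, so there is no need for a $project stage after it.

The correct pipeline should be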

pipeline = [ 
    {
        "$group": {
            "_id": "$user.screen_name", 
            "tweet_texts": { "$push": "$text" }, 
            "count": { "$sum": 1 }
        }
    }, 
    { "$sort" : { "count" : -1 } }, 
    { "$limit": 5 } 
]

Your $project stage did not work because the preceding $group stage does not produce any field "$user.screen_name", which you were trying to use as the _id field in $project.
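
The point can be checked with a plain-Python emulation of the $group, $sort and $limit stages. This is only a sketch: the function `group_by_screen_name` is hypothetical, and apart from the first tweet text and the screen name "Catherinemull", the sample tweets are invented for the example.

```python
def group_by_screen_name(tweets):
    """Emulate the $group stage: one output document per user.screen_name."""
    groups = {}
    for tweet in tweets:
        name = tweet["user"]["screen_name"]
        doc = groups.setdefault(name, {"_id": name, "tweet_texts": [], "count": 0})
        doc["tweet_texts"].append(tweet["text"])  # plays the role of $push
        doc["count"] += 1                         # plays the role of {"$sum": 1}
    return list(groups.values())


# Made-up input; only the first text and its screen name come from the data sample.
tweets = [
    {"text": "First week of school is over :P", "user": {"screen_name": "Catherinemull"}},
    {"text": "hypothetical second tweet", "user": {"screen_name": "Catherinemull"}},
    {"text": "hypothetical tweet", "user": {"screen_name": "someone_else"}},
]

grouped = group_by_screen_name(tweets)
# Emulate {"$sort": {"count": -1}} followed by {"$limit": 5}.
top = sorted(grouped, key=lambda d: d["count"], reverse=True)[:5]

for doc in top:
    print(sorted(doc))  # only '_id', 'count' and 'tweet_texts' survive the grouping
```

Every grouped document carries just `_id`, `count` and `tweet_texts`; the original `user` sub-document is gone, so a later stage has nothing to read at `user.screen_name`.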

However, if you do want to include a $project step, a working pipeline would look like this:

pipeline = [ 
    {
        "$group": {
            "_id": "$user.screen_name", 
            "tweet_texts": { "$push": "$text" }, 
            "count": { "$sum": 1 }
        }
    }, 
    { "$project": { "count": 1, "tweet_texts": 1 } },
    { "$sort" : { "count" : -1 } }, 
    { "$limit": 5 } 
]

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/40868553
