首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >优化MongoDB聚合管道(组、查找、匹配)

优化MongoDB聚合管道(组、查找、匹配)
EN

Stack Overflow用户
提问于 2020-03-01 06:35:33
回答 1查看 687关注 0票数 2

我是NoSQL数据库的新手,我选择MongoDB作为我的第一个NoSQL数据库。我创建了一个聚合管道来显示我想要的数据,下面是我的文档示例:

来自用户集合的文档示例

代码语言:javascript
复制
{
    "_id": 9,
    "name": "Sample Name",
    "email": "email@example.com",
    "password": "password hash"
}

页面集合中的文档示例(这个并不重要)

代码语言:javascript
复制
{
    "_id": 42,
    "name": "Product Name",
    "description": "Product Description",
    "user_id": 8,
    "rating_categories": [{
        "_id": 114,
        "name": "Build Quality"
    }, {
        "_id": 115,
        "name": "Price"
    }, {
        "_id": 116,
        "name": "Feature"
    }, {
        "_id": 117,
        "name": "Comfort"
    }, {
        "_id": 118,
        "name": "Switch"
    }]
}

来自评论集的文档示例

代码语言:javascript
复制
{
    "_id": 10,
    "page_id": 42, #ID reference from pages collection
    "user_id": 8, #ID reference from users collection
    "review": "The review of the product",
    "ratings": [{
        "_id": 114, #ID Reference from pages collection of what rating category it is
        "rating": 5
    }, {
        "_id": 115,
        "rating":4
    }, {
        "_id": 116,
        "rating": 5
    }, {
        "_id": 117,
        "rating": 3
    }, {
        "_id": 118,
        "rating": 4
    }],
    "created": "1582825968963", #Date Object
    "votes": {
        "downvotes": [],
        "upvotes": [9] #IDs of users who upvote this review
    }
}

我想得到page_id的评论,它可以从我所做的API中访问,下面是聚合的预期结果:

代码语言:javascript
复制
[
  {
    "_id": 10, #Review of the ID
    "created": "Thu, 27 Feb 2020 17:52:48 GMT",
    "downvote_count": 0, #Length of votes.downvotes from reviews collection
    "page_id": 42, #Page ID
    "ratings": [ #Stores what rate at what rating category id
      {
        "_id": 114,
        "rating": 5
      },
      {
        "_id": 115,
        "rating": 4
      },
      {
        "_id": 116,
        "rating": 5
      },
      {
        "_id": 117,
        "rating": 3
      },
      {
        "_id": 118,
        "rating": 4
      }
    ],
    "review": "The Review",
    "upvote_count": 0, #Length of votes.upvotes from reviews collection
    "user": { #User who reviewed
      "_id": 8, #User ID
      "downvote_count": 0, #How many downvotes this user receive from all of the user's reviews
      "name": "Sample Name", #Username
      "review_count": 1, #How many reviews the user made
      "upvote_count": 1 #How many upvotes this user receive from all of the user's reviews
    },
    "vote_state": 0 #Determining vote state from the user (who requested to the API) for this review, 0 for no vote, -1 for downvote, 1 for upvote
  },
  ...
]

下面是我为上面的结果创建的评论集合聚合的管道:

代码语言:javascript
复制
user_id = 9
page_id = 42
pipeline = [
            {"$group": {
                    "_id": {"user_id":"$user_id", "page_id": "$page_id"},
                    "review_id": {"$last": "$_id"},
                    "page_id": {"$last": "$page_id"},
                    "user_id" : {"$last": "$user_id"},
                    "ratings": {"$last": "$ratings"},
                    "review": {"$last": "$review"},
                    "created": {"$last": "$created"},
                    "votes": {"$last": "$votes"},
                    "upvote_count": {"$sum": 
                        {"$cond": [ 
                            {"$ifNull": ["$votes.upvotes", False]}, 
                            {"$size": "$votes.upvotes"}, 
                            0
                        ]}
                    },
                    "downvote_count": {"$sum": 
                        {"$cond": [ 
                            {"$ifNull": ["$votes.downvotes", False]}, 
                            {"$size": "$votes.downvotes"}, 
                            0
                        ]}
                    }}},
            {"$lookup": {
                "from": "users",
                "localField": "user_id",
                "foreignField": "_id",
                "as": "user"
            }},
            {"$unwind": "$user"},
            {"$lookup": {
                "from": "reviews",
                "localField": "user._id",
                "foreignField": "user_id",
                "as": "user.reviews"
            }},
            {"$addFields":{
                "_id": "$review_id",
                "user.review_count": {"$size": "$user.reviews"},
                "user.upvote_count": {"$sum":{
                    "$map":{
                        "input":"$user.reviews",
                        "in":{"$cond": [ 
                            {"$ifNull": ["$$this.votes.upvotes", False]}, 
                            {"$size": "$$this.votes.upvotes"}, 
                            0
                        ]}
                    }
                }},
                "user.downvote_count": {"$sum":{
                    "$map":{
                        "input":"$user.reviews",
                        "in":{"$cond": [ 
                            {"$ifNull": ["$$this.votes.downvotes", False]}, 
                            {"$size": "$$this.votes.downvotes"}, 
                            0
                        ]}
                    }
                }},
                "vote_state": {"$switch": {
                    "branches": [
                        {"case": { "$and" : [
                            {"$ifNull": ["$votes.upvotes", False]}, 
                            {"$in": [user_id, "$votes.upvotes"]}
                        ]}, "then": 1
                        },
                        {"case": { "$and" : [
                            {"$ifNull": ["$votes.downvotes", False]}, 
                            {"$in": [user_id, "$votes.downvotes"]}
                        ]}, "then": -1
                        },
                    ],
                    "default": 0
                }},
            }},
            {"$project":{
                "user.password": 0,
                "user.email": 0,
                "user_id": 0,
                "review_id" : 0,
                "votes": 0,
                "user.reviews": 0 
            }},
            {"$sort": {"created": -1}},
            {"$match": {"page_id": page_id}},
        ]

注意:用户可以为同一个page_id进行多个评论,但只显示最新的

我用的是pymongo,这就是为什么运营者有引号

我的问题是:

  1. 是否有优化聚合管道的空间?
  2. 让多个小的聚合执行来获得上面的数据是一种好的做法,还是让一个大聚合(或尽可能少)来获取我想要的数据总是更好呢?
  3. 如您所见,每次我想从votes.upvotes集合的文档中访问reviewvotes.downvotes时,我都检查字段是否为null,这是因为字段votes.upvotesvotes.downvotes不是在用户进行检查时生成的,而是在用户对该审查进行表决时进行的。当用户进行检查并删除votes.upvotesvotes.downvotes时,我应该在$ifNullvotes.downvotes上创建一个空字段吗?这会提高聚合的性能吗?

谢谢

EN

回答 1

Stack Overflow用户

发布于 2020-03-01 19:37:41

检查此聚合是否具有更好的性能。

如果您还没有创建这些索引,则创建以下索引:

代码语言:javascript
复制
db.reviews.create_index([("page_id", 1)])

备注:我们可以更好地提高性能,避免$lookup 再次审查

代码语言:javascript
复制
db.reviews.aggregate([
  {
    $match: {
      page_id: page_id
    }
  },
  {
    $addFields: {
      request_user_id: user_id
    }
  },
  {
    $group: {
      _id: {
        page_id: "$page_id",
        user_id: "$user_id",            
        request_user_id: "$request_user_id"
      },
      data: {
        $push: "$$ROOT"
      }
    }
  },
  {
    $lookup: {
      "from": "users",
      "let": {
        root_user_id: "$_id.user_id"
      },
      "pipeline": [
        {
          $match: {
            $expr: {
              $eq: [
                "$$root_user_id",
                "$_id"
              ]
            }
          }
        },
        {
          $lookup: {
            "from": "reviews",
            "let": {
              root_user_id: "$$root_user_id"
            },
            "pipeline": [
              {
                $match: {
                  $expr: {
                    $eq: [
                      "$$root_user_id",
                      "$user_id"
                    ]
                  }
                }
              },
              {
                $project: {
                  user_id: 1,
                  downvote_count: {
                    $size: "$votes.downvotes"
                  },
                  upvote_count: {
                    $size: "$votes.upvotes"
                  }
                }
              },
              {
                $group: {
                  _id: null,
                  review_count: {
                    $sum: {
                      $cond: [
                        {
                          $eq: [
                            "$$root_user_id",
                            "$user_id"
                          ]
                        },
                        1,
                        0
                      ]
                    }
                  },
                  upvote_count: {
                    $sum: "$upvote_count"
                  },
                  downvote_count: {
                    $sum: "$downvote_count"
                  }
                }
              },
              {
                $unset: "_id"
              }
            ],
            "as": "stats"
          }
        },
        {
          $project: {
            tmp: {
              $mergeObjects: [
                {
                  _id: "$_id",
                  name: "$name"
                },
                {
                  $arrayElemAt: [
                    "$stats",
                    0
                  ]
                }
              ]
            }
          }
        },
        {
          $replaceWith: "$tmp"
        }
      ],
      "as": "user"
    }
  },
  {
    $addFields: {
      first: {
        $mergeObjects: [
          "$$ROOT",
          {
            $arrayElemAt: [
              "$data",
              0
            ]
          },
          {
            user: {
              $arrayElemAt: [
                "$user",
                0
              ]
            },
            created: {
              $toDate: {
                $toLong: {
                  $arrayElemAt: [
                    "$data.created",
                    0
                  ]
                }
              }
            },
            downvote_count: {
              $reduce: {
                input: "$data.votes.downvotes",
                initialValue: 0,
                in: {
                  $add: [
                    "$$value",
                    {
                      $size: "$$this"
                    }
                  ]
                }
              }
            },
            upvote_count: {
              $reduce: {
                input: "$data.votes.upvotes",
                initialValue: 0,
                in: {
                  $add: [
                    "$$value",
                    {
                      $size: "$$this"
                    }
                  ]
                }
              }
            },
            vote_state: {
              $cond: [
                {
                  $gt: [
                    {
                      $size: {
                        $filter: {
                          input: "$data.votes.upvotes",
                          cond: {
                            $in: [
                              "$_id.request_user_id",
                              "$$this"
                            ]
                          }
                        }
                      }
                    },
                    0
                  ]
                },
                1,
                {
                  $cond: [
                    {
                      $gt: [
                        {
                          $size: {
                            $filter: {
                              input: "$data.votes.downvotes",
                              cond: {
                                $in: [
                                  "$_id.request_user_id",
                                  "$$this"
                                ]
                              }
                            }
                          }
                        },
                        0
                      ]
                    },
                    -1,
                    0
                  ]
                }
              ]
            }
          }
        ]
      }
    }
  },
  {
    $unset: [
      "first.data",
      "first.votes",
      "first.user_id",
      "first.request_user_id"
    ]
  },
  {
    $replaceWith: "$first"
  },
  {
    "$sort": {
      "created": -1
    }
  }
])

MongoPlayground

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/60472616

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档