文章/答案/技术大牛

发布

社区首页 >问答首页 >MongoDB不区分大小写的索引无法按预期工作

问MongoDB不区分大小写的索引无法按预期工作
EN

Stack Overflow用户

提问于 2022-11-10 19:17:00

回答 1查看 36关注 0票数 0

我有下面的查询，它匹配使用公共字段的两个集合，但是有一个问题，比较并不不敏感，因此我没有得到完全匹配。然后我试着让它变得迟钝。

下面是我使用的查询：

cursor= db.csv_import.aggregate([
 {
     $lookup: {
  from: 'EN',
  let: {pn:'$part_no',vendor:'$vendor_standard'},
  pipeline: [{
      $match: {
          $expr: {
              $and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]}],{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
          }
      }
  }],
  as: 'part_number_info'
     }
 }, { $match: {"part_number_info.0": {$exists: true}}
 }, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}}
 ]).pretty();

我在这里读过关于迟钝的文章：https://www.mongodb.com/docs/manual/core/index-case-insensitive/

因此，我在两个集合上为供应商和产品部件编号创建了2个索引，如下所示(例如，部件编号)

db.csv_import.createIndex(
    {'part_no': 1},{name: "part_no_unsensitive_idx", collation: {locale: "en",strength:2})

  db.EN.createIndex(
    {'ICECAT-interface.Product.@Prod_id': 1},{name: "product_unsensitive_idx", collation: {locale: "en",strength:2})

我试过这个：

cursor= db.csv_import.aggregate([
 {
     $lookup: {
  from: 'EN',
  let: {pn:'$part_no',vendor:'$vendor_standard'},
  pipeline: [{
      $match: {
          $expr: {
              $and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]},{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
          }
      }
  }],
  as: 'part_number_info'
     }
 }, { $match: {"part_number_info.0": {$exists: true}}
 }, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}}
 ]).collation( { locale: 'en', strength: 2 }).pretty();

但是，执行时间从毫秒到长分钟(超过10分钟)。显然，它没有使用索引，是我创建索引的方式有问题，还是我在这里遗漏了其他什么东西？

编辑

我按照建议将其更改为这个(管道后的排序)，虽然它改进了(花费了一半时间)，但仍然需要一个小时，而没有排序则需要几秒钟：

use Icecat
db.csv_import.aggregate([
 {
     $lookup: {
  from: 'EN',
  let: {pn:'$part_no',vendor:'$vendor_standard'},
  pipeline: [{
      $match: {
          $expr: {
              $and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]},{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
          }
      }
  }],
  as: 'part_number_info'
     }
 }, { $match: {"part_number_info.0": {$exists: true}}
 }, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}
 }
 ],
 {collation: { locale: 'en', strength: 2 }
 ).pretty();

编辑2

解释的产出：

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "queryPlanner": {
          "namespace": "Icecat.csv_import",
          "indexFilterSet": false,
          "parsedQuery": {},
          "queryHash": "005DB16E",
          "planCacheKey": "E1018696",
          "maxIndexedOrSolutionsReached": false,
          "maxIndexedAndSolutionsReached": false,
          "maxScansToExplodeReached": false,
          "winningPlan": {
            "stage": "PROJECTION_SIMPLE",
            "transformBy": {
              "part_no": 1,
              "part_number_info": 1,
              "vendor_standard": 1,
              "_id": 0
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "direction": "forward"
            }
          },
          "rejectedPlans": []
        },
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 23685,
          "executionTimeMillis": 9051,
          "totalKeysExamined": 0,
          "totalDocsExamined": 23685,
          "executionStages": {
            "stage": "PROJECTION_SIMPLE",
            "nReturned": 23685,
            "executionTimeMillisEstimate": 18,
            "works": 23687,
            "advanced": 23685,
            "needTime": 1,
            "needYield": 0,
            "saveState": 25,
            "restoreState": 25,
            "isEOF": 1,
            "transformBy": {
              "part_no": 1,
              "part_number_info": 1,
              "vendor_standard": 1,
              "_id": 0
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "nReturned": 23685,
              "executionTimeMillisEstimate": 8,
              "works": 23687,
              "advanced": 23685,
              "needTime": 1,
              "needYield": 0,
              "saveState": 25,
              "restoreState": 25,
              "isEOF": 1,
              "direction": "forward",
              "docsExamined": 23685
            }
          }
        }
      },
      "nReturned": 23685,
      "executionTimeMillisEstimate": 65
    },
    {
      "$lookup": {
        "from": "EN",
        "as": "part_number_info",
        "let": {
          "pn": "$part_no",
          "vendor": "$vendor_standard"
        },
        "pipeline": [
          {
            "$match": {
              "$expr": {
                "$and": [
                  {
                    "$eq": [
                      "$$pn",
                      "$ICECAT-interface.Product.@Prod_id"
                    ]
                  },
                  {
                    "$eq": [
                      "$$vendor",
                      "$ICECAT-interface.Product.Supplier.@Name"
                    ]
                  }
                ]
              }
            }
          }
        ]
      },
      "totalDocsExamined": 2769,
      "totalKeysExamined": 2769,
      "collectionScans": 0,
      "indexesUsed": [
        "ICECAT-interface.Product.@Prod_id_1"
      ],
      "nReturned": 23685,
      "executionTimeMillisEstimate": 8918
    },
    {
      "$match": {
        "part_number_info.0": {
          "$exists": true
        }
      },
      "nReturned": 2690,
      "executionTimeMillisEstimate": 8919
    },
    {
      "$project": {
        "part_no": true,
        "part_number_info": {
          "ICECAT-interface": {
            "Product": {
              "@ID": true,
              "Supplier": {
                "@Name": true
              }
            }
          }
        },
        "_id": false
      },
      "nReturned": 2690,
      "executionTimeMillisEstimate": 8919
    }
  ],
  "serverInfo": {
    "host": "ip-10-0-1-199.eu-west-1.compute.internal",
    "port": 27017,
    "version": "5.0.9-8",
    "gitVersion": "15a95b4ea8203b337be88bebbeea864b4dadb6a2"
  },
  "serverParameters": {
    "internalQueryFacetBufferSizeBytes": 104857600,
    "internalQueryFacetMaxOutputDocSizeBytes": 104857600,
    "internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
    "internalDocumentSourceGroupMaxMemoryBytes": 104857600,
    "internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
    "internalQueryProhibitBlockingMergeOnMongoS": 0,
    "internalQueryMaxAddToSetBytes": 104857600,
    "internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600
  },
  "command": {
    "aggregate": "csv_import",
    "pipeline": [
      {
        "$lookup": {
          "from": "EN",
          "let": {
            "pn": "$part_no",
            "vendor": "$vendor_standard"
          },
          "pipeline": [
            {
              "$match": {
                "$expr": {
                  "$and": [
                    {
                      "$eq": [
                        "$$pn",
                        "$ICECAT-interface.Product.@Prod_id"
                      ]
                    },
                    {
                      "$eq": [
                        "$$vendor",
                        "$ICECAT-interface.Product.Supplier.@Name"
                      ]
                    }
                  ]
                }
              }
            }
          ],
          "as": "part_number_info"
        }
      },
      {
        "$match": {
          "part_number_info.0": {
            "$exists": true
          }
        }
      },
      {
        "$project": {
          "part_no": 1,
          "part_number_info.ICECAT-interface.Product.@ID": 1,
          "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1,
          "_id": 0
        }
      }
    ],
    "cursor": {},
    "$db": "Icecat"
  },
  "ok": 1
}

case-sensitive

mongodb

indexing

case

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-11-10 21:28:25

我想这里可能有两个问题。

首先，不清楚是否以适当的方式请求排序。虽然它可能依赖于版本，但文档建议语法是：

db.myColl.aggregate(
   <PIPELINE>,
   { collation: { locale: "en", strength: 2 } }
);

第二件事是，有点类似于您的一个以前的问题，我认为字段名在这里是错误的。在$lookup pipeline for EN集合中，您拥有：

   $and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]}] ...

但是，您在EN集合上创建的索引是：

db.EN.createIndex( {'ICECAT-interface.Product.@ID': 1}, ...

注意字段名的末尾是@Prod_id和@ID。其中一项需要更新。

基于解释计划的编辑(谢谢！)还有其他问题。

“解释计划”似乎表明，事情基本上是按预期进行的。更具体地说，我们看到：

查询对源csv_import集合使用集合扫描。请注意，这是预期的，因为聚合没有对该集合进行任何筛选，因此没有任何索引可以帮助。
数据库正在对EN集合使用索引。$lookup数组中的stages条目报告"indexesUsed": [ "ICECAT-interface.Product.@Prod_id_1" ]以及"totalDocsExamined": 2769和"totalKeysExamined": 2769。最后，为了更好地衡量，它还报告了"collectionScans": 0。
执行此操作所需的总时间约为9秒："executionTimeMillis": 9051。

那么，为什么我说这是“主要起作用的”呢?这里报告的9秒和你的评论之间的区别是“过去花秒的时间大约需要一个小时(当添加排序规则时)”。

在explain输出中，我没有看到的是提到了collation。当我生成一个解释计划(使用前面提到的将collation指定为.aggregate()函数调用中的选项的语法)时，响应的command部分如下所示：

  command: {
    aggregate: 'csv_import',
    pipeline: [ ... ],
    cursor: {},
    collation: { locale: 'en', strength: 2 },
    '$db': 'test'
  }

具体来说，collation是直接说明的。这在您的explain输出中是没有的。因此，我认为：

我们正在研究的explain计划是用于不使用collation的聚合。这也意味着您的"ICECAT-interface.Product.@Prod_id_1"索引做了而不是，并对其应用了排序规则。您应该能够通过检查该集合的.getIndexes()输出来确认这一点。
当您应用collation并尝试执行该操作时，数据库将无法使用索引(因为前面提到的不匹配)，这肯定会解释长期存在的问题。

总之，在EN集合上的索引和正在运行的聚合上定义的排序规则仍然不匹配。请再检查一遍并改正这些项目。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/74394256

复制

相似问题

问MongoDB不区分大小写的索引无法按预期工作
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问MongoDB不区分大小写的索引无法按预期工作EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问MongoDB不区分大小写的索引无法按预期工作
EN