我有下面的查询,它匹配使用公共字段的两个集合,但是有一个问题,比较并不不敏感,因此我没有得到完全匹配。然后我试着让它变得迟钝。
下面是我使用的查询:
cursor= db.csv_import.aggregate([
{
$lookup: {
from: 'EN',
let: {pn:'$part_no',vendor:'$vendor_standard'},
pipeline: [{
$match: {
$expr: {
$and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]}],{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
}
}
}],
as: 'part_number_info'
}
}, { $match: {"part_number_info.0": {$exists: true}}
}, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}}
]).pretty();我在这里读过关于迟钝的文章:https://www.mongodb.com/docs/manual/core/index-case-insensitive/
因此,我在两个集合上为供应商和产品部件编号创建了2个索引,如下所示(例如,部件编号)
db.csv_import.createIndex(
{'part_no': 1},{name: "part_no_unsensitive_idx", collation: {locale: "en",strength:2})
db.EN.createIndex(
{'ICECAT-interface.Product.@Prod_id': 1},{name: "product_unsensitive_idx", collation: {locale: "en",strength:2})我试过这个:
cursor= db.csv_import.aggregate([
{
$lookup: {
from: 'EN',
let: {pn:'$part_no',vendor:'$vendor_standard'},
pipeline: [{
$match: {
$expr: {
$and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]},{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
}
}
}],
as: 'part_number_info'
}
}, { $match: {"part_number_info.0": {$exists: true}}
}, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}}
]).collation( { locale: 'en', strength: 2 }).pretty();但是,执行时间从毫秒到长分钟(超过10分钟)。显然,它没有使用索引,是我创建索引的方式有问题,还是我在这里遗漏了其他什么东西?
编辑
我按照建议将其更改为这个(管道后的排序),虽然它改进了(花费了一半时间),但仍然需要一个小时,而没有排序则需要几秒钟:
use Icecat
db.csv_import.aggregate([
{
$lookup: {
from: 'EN',
let: {pn:'$part_no',vendor:'$vendor_standard'},
pipeline: [{
$match: {
$expr: {
$and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]},{$eq: ["$$vendor","$ICECAT-interface.Product.Supplier.@Name"]}]
}
}
}],
as: 'part_number_info'
}
}, { $match: {"part_number_info.0": {$exists: true}}
}, { $project: {"part_no": 1,"part_number_info.ICECAT-interface.Product.@ID": 1, "part_number_info.ICECAT-interface.Product.Supplier.@Name": 1, "_id":0}
}
],
{collation: { locale: 'en', strength: 2 }
).pretty();编辑2
解释的产出:
{
"explainVersion": "1",
"stages": [
{
"$cursor": {
"queryPlanner": {
"namespace": "Icecat.csv_import",
"indexFilterSet": false,
"parsedQuery": {},
"queryHash": "005DB16E",
"planCacheKey": "E1018696",
"maxIndexedOrSolutionsReached": false,
"maxIndexedAndSolutionsReached": false,
"maxScansToExplodeReached": false,
"winningPlan": {
"stage": "PROJECTION_SIMPLE",
"transformBy": {
"part_no": 1,
"part_number_info": 1,
"vendor_standard": 1,
"_id": 0
},
"inputStage": {
"stage": "COLLSCAN",
"direction": "forward"
}
},
"rejectedPlans": []
},
"executionStats": {
"executionSuccess": true,
"nReturned": 23685,
"executionTimeMillis": 9051,
"totalKeysExamined": 0,
"totalDocsExamined": 23685,
"executionStages": {
"stage": "PROJECTION_SIMPLE",
"nReturned": 23685,
"executionTimeMillisEstimate": 18,
"works": 23687,
"advanced": 23685,
"needTime": 1,
"needYield": 0,
"saveState": 25,
"restoreState": 25,
"isEOF": 1,
"transformBy": {
"part_no": 1,
"part_number_info": 1,
"vendor_standard": 1,
"_id": 0
},
"inputStage": {
"stage": "COLLSCAN",
"nReturned": 23685,
"executionTimeMillisEstimate": 8,
"works": 23687,
"advanced": 23685,
"needTime": 1,
"needYield": 0,
"saveState": 25,
"restoreState": 25,
"isEOF": 1,
"direction": "forward",
"docsExamined": 23685
}
}
}
},
"nReturned": 23685,
"executionTimeMillisEstimate": 65
},
{
"$lookup": {
"from": "EN",
"as": "part_number_info",
"let": {
"pn": "$part_no",
"vendor": "$vendor_standard"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{
"$eq": [
"$$pn",
"$ICECAT-interface.Product.@Prod_id"
]
},
{
"$eq": [
"$$vendor",
"$ICECAT-interface.Product.Supplier.@Name"
]
}
]
}
}
}
]
},
"totalDocsExamined": 2769,
"totalKeysExamined": 2769,
"collectionScans": 0,
"indexesUsed": [
"ICECAT-interface.Product.@Prod_id_1"
],
"nReturned": 23685,
"executionTimeMillisEstimate": 8918
},
{
"$match": {
"part_number_info.0": {
"$exists": true
}
},
"nReturned": 2690,
"executionTimeMillisEstimate": 8919
},
{
"$project": {
"part_no": true,
"part_number_info": {
"ICECAT-interface": {
"Product": {
"@ID": true,
"Supplier": {
"@Name": true
}
}
}
},
"_id": false
},
"nReturned": 2690,
"executionTimeMillisEstimate": 8919
}
],
"serverInfo": {
"host": "ip-10-0-1-199.eu-west-1.compute.internal",
"port": 27017,
"version": "5.0.9-8",
"gitVersion": "15a95b4ea8203b337be88bebbeea864b4dadb6a2"
},
"serverParameters": {
"internalQueryFacetBufferSizeBytes": 104857600,
"internalQueryFacetMaxOutputDocSizeBytes": 104857600,
"internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
"internalDocumentSourceGroupMaxMemoryBytes": 104857600,
"internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
"internalQueryProhibitBlockingMergeOnMongoS": 0,
"internalQueryMaxAddToSetBytes": 104857600,
"internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600
},
"command": {
"aggregate": "csv_import",
"pipeline": [
{
"$lookup": {
"from": "EN",
"let": {
"pn": "$part_no",
"vendor": "$vendor_standard"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{
"$eq": [
"$$pn",
"$ICECAT-interface.Product.@Prod_id"
]
},
{
"$eq": [
"$$vendor",
"$ICECAT-interface.Product.Supplier.@Name"
]
}
]
}
}
}
],
"as": "part_number_info"
}
},
{
"$match": {
"part_number_info.0": {
"$exists": true
}
}
},
{
"$project": {
"part_no": 1,
"part_number_info.ICECAT-interface.Product.@ID": 1,
"part_number_info.ICECAT-interface.Product.Supplier.@Name": 1,
"_id": 0
}
}
],
"cursor": {},
"$db": "Icecat"
},
"ok": 1
}发布于 2022-11-10 21:28:25
我想这里可能有两个问题。
首先,不清楚是否以适当的方式请求排序。虽然它可能依赖于版本,但文档建议语法是:
db.myColl.aggregate(
<PIPELINE>,
{ collation: { locale: "en", strength: 2 } }
);第二件事是,有点类似于您的一个以前的问题,我认为字段名在这里是错误的。在$lookup pipeline for EN集合中,您拥有:
$and: [{$eq: ["$$pn","$ICECAT-interface.Product.@Prod_id"]}] ... 但是,您在EN集合上创建的索引是:
db.EN.createIndex( {'ICECAT-interface.Product.@ID': 1}, ...注意字段名的末尾是@Prod_id和@ID。其中一项需要更新。
基于解释计划的编辑(谢谢!)还有其他问题。
“解释计划”似乎表明,事情基本上是按预期进行的。更具体地说,我们看到:
csv_import集合使用集合扫描。请注意,这是预期的,因为聚合没有对该集合进行任何筛选,因此没有任何索引可以帮助。EN集合使用索引。$lookup数组中的stages条目报告"indexesUsed": [ "ICECAT-interface.Product.@Prod_id_1" ]以及"totalDocsExamined": 2769和"totalKeysExamined": 2769。最后,为了更好地衡量,它还报告了"collectionScans": 0。"executionTimeMillis": 9051。那么,为什么我说这是“主要起作用的”呢?这里报告的9秒和你的评论之间的区别是“过去花秒的时间大约需要一个小时(当添加排序规则时)”。
在explain输出中,我没有看到的是提到了collation。当我生成一个解释计划(使用前面提到的将collation指定为.aggregate()函数调用中的选项的语法)时,响应的command部分如下所示:
command: {
aggregate: 'csv_import',
pipeline: [ ... ],
cursor: {},
collation: { locale: 'en', strength: 2 },
'$db': 'test'
}具体来说,collation是直接说明的。这在您的explain输出中是没有的。因此,我认为:
explain计划是用于不使用collation的聚合。这也意味着您的"ICECAT-interface.Product.@Prod_id_1"索引做了而不是,并对其应用了排序规则。您应该能够通过检查该集合的.getIndexes()输出来确认这一点。collation并尝试执行该操作时,数据库将无法使用索引(因为前面提到的不匹配),这肯定会解释长期存在的问题。总之,在EN集合上的索引和正在运行的聚合上定义的排序规则仍然不匹配。请再检查一遍并改正这些项目。
https://stackoverflow.com/questions/74394256
复制相似问题