I have a collection A and an array B, structured as follows:
A:
{
    "_id" : ObjectId("5160757496cc6207a37ff778"),
    "name" : "Pomegranate Yogurt Bowl",
    "description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey."
},
{
    "_id": ObjectId("5160757596cc62079cc2db18"),
    "name": "Krispy Easter Eggs",
    "description": "Imagine the Easter Bunny laying an egg. Wait. That’s not anatomically possible. And anyway, the Easter Bunny is a b..."
}
B:
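var names = ["egg", "garlic", "cucumber", "kale", "pomegranate", "sunflower", "fish", "pork", "apple", "sunflower", "strawberry", "banana"]
My goal is to return the document from A that contains the most occurrences of the words in array B. In this case, it should return the first one, "_id" : ObjectId("5160757496cc6207a37ff778").
I don't know how to approach this problem.
This doesn't work: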
db.A.find({
    "description": {
        "$in": names
    }
}, function(err, data) {
    if (err) console.log(err);
    console.log(data);
});
Posted on 2016-03-26 06:36:01
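As background on why the query in the question matches nothing: for a plain string field, `$in` checks whether the whole field value equals one of the listed values, so a full sentence never equals a single word. The following is a rough sketch of that comparison in plain JavaScript, with no database involved. The `matchesIn` helper is hypothetical and only mimics the equality check for string fields; the word list is shortened for the example.

```javascript
// Hypothetical helper: mimics how $in treats a plain string field,
// i.e. the whole value must equal one of the listed values.
function matchesIn(fieldValue, candidates) {
    return candidates.some(function(candidate) {
        return candidate === fieldValue;
    });
}

// Shortened word list, for illustration only.
var names = ["egg", "garlic", "pomegranate", "sunflower"];
var description = "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey.";

// The full sentence is not equal to any single word, so nothing matches.
var wholeFieldMatch = matchesIn(description, names);

// A field holding exactly one of the words would match.
var exactMatch = matchesIn("egg", names);

console.log(wholeFieldMatch, exactMatch);
```

This is why the approaches below search inside the text rather than comparing the field as a whole.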
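It depends on what kind of "words" you want to match, and whether they are considered "stop words" such as "a", "the", "with" and so on, or whether the number of those words really doesn't matter.
If they don't matter, then consider a $text index and search.
Index first:
db.A.createIndex({ "name": "text", "description": "text" })
Then construct the search: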
var words = [
    "egg", "garlic", "cucumber", "kale", "pomegranate",
    "sunflower", "fish", "pork", "apple", "sunflower",
    "strawberry", "banana"
];
var search = words.join(" ")
db.A.find(
    { "$text": { "$search": search } },
    { "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" }}).limit(1)
This returns the first document, like this:
{
    "_id" : ObjectId("5160757496cc6207a37ff778"),
    "name" : "Pomegranate Yogurt Bowl",
    "description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey.",
    "score" : 1.7291666666666665
}
On the other hand, if you need to count "stop words" as well, then mapReduce can find the result for you:
db.A.mapReduce(
    function() {
        var words = [
            "egg", "garlic", "cucumber", "kale", "pomegranate",
            "sunflower", "fish", "pork", "apple", "sunflower",
            "strawberry", "banana"
        ];
        var count = 0;
        var fulltext = this.name.toLowerCase() + " " + this.description.toLowerCase();
        // Increment count by number of matches
        words.forEach(function(word) {
            count += ( fulltext.match(new RegExp(word,"ig")) || [] ).length;
        });
        emit(null,{ count: count, doc: this });
    },
    function(key,values) {
        // Sort largest first, return first.
        // The comparator must return a number, not a boolean.
        return values.sort(function(a,b) {
            return b.count - a.count;
        })[0];
    },
    { "out": { "inline": 1 } }
)
Which gives the result:
{
    "_id" : null,
    "value" : {
        "count" : 4,
        "doc" : {
            "_id" : ObjectId("5160757496cc6207a37ff778"),
            "name" : "Pomegranate Yogurt Bowl",
            "description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey."
        }
    }
}
So the "text" index approach "weights" results by the number of matches and then returns only the highest weighted match.
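The mapReduce operation goes through each document and works out a score. The "reducer" then sorts the results and keeps only the one with the highest score.
Note that the "reducer" can be called many times, so this does "not" try to sort all the documents in the collection at once. But it is still really "brute force".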
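To illustrate the point about the reducer being called more than once, here is a minimal sketch in plain JavaScript that imitates the map and reduce steps without a database. The names `mapDoc` and `reduceValues` are hypothetical stand-ins for the two functions passed to mapReduce. The second description is shortened and the third document is made up for the example. The word list is the same as above, including the repeated "sunflower", so each "sunflower" match is counted twice.

```javascript
var words = [
    "egg", "garlic", "cucumber", "kale", "pomegranate",
    "sunflower", "fish", "pork", "apple", "sunflower",
    "strawberry", "banana"
];

// Sample data: the second description is shortened, the third document is invented.
var docs = [
    { name: "Pomegranate Yogurt Bowl", description: "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey." },
    { name: "Krispy Easter Eggs", description: "Imagine the Easter Bunny laying an egg." },
    { name: "Apple Slices", description: "Fresh slices with cinnamon." }
];

// Stand-in for the map function: score one document.
function mapDoc(doc) {
    var fulltext = (doc.name + " " + doc.description).toLowerCase();
    var count = 0;
    words.forEach(function(word) {
        count += (fulltext.match(new RegExp(word, "g")) || []).length;
    });
    return { count: count, doc: doc };
}

// Stand-in for the reduce function: keep the highest-scoring value.
// Its output has the same shape as its input, so it can be fed back in.
function reduceValues(values) {
    return values.slice().sort(function(a, b) {
        return b.count - a.count;
    })[0];
}

var emitted = docs.map(mapDoc);

// One call over everything...
var allAtOnce = reduceValues(emitted);

// ...or a partial result reduced again with the remaining value.
var partial = reduceValues(emitted.slice(0, 2));
var inStages = reduceValues([partial, emitted[2]]);

console.log(allAtOnce.doc.name, allAtOnce.count);
console.log(inStages.doc.name, inStages.count);
```

Both paths select the same document because the reducer returns a value of the same shape it receives, which is what allows it to be applied to partial results.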
https://stackoverflow.com/questions/36232074