首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >将全文与其他索引相结合

将全文与其他索引相结合
EN

Stack Overflow用户
提问于 2016-02-07 23:32:52
回答 1查看 1.8K关注 0票数 4

我有一个全文索引和一个创建日期的索引。

仅对日期的查询就会很快地返回一个漂亮的、小的44条记录(在一秒钟内):

代码语言:javascript
复制
> db.oneMillionDocumentsIndexed.count({created: {$lte: ISODate("2016-02-06T15:34:59.019Z")} })
44

但是,如果我将此与文本搜索相结合,查询将非常缓慢:

代码语言:javascript
复制
> db.oneMillionDocumentsIndexed.count({
                                created: {$lte: ISODate("2016-02-06T15:34:59.019Z")}, 
                                $text: { $search: "raven" } })

它似乎同时使用了两个索引:

代码语言:javascript
复制
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.oneMillionDocumentsIndexed",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "created" : {
                        "$lte" : ISODate("2016-02-06T15:34:59.019Z")
                    }
                },
                {
                    "$text" : {
                        "$search" : "raven",
                        "$language" : ""
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "created" : {
                    "$lte" : ISODate("2016-02-06T15:34:59.019Z")
                }
            },
            "inputStage" : {
                "stage" : "TEXT",
                "indexPrefix" : {

                },
                "indexName" : "$**_text",
                "parsedTextQuery" : {

                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "Plod",
        "port" : 27017,
        "version" : "3.0.7",
        "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd"
    },
    "ok" : 1
}    

创建的日期搜索不应该减少文档的数量,从而加快查询速度吗?

虽然这些文件并不小,但它们也不庞大。下面是一个示例文档:

代码语言:javascript
复制
{
    "_id" : ObjectId("56b612a2b6c13d2bec221d22"),
    "created" : ISODate("2016-02-06T15:34:57.954Z"),
    "adoptability-integer" : 1885631649,
    "impoverisher-double" : 0.78982932576436,
    "auriga-short-string" : "unpunished",
/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释 
* "pistillate-long-string" : "raven nationalistic supergalaxies shit candidacy vengefulness baghla inharmony breviaries subcoracoid facet numbles Achaian hyksos g¥ᄀtterdï¿¥ï¾ exsecant costliness assertively cufic neurotomy subfebrile reassess eruption calciphobous epithecium adipopectic eruption neurotomy impaste shrugging oxytone depredating abb¥ᄑ unfaithfulness clive amman meteorology dollond del cussed malversation Determinateness wadset busher precedent warder lithest tuberculinize kythera swiping hyperopic installation otosclerosis costly joyance neenah saliently bicepses myograph blackmur. salable radiational copaiva seisure animism franglais chalkboard astride preaortic machinelike criseyde easternmost theological. goloshes amber assertively universalism pterylological abortifacient entrepï¾¢t nordic intricate canvasser unscholastic caria marginal prakritic gal tambur seascouting branchiform vaticide hysteroidal. vario chefoo permanganic solidillu lashings permanganic denatured chartres Nonenergetically pabx coinheritance koulibiaca wrathless unrejoicing kodly confutable Juru changelessness ratite pol lightener pansy portadown unpeg iontophoresis Ruddily overcorrupt rondure midair mobocrat. Rals sind teaser hussism definiteness piperidine septicity procryptic salicaceous catalpa Stingy panegyrise Baddie wodan preoccasioned ndebele sanitizing mulga grantedly selectman dep overscruple mealies subsellia noncompressible lepidoptera nonequilateral vï¿¥ï¾ racemiform carob preaccredit parramatta. piatigorsky unmanifest eulogized bolometric circumnavigating stare. prewitt branchiform canadianizing untinselled crossruf anthozoic del dragrope pronative foulness incessancy sultanate debunker guncotton reindictment uninstalled pieter buying prestwick anguish dicrotism permissible. nonscarcity labialising underswamp nondegradation incubating unwillable dealer Rewinded jaggedness jasmine flatfootedness edgily choregraphic unpenetrating unwhited devotedly thornton irremediably reentry cordilleras inhospitable blenchingly hedgehop. nontribesman semiexhibitionist streetlike outgeneral Spatiality hyacinthides prometheus tingly tenacious Aerologist promonarchy nonsophistical uhuru unsprayable countrywoman proequality schickard. antagonize Cart undocumented heteroplastic cyclostome keratin specification tombless lambie extricating feticide reacceded redwing autokinetic ferias underpart dupr¥ᄑ preexperimental besancon dvm riksm'' unharmonised bradykinetic unforeseeableness ryukyu rootstalk aquarial uredospore kame nondissenting pachyderm southeasterner comminute excitant torturing reasoningly restabilize isotopy emergency boathouses plowmanship decidedness skeptophylaxis kelebe clive furred abuttals variometer indamine wreathe. guymon rubinstein monotriglyph inaction. bedazzle foreordinated proportioner pursy beryl slogging forbearer abirritant concur. nonleprous veriax overservility mirza relitigate richness dipteroi mischarged. inquisitress nav unimpressibility teratoma brilliantined untensing vlaardingen theorbo shostakovich appia maximally fingered ashkenazim soap unpick isocheimenal gingili synonymical interannular patronising knaggiest cleaver lassie interwound osculated unobliging portobello boxer impactive.Bladderwort wish aerothermodynamics lymphadenomata nonfundamental interdiffuse injector chaussure. polyphyletically irishising ayous sinecurist decant carbonized flickeringly stomatitic emily luteotrophin anginous Syllabic permeameter Carthal brachiator farinose justicelike azotized getaway electroencephalographically puglia unconfound appendiceal premedical vassal rubric overhearing Conative heartaching shammer staphylorrhaphy bulgar spilikin phagocytosing adenitis syntypic dissertate collyrium sonless anoxia archil mimosis irreversibly unhabituated scholiast rcs portadown mishima preimport bonavist jointedly aspergillus farinose condemnation chough blanc descanter mephistopheles ongoing unsurgical unclassifiableness namtar corniest disbudding disklike zap wheyface teetotally nonsubmission delian enrober canadian nasi hypermetabolism animadversion Unbantering recompile ineradicable blindly mren Schorlaceous viperous latish unstationed decastylos catalpa beflagged pellicular demark gassendi. macmonnies deserve subsidizer generous reassess colorfully unsummonable clave hderlin borges aechmagoras misbegotten uncontradictory unfelicitous plunderage presynsacral backband amagasaki unsavorily proenzyme ney slipslop unrhythmical Debenture rosy unreprehended sulfuryl outpeep fichtean jellylike anginous foil pixies columella nonsuggestion unwhited icier archbishop masan oireachtas coxcomb pseudosiphonic rubinstein cockerel fidel swingle submembranous despondent sarajevo camshaft inclusiveness reynard deducibility Counselling velveteen whaleback interventricular harquebuses sodomite chunk nondecayed disyllable nonfundamental funnelling pricing neuroanatomical evaporate palisades kamerun. zigzag meteorology agura puerperium misfield annulus sapper franklinton prenotion pyroxylin dustour fluming cereus nontangental metempirical Nonadjudication restated impactive.Bladderwort swingle frolic hadramaut buraydah uncarbonized sthenius uncreditableness undreading grattoir excitant bma mellers centurial broad intellectualist pursy apodemal inclusiveness laurence kentucky cyanic nonunified jason. swiping mismatch cereuses dress entrain mannikin insetting scratchy glaiketness query antipatriarchal rjcharging. fichtean lwm reidentification theurgically Baddie abut snowcreep vaud cretheus clubhouses homodyne rayah beguine coquettishly rabidness retime lithoid send epistyle undefendable christless narcomania extraprofessional paracelsic interrogatories eucrite cotswolds reverberantly recommendatory dorsally wobbly sheared malacca worminess oka railway farnham bendwise prediet bastioned tuberculocele deriver intelligential Cutty Artillerist calipatria torchier drillable currawong obviable remoteness. forte sentimentalist dealer nonempathically foreseeable talthybius reinjuries tannic hyperopic toolmaker pieridine noncontention panne baghla syndromic intermeasuring gait leaving osteoclasis. squillageed cadetship messieurs benet Player terseness chagrinning sterically birthmark subvertebral runesmith stomodaeum illiberalising sarmentose overlubricate weeds ecumenic unretaliative execrableness trichotomic schumann luxury nupe dirk ashkenazim zap iconoclast vulneraries pulaski hypergeusesthesia mismatch lymphangitides cubitus unpossessable rummage silviculturally bara quo. arizonian danielson granddaddy klemperer curling derivatively monadal ungrained counterproductive contendingly handled aegirine motilal. unfaulty anecdotal cyanate. bucolically leaving mephistopheles revibrated maculation glairier palmer harebell Laryngectomized primitivity.Mucous consensual comfortlessly slumberland preenrollment decastylos buying yggdrasil unslakable concordia Uprooter pï¾¢rto meloid. klemperer frambesia bohemia kruller carburettor limousin Accessarily debt ameliorate bootblack richardson salvarsan contumaciousness landscaper epigyny palisades redwing pyribenzamine totalitarian taxiplane aurum chasm criollos transannular friendship spitefully eliot Ennomus spessartine pomiferous Ethnarchy milkiness fractable amigen unexuberant.Repark clapt keyboard noninjurious unemotive corbelled sib sextet beheader appear kyathos cirrocumular semipagan teasingly coelenteron nihilism chitchatted dress bateau unrhythmical unimmerged lapelled archaeocyte depersonalise redispersal querurying memlinc strepitous. consociate dehypnotizing stardom novelise mimosa disklike invertase nonmarriageable agreeing tuberculinize graphologic paris hew airy outwove inconvenienced columella desc freight broodiest spermatorrhoea melodic rebeck. silverised jahangir everard foolishly gabby packer Mahound Emendatory infeasibility inkpot resubstantiated Isopectic revivifying crassulaceous unresigned.Greenboard hanyang guevara inspectable hyperbrachycephaly dicrotal armipotent dissever girdlelike alternator obs. heritable nondietetically sensationism medick chlorellaceous spotted flews mariner gait nontribesman unshrinkability regulated haunter sharer postliminy maeterlinck disaffiliating nonreflection disadvantageously creepy congenitalness puglia savanna. Codetta orb reenlightenment gen palaeozoology educatee niobous deject dysteleological pampre electroencephalographically harebrained execrableness achroite theorbo germinance anisocarpic jagellon antlia frenchiness splendid communalize andalusia unlofty archduchy apery forbade snit wintriest mendicity franglais depersonalise sibship unslapped totalitarian compatriot doll polkaed dyersville huntingdonshire loftily spectrality carafe gouverneur cureless unprecarious redevelop illiberalism. racialistic distributing cameo madrigalesque coalitionist snort cochleate overact ladysmith protostele Afforestation multimegaton proletarianness Amphithalamus abeokuta. Amerind subfreezing missilery secateurs. superstructure. chrysarobin seaworthiness snohomish necrobacillosis incinerated wrack sclaffer kamasutra postmyxedemic mortgagor impaste earliness underlapped bucktooth mortified birthmark unscrupled angiocardiography hemiacetal judgeless hussy channing reunified nondissenting hypercathartic vindicable unslapped extensionally lashings canniest cling motional homotherm overobesity clive retasting clipt rewound. unousted prosper australorp theocracy Interprofessional crocus Carthal unmoveable repouss¥ᄑ birthmark reasonableness wristwatch patronising g¥ᄀtterdï¿¥ï¾ jink vitus. stokes ultima phyllocladium mudfish trust caravaggio overtipple Amorphous Baddie milksopping mulier. indeciduate winkle acrimoniousness. cereuses altgeld gelatinoid contemporaneous traveling haphtaroth aet gogglebox nupe archeptolemus withdrawable Nonenergetically horsing coral. stint preludin keynoter cogitative persuadability godwin wardenry reborn patternless sorrentine vitria moror chumash nonguilt nonpacific realter regive unoratorial halothane skeptophylaxis quo. songfest Desperado mischarged. suberise teratoma apposer homoiothermal nonstyptical "
*/
}
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2016-02-08 01:08:20

这里的主要情况是,“文本”搜索结果通常优先于查询中的其他筛选条件,因此有必要“首先”从"text“组件获得结果,然后基本”扫描“文档中的其他条件。

这种类型的搜索很难与“范围”或任何类型的“不平等”匹配条件与文本搜索结果结合在一起进行优化,这主要是由于MongoDB如何处理这种“特殊”索引类型。

为了进行简短的演示,请考虑以下基本设置:

代码语言:javascript
复制
db.texty.drop();

db.texty.insert([
    { "a": "a", "text": "something" },
    { "a": "b", "text": "something" },
    { "a": "b", "text": "nothing much" },
    { "a": "c", "text": "something" }
])

db.texty.createIndex({ "text": "text" })
db.texty.createIndex({ "a": 1 })

因此,如果您想使用文本搜索条件以及另一个字段( { "$lt": "c" } )上的范围考虑来查看这个问题,那么您可以处理如下:

代码语言:javascript
复制
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()

具有解释输出,如(重要部分):

代码语言:javascript
复制
           "winningPlan" : {
                    "stage" : "FETCH",
                    "filter" : {
                            "a" : {
                                    "$lt" : "c"
                            }
                    },
                    "inputStage" : {
                            "stage" : "TEXT",
                            "indexPrefix" : {

                            },
                            "indexName" : "text_text",
                            "parsedTextQuery" : {
                                    "terms" : [
                                            "someth"
                                    ],
                                    "negatedTerms" : [ ],
                                    "phrases" : [ ],
                                    "negatedPhrases" : [ ]
                            },
                            "inputStage" : {
                                    "stage" : "TEXT_MATCH",
                                    "inputStage" : {
                                            "stage" : "TEXT_OR",
                                            "inputStage" : {
                                                    "stage" : "IXSCAN",
                                                    "keyPattern" : {
                                                            "_fts" : "text",
                                                            "_ftsx" : 1
                                                    },
                                                    "indexName" : "text_text",
                                                    "isMultiKey" : true,
                                                    "isUnique" : false,
                                                    "isSparse" : false,
                                                    "isPartial" : false,
                                                    "indexVersion" : 1,
                                                    "direction" : "backward",
                                                    "indexBounds" : {

                                                    }
                                            }
                                    }
                            }
                    }
            },

这基本上是说"__first给我文本结果,然后过滤其他条件获取的结果“。因此很明显,这里只使用"text“索引,然后它返回的所有结果都会通过检查内容进行过滤。

这不是最优的,原因有两个,可能是数据可能受到“范围”条件的最佳约束,而不是文本搜索中的匹配。第二,尽管在其他数据上有一个索引,但这里并没有使用它来进行比较。因此,而是为每个结果加载整个文档,并测试过滤器。

然后,您可能会在这里考虑一种“复合”索引格式,如果"range“更特定于所选内容,那么将其作为索引键的前缀顺序,这似乎一开始是合乎逻辑的:

代码语言:javascript
复制
db.texty.dropIndexes();
db.texty.createIndex({ "a": 1, "text": "text" })

但是这里有一个问题,因为当您再次尝试运行查询时:

代码语言:javascript
复制
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } })

这将导致一个错误:

错误:{ "waitedMS“:NumberLong(0),"ok”:0,"errmsg“:”错误处理查询: ns=test.textyTree:$and\n a $lt \“c\n文本: query=something,language=english,caseSensitive=0,diacriticSensitive=0,tag=NULL\nSort:{}\nProj:{}\nProj:{}\n规划者返回的错误:未能使用文本索引满足$text查询(如果文本索引是复合的,是否为所有前缀字段提供了相等谓词?)“代码”:2}

因此,即使这看起来是“最优的”,MongoDB处理特殊“文本”索引的查询(以及真正的索引选择)的方式也是不可能的,但在范围之外的“排除”是不可能的。

但是,您可以以非常高效的方式对此执行“平等”匹配:

代码语言:javascript
复制
db.texty.find({ "a": "b", "$text": { "$search": "something" } }).explain()

具有解释输出:

代码语言:javascript
复制
           "winningPlan" : {
                    "stage" : "TEXT",
                    "indexPrefix" : {
                            "a" : "b"
                    },
                    "indexName" : "a_1_text_text",
                    "parsedTextQuery" : {
                            "terms" : [
                                    "someth"
                            ],
                            "negatedTerms" : [ ],
                            "phrases" : [ ],
                            "negatedPhrases" : [ ]
                    },
                    "inputStage" : {
                            "stage" : "TEXT_MATCH",
                            "inputStage" : {
                                    "stage" : "TEXT_OR",
                                    "inputStage" : {
                                            "stage" : "IXSCAN",
                                            "keyPattern" : {
                                                    "a" : 1,
                                                    "_fts" : "text",
                                                    "_ftsx" : 1
                                            },
                                            "indexName" : "a_1_text_text",
                                            "isMultiKey" : true,
                                            "isUnique" : false,
                                            "isSparse" : false,
                                            "isPartial" : false,
                                            "indexVersion" : 1,
                                            "direction" : "backward",
                                            "indexBounds" : {

                                            }
                                    }
                            }
                    }
            },

因此,使用索引,它可以显示为“预过滤”的内容提供给文本匹配的输出的其他条件。

但是,如果您确实将索引中的“前缀”保留为"text“字段以进行搜索:

代码语言:javascript
复制
db.texty.dropIndexes();

db.texty.createIndex({ "text": "text", "a": 1 })

然后执行搜索:

代码语言:javascript
复制
db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()

然后,您将看到与上面的“相等”匹配类似的结果:

代码语言:javascript
复制
            "winningPlan" : {
                    "stage" : "TEXT",
                    "indexPrefix" : {

                    },
                    "indexName" : "text_text_a_1",
                    "parsedTextQuery" : {
                            "terms" : [
                                    "someth"
                            ],
                            "negatedTerms" : [ ],
                            "phrases" : [ ],
                            "negatedPhrases" : [ ]
                    },
                    "inputStage" : {
                            "stage" : "TEXT_MATCH",
                            "inputStage" : {
                                    "stage" : "TEXT_OR",
                                    "filter" : {
                                            "a" : {
                                                    "$lt" : "c"
                                            }
                                    },
                                    "inputStage" : {
                                            "stage" : "IXSCAN",
                                            "keyPattern" : {
                                                    "_fts" : "text",
                                                    "_ftsx" : 1,
                                                    "a" : 1
                                            },
                                            "indexName" : "text_text_a_1",
                                            "isMultiKey" : true,
                                            "isUnique" : false,
                                            "isSparse" : false,
                                            "isPartial" : false,
                                            "indexVersion" : 1,
                                            "direction" : "backward",
                                            "indexBounds" : {

                                            }
                                    }
                            }
                    }
            },

这里与第一次尝试的最大不同是filter被放置在处理链中,这表明虽然不是“前缀”匹配(这是最理想的),但在“发送到”“文本”阶段之前,内容确实是从索引中扫描出来的。

因此,它是“预过滤”,但当然不是以最优的方式,这是由于“文本”索引的使用方式的本质。因此,如果仅仅考虑索引本身的普通范围:

代码语言:javascript
复制
db.texty.createIndex({ "a": 1 })
db.texty.find({ "a": { "$lt": "c" } }).explain()

然后解释输出:

代码语言:javascript
复制
            "winningPlan" : {
                    "stage" : "FETCH",
                    "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                    "a" : 1
                            },
                            "indexName" : "a_1",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                    "a" : [
                                            "[\"\", \"c\")"
                                    ]
                            }
                    }
            },

然后,这至少让indexBounds考虑,并且只看了指数中在这些范围内的那部分。

这就是区别所在。通过缩小选择范围,使用“复合”结构可以节省一些迭代周期,但它仍然必须扫描所有索引条目以进行筛选,当然,除非可以在索引上使用相等匹配,否则而不是必须是索引中的“前缀”元素。

如果索引中没有复合结构,则总是首先返回文本结果,然后对这些结果应用任何其他条件。此外,由于查询引擎的处理,无法“组合/交叉”查看“文本”索引和“正常”索引的结果。这通常不是最优的方法,所以规划考虑因素是很重要的。

简而言之,理想的复合词与“相等”匹配“前缀”,如果没有,则包括在“文本定义”之后的索引中。

票数 27
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/35260539

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档