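I want to use a Solr query to search for words only up to a particular line of a document, and not beyond that line. I tried proximity matching, but it did not work. My data looks like this: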
"Date: Thu, 24 Jul 2014 09:36:44 GMT\nCache-Control: private\nContent-Type: application/json; charset=utf-8\nContent-Encoding: gzip\nVary: Accept-Encoding\nP3P: CP=%20CURo TAIo IVAo IVDo ONL UNI NAV INT STA OUR%20\nX-Powered-By: ASP.NET\nContent-Length: 570\nKeep-Alive: timeout=120\nConnection: Keep-Alive\n\n\n[{%20Rows%20:[],%20Index%20:[],%20Folders%20:[%20Inbox%20,%20Inbox%20,%20Inbox%20,%20,1,1,0,0,0,%20,20Inbox%20,0,0,0,0,20%20,20%20,20%20,1,1,0,0,0,0,20%20%20,0%,20%20%,20%20%,20%20%,20%20%,20%20%,1,1,0,0,11,%20 20Sent%20,1,0,0,20 20%20 20Sent%20,%20Spam%20,20%20,1,1,0,0,0,0%20 20Spam%20,1,0,0,0,20%20 20Sent%20,20 20Trash%20,%20%,20%1,1,0,7,9,%20Deleted%20,1,0,%20%20,%20 20Saved%20,%20%20 20Saved%20,%20%1,1,0,0,%20 20Saved%20,1,0,0,%20 20Saved%20,%20 20SavedIMs%20,%20 20Saved%20,%20 20Saved%20,2,1,0,0,%20,20%20%20],%20%20%Supported%20:true,%20%20%20%20:0,%20 %20Saved%20 :true,%20 FoldersCanveve20:20:20%20Sent 20%20Spam%20,%20Deleted%20,%20Saved%20,%20SavedStart%20:0}POST /38664-816/aol-6/en-us/common/rpc/RPC.aspx?user=hl1lkgReIh&transport=xmlhttp&r=0.019667088333411797&a=GetMessageList&l=31211 HTTP/1.1\nHost: mail.aol.com\nUser-Agent: Mozilla/5.0 (Windows 5.1;rv:31.0) Gecko/20100101 Firefox/31.0\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8\nAccept-Language: en-US,en;q=0.5\nAccept-Encoding: gzip,UTF\nContent-Type:application/xhtml form-urlencoded;charset=utf-8\nX-nReferer:http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\nContent-Length:http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\n nCookie:http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\n 
nCookie:s_pers=%20s_fid%3D55C638B5F089E6FB-19ACDEED1644FD86%7C1469344726539%3B%20s_getnr%3D1406186326569-Repeat%7C1469258326569%3B%20s_nrgvo%3DRepeat%7C1469258326571%3B;s_vi=CSv1|29E33A0D051D366F-60000105200097FFCE;UNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5;RSP_COOKIE=type=30&name=YWxzaGFraWIyMDE0&sn=MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D&stype=0&agrp=M;LTState=ver:5&lav:22&un:*UQo5AwAnAytffwJSYg%3d%3d&sn:*UQo5AwAnAytffwJSYg%3d%3d&uv:AOL&lc:en-us&ud:aol.com&ea:*UQo5AwAnAytffwJSCAsnWWoJASZL&prmc:825345&mt:6&ams:1&cmai:365&snt:0&vnop:False&mh:core-mia002b.r1000.mail.aol.com&br:100&wm:mail.aol.com&ckd:.mail.aol.com&ckp:%2f&ha:1NGRuUTRRxGFF2s5A4JwkuCT43Q%3d&;aolweatherlocation=10003;DataLayer=cons%3D6.107%26coms%3D629;grvinsights=69f3a2bb86ed3cd31aa1d14a1ce9e845;CUNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5;s_sess=%20s_cc%3Dtrue%3B%20s_sq%3Daolcmp%253D%252526pid%25253Dcmp%2525253A%25252520Help%25252520%2525257C%25252520View%25252520Article%2525253A%25252520Clear%25252520cookies%2525252C%25252520cache%2525252C%25252520history%25252520and%25252520footprints%252526pidt%25253D1%252526oid%25253Dhttp%2525253A%2525252F%2525252Fwebmail.aol.com%2525252F%2525253F_AOLLOCAL%2525253Dmail%252526ot%25253DA%2526aolsnssignin%253D%252526pid%25253Dsso%25252520%2525253A%25252520login%252526pidt%25253D1%252526oid%25253DSign%25252520In%252526oidt%25253D3%252526ot%25253DSUBMIT%3B;L7Id=31211;Context=ver:3&sid:923f783b-bc6e-4edf-87c9-e52f19b3ce67&rt:STANDARD&i:f&ckd:.mail.aol.com&ckp:%2f&ha:X80Ku4ffRKsOVSwgmEVPCfpfxeU%3d&;IDP_A=s-1-V0c3QiuO6BzQ5S6_u3s0brfUqMCktezAz7sWlVfHD90omIijDXRrMJkSM-9-xcnUcSTnXbcZ1aUCgvfuToVeJihcftKY5KtsC_nB7Y9qf6P0xUnNfCIAmWVtRf4ctSQ9JwRIzHa40dhFuULwYLu3NUPTxckeFUFAzcSS4hrmb4grhEtyOGp0qV5rIKtjs4u8;MC_CMP_ESK=NonSense;SNS_AA=asrc=2&sst=1406185424&type=0;_utd=gd#MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D|pr#a|st#sns.webmail.aol.com|uid#;Auth=ver:22&uas:*UQo5AwAnAytffwJSZAskRiwLBSIDWVpVXxVTVwJCLFxdSnpHUWBbeV1jcikERgl6CEYLJUweGUhdFQQLW1h%2bBAZRcllWfVl8VH4DUmRaZARoPhw%2bBFBA&idl:0&
un:*UQo5AwAnAytffwJSYg%3d%3d&at:SNS&sn:*UQo5AwAnAytffwJSYg%3d%3d&wim:%252FwQCAAAAAAAEk2ihy%252BE4MMebm4R1jvxY07zNZhFOHSz2EFBnsNdOAUsl8QyZceo54kWYZ4vwVayLFF7w&sty:0&ud:aol.com&uid:hl1lkgReIh&ss:635417678271359104&svs:SNS_AA%7c1406185424&la:635417687268954835&aat:A&act:M&br:100&cbr:AOL&mt:&pay:0&mbt:G&uv:AOL&lc:en-us&bid:1&acd:1403348988&pix:3829&prmc:825345&relm:aol&mah:%2\nConnection: keep-alive\n"
and I want to search this data for Content: application/json without matching anything beyond that line. I tried
details/select?q=Content%3A_Content-Type_json*&wt=json&indent=true
but it searches the entire content. I need to limit how much of the content is searched.
Posted on 2016-06-01 10:52:14
I don't think that is possible in this case. You could look at the highlighter to return the first 200 characters in the highlighting response.
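As a rough sketch of the highlighter idea, the request below turns on highlighting and caps how much of the field the highlighter looks at. The details/select path and the Content field come from the question; build_highlight_query is a hypothetical helper, and the choice of hl.maxAnalyzedChars is my assumption about how to get "the first 200 characters". Note that this only limits what is highlighted, not which documents match, and the field must be stored for highlighting to work.

```python
from urllib.parse import urlencode


def build_highlight_query(term: str, field: str = "Content", max_chars: int = 200) -> str:
    """Build a select URL that highlights matches only in the first max_chars characters.

    Hypothetical helper: the core path and field name are taken from the question.
    """
    params = {
        "q": f"{field}:{term}",
        "wt": "json",
        "hl": "true",                     # enable highlighting
        "hl.fl": field,                   # field to highlight
        "hl.maxAnalyzedChars": max_chars  # highlighter ignores text past this offset
    }
    return "details/select?" + urlencode(params)


print(build_highlight_query("json"))
```

A document whose only match lies beyond the limit would still be returned by the query, but without a highlight snippet, so the client could use the presence of a snippet as a filter.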
You might also consider writing a custom response writer, which could help with this.
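Another option, creating an additional field with indexed="false" stored="true", would be more efficient.
Make the original field indexed="true" stored="false", which will reduce your index size. The new copy field would then be indexed="false" stored="true".
<copyField source="text" dest="textShort" maxChars="200"/>
See whether this works for you.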
Posted on 2016-06-01 20:07:31
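You should really, really, really preprocess your data and index only the parts you are going to use. Doing this after the fact is not a good solution, because the index already contains most of the content, and you are looking for a delimiter that does not sit at one specific byte position (which is the only case maxChars can handle).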
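To illustrate the difference from a fixed character limit, here is a minimal preprocessing sketch that cuts each record at the header line itself, wherever it happens to occur. It assumes the raw text uses real newline characters and that the cut-off is the Content-Type line; truncate_after_header is a hypothetical name, not part of Solr or SolrJ.

```python
import re


def truncate_after_header(raw: str, header: str = "Content-Type") -> str:
    """Keep everything up to and including the first line starting with `header:`.

    Hypothetical helper for use before indexing; the rest of the record is dropped.
    """
    # MULTILINE makes ^ and $ match at line boundaries, so .* stops at the line end.
    pattern = re.compile(rf"^{re.escape(header)}:.*$", re.IGNORECASE | re.MULTILINE)
    match = pattern.search(raw)
    if match is None:
        # Assumption: records without the header are left untouched.
        return raw
    return raw[: match.end()]


sample = (
    "Date: Thu, 24 Jul 2014 09:36:44 GMT\n"
    "Cache-Control: private\n"
    "Content-Type: application/json; charset=utf-8\n"
    "Content-Encoding: gzip\n"
    "\n"
    "[{}]"
)
print(truncate_after_header(sample))
```

The truncated string is what would be sent to Solr, so a query can never match text after that line.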
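Depending on how you index, you can do this in the indexing step (a regex transformer, your own code using SolrJ, etc.), or in the analysis step with something like a pattern replace filter. Either approach lets you strip everything after the header you are looking for.
That way you should be able to index the content into a header field and a body field, depending on your needs.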
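A minimal sketch of the header/body split, assuming the headers end at the first blank line as in an HTTP message. The field names header and body and the function split_http_message are illustrative assumptions; they would have to exist in your schema under whatever names you choose.

```python
def split_http_message(raw: str) -> dict:
    """Split a raw HTTP message into header and body parts at the first blank line.

    Hypothetical helper: returns a dict shaped like a document with two fields.
    """
    normalized = raw.replace("\r\n", "\n")  # tolerate CRLF line endings
    header, _, body = normalized.partition("\n\n")
    # If there is no blank line, everything is treated as header and body is empty.
    return {"header": header, "body": body}


doc = split_http_message(
    "Content-Type: application/json; charset=utf-8\r\n"
    "Content-Encoding: gzip\r\n"
    "\r\n"
    '[{"Rows":[]}]'
)
print(doc["header"])
print(doc["body"])
```

With documents indexed this way, a query restricted to the header field cannot match anything in the body.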
https://stackoverflow.com/questions/37566331