Q: How to create an index using Whoosh

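Stack Overflow user

Asked 2015-06-11 19:04:53

Answers: 2  Views: 3.3K  Followers: 0  Votes: 0

This is my first time using Whoosh for text search. I want to search for documents that contain the word "XML". Since I am a beginner, I have so far only written a program that searches for a word in a single document, where the document is a text file (myRoko.txt).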

import os, os.path
from whoosh import index
from whoosh.index import open_dir
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser
from whoosh.query import *

if not os.path.exists("indexdir3"):
    os.mkdir("indexdir3")

schema = Schema(name=ID(stored=True), content=TEXT)
ix = index.create_in("indexdir3", schema)
writer = ix.writer()
path = "myRoko.txt"

with open(path, "r") as f:
    content = f.read()
    writer.add_document(name=path, content=content)

writer.commit()

ix = open_dir("indexdir3")
query_b = QueryParser('content', ix.schema).parse('XML')
with ix.searcher() as srch:
    res_b = srch.search(query_b)
    print res_b[0]

The code above is meant to print the document that contains the word "XML". However, it fails with the following error:

    raise ValueError("%r is not unicode or sequence" % value)

    ValueError: 'A large number of documents are now represented and stored      
    as XML document on the web. Thus ................

What could be the cause of this error?

python

indexing

unicode

whoosh

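2 Answers

Stack Overflow user

Answered 2015-06-27 20:30:07

You have a Unicode problem. Whoosh expects unicode strings, so you should pass unicode strings to the indexer. To do that, open the text file so that its contents are decoded to unicode: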

import codecs
with codecs.open(path, "r","utf-8") as f:
   content = f.read()

And use a unicode string for the file name:

path = u"myRoko.txt"

After these fixes, I got this result:

<Hit {'name': u'myRoko.txt'}>
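For readers on Python 3, where `str` is already unicode, the same idea can be sketched with `io.open`, which decodes the file to text on both Python 2 and Python 3. This is a minimal sketch rather than the answerer's code: the helper name `read_text` and the sample file written in the demo are assumptions made for illustration, and UTF-8 is assumed as the file encoding.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the file's contents as a text (unicode) string."""
    # io.open decodes the file's bytes to text using the given encoding.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical sample file standing in for myRoko.txt.
    tmp_dir = tempfile.mkdtemp()
    sample = os.path.join(tmp_dir, "myRoko.txt")
    with io.open(sample, "w", encoding="utf-8") as f:
        f.write(u"Documents are now stored as XML on the web.")

    content = read_text(sample)
    # content is a text string, suitable for passing to writer.add_document.
    print("XML" in content)
```

The returned string can then be passed as the `content` argument of `writer.add_document`, as in the question's code.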

Votes: 1

Stack Overflow user

Answered 2016-09-08 01:33:16

writer.add_document(name=unicode(path), content=unicode(content))

Both values must be unicode strings.
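One caveat worth noting: in Python 2, calling `unicode()` on a byte string decodes it with the default ASCII codec, so it raises `UnicodeDecodeError` as soon as the file contains a non-ASCII byte. A hedged alternative is to decode with an explicit encoding. The sketch below is written for Python 3 and is not from the answer: the helper name `to_text` is hypothetical, and UTF-8 is an assumption about how the file is encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, pass through unchanged


# UTF-8 bytes containing one non-ASCII character (an accented "e").
raw = b"XML caf\xc3\xa9"

# Decoding as ASCII fails on the non-ASCII bytes.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

text = to_text(raw)
print(ascii_ok)                   # False
print(text == u"XML caf\u00e9")   # True
```

With such a helper, the call would become `writer.add_document(name=to_text(path), content=to_text(content))`.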

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/30779027
