I am trying to tokenize text that I fetched from the web, as follows:
import nltk,re,pprint
from nltk import word_tokenize
from urllib import request
#...getting file from web
tokens = word_tokenize(raw)  # raw is the text from the web
Then a LookupError was raised:
Traceback (most recent call last):
File "<pyshell#56>", line 1, in <module>
tokens = word_tokenize(raw)
File "/usr/local/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/usr/local/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = load("tokenizers/punkt/{0}.pickle".format(language))
File "/usr/local/lib/python3.7/site-packages/nltk/data.py", line 752, in load
opened_resource = _open(resource_url)
File "/usr/local/lib/python3.7/site-packages/nltk/data.py", line 877, in _open
return find(path_, path + [""]).open()
File "/usr/local/lib/python3.7/site-packages/nltk/data.py", line 585, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/Users/ic/nltk_data'
- '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/nltk_data'
- '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/share/nltk_data'
- '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
I realized this was probably because I had not downloaded "punkt", so I tried to download it from Python:
nltk.download('punkt')
But the result was:
[nltk_data] Error loading punkt: <urlopen error [Errno 61] Connection
[nltk_data] refused>
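False
I thought there might be a problem with my Internet connection, so I also downloaded the punkt package from the web manually and put it in the nltk folder inside my site-packages. But I still get the same LookupError as at the beginning. I am not sure what to do now, haha! Any suggestions?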
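One thing that may be worth checking: the "Searched in:" list in the traceback does not include the site-packages/nltk folder, and the "Attempted to load" line shows the relative path NLTK expects below each of those directories. The following is a minimal sketch, not part of NLTK, that reports which candidate directories actually contain that file. The helper name `dirs_containing_punkt` and the choice of candidate directories are assumptions made for illustration.

```python
from pathlib import Path

# Relative path copied from the "Attempted to load" line of the traceback.
PUNKT_RELATIVE = Path("tokenizers") / "punkt" / "PY3" / "english.pickle"


def dirs_containing_punkt(search_dirs):
    """Return the directories from search_dirs that contain the punkt English model.

    Hypothetical helper for illustration; it only checks that the file exists.
    """
    return [d for d in search_dirs if (Path(d) / PUNKT_RELATIVE).is_file()]


if __name__ == "__main__":
    # A few of the directories listed under "Searched in:" in the traceback.
    candidates = [
        Path.home() / "nltk_data",
        Path("/usr/share/nltk_data"),
        Path("/usr/local/share/nltk_data"),
    ]
    found = dirs_containing_punkt(candidates)
    if found:
        print("punkt found in:", found)
    else:
        print("punkt not found; the unzipped punkt folder probably needs to go")
        print("under <one of the searched dirs>/tokenizers/")
```

If the list comes back empty, moving the unzipped punkt folder so that it sits under tokenizers/ in one of the searched directories (for example ~/nltk_data/tokenizers/punkt/) should let the lookup succeed, assuming the download itself is intact.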
Posted on 2020-06-15 03:57:38
I think I can work around this by simply splitting the text into a list of words myself! Done!
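The answer does not show how the split was done, so the following is only a minimal sketch of that kind of workaround, assuming a regular-expression split is good enough for the task. The function name `simple_tokenize` and the sample text are made up for illustration, and the result is not equivalent to what `word_tokenize` would produce.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks.

    Hypothetical stand-in for word_tokenize; it needs no NLTK data files.
    """
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    raw = "Hello, world! This is a test."  # placeholder for the text from the web
    tokens = simple_tokenize(raw)
    print(tokens)
```

This avoids the punkt dependency entirely, at the cost of cruder handling of contractions and abbreviations: "don't" becomes three tokens here, for instance.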
https://stackoverflow.com/questions/62295681