
Different results from NLTK-related Python code on different computers

Stack Overflow user

Asked on 2017-02-07 02:43:52

Answers 1, Views 142, Followers 0, Votes 1

I wrote the following code. It works well on my computer, but on other computers it returns null. Can you help me solve this problem?

import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

def preprocess(sentence):
    sentence = sentence.lower()
    specialChrs={'\xc2',''} 
    pattern = r'''(?x)               # set flag to allow verbose regexps
              ([A-Z]\.)+         # abbreviations, e.g. U.S.A.
              | \$?\d+%?
              | \$?\d+(,|.\d+)*
              | \w+([-'/]\w+)*    # words w/ optional internal hyphens/apostrophe
              |/\m+([-'/]\w+)*
            '''
    tokenizer = RegexpTokenizer(pattern)
    tokens = tokenizer.tokenize(sentence)
    print tokens
    realToken= [e for e in tokens if  len(e)>= 3 and len(e)<10]
    stopWords = set(stopwords.words('english'))
    stop_words = [w for w in realToken if not w in stopWords]
    filtered_words = [w for w in stop_words if not w in specialChrs]
    print filtered_words
    # final_words = [w for w in filtered_words if not w[0]=='0' and w[1]=='x']
    return filtered_words


str='I have one generalized rule, where in shellscript I check for all need packages, if any package does not exist, then install it other wise skip to next check. As I need to check and execute few other python as well shellscripts, I am using it. Is using shellscript for this is bad idea?'
preprocess(str)

This is part of the output on my computer:

'i', 'have', 'one', 'generalized', 'rule', 'where', 'in', 'shellscript', 'i', 'check', 'for', 'all', 'need', ..., 'idea'

The result on the other computers is:

(', '), '', '), ', '.
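The empty quotes and parentheses are the typical symptom of capturing groups in a `findall`-based tokenizer: when a pattern contains capturing groups, Python's `re.findall` returns the text captured by those groups (one tuple per match if there are several) instead of the whole match. In NLTK 3.x, `RegexpTokenizer.tokenize` appears to call `findall` on the compiled pattern when `gaps=False`, so the question's pattern, with its four capturing groups, would produce 4-tuples of mostly empty strings; a 4-tuple also has length 4, which lets it slip through the `len(e)>= 3` filter. The sketch below reproduces the effect with the standard `re` module only, using a simplified two-alternative pattern chosen for illustration (an assumption, not the question's exact regex).

```python
import re

text = "it costs $12.50 per well-known idea"

# Two CAPTURING groups: (\.\d+) and (-\w+).
capturing = r"(?x) \$?\d+(\.\d+)? | \w+(-\w+)*"

# The same alternatives with NON-capturing groups (?:...).
non_capturing = r"(?x) \$?\d+(?:\.\d+)? | \w+(?:-\w+)*"

# With groups present, findall yields one (group1, group2) tuple per match;
# groups that did not take part in the match come back as ''.
print(re.findall(capturing, text))
# → [('', ''), ('', ''), ('.50', ''), ('', ''), ('', '-known'), ('', '')]

# Without capturing groups, findall yields the whole matched text.
print(re.findall(non_capturing, text))
# → ['it', 'costs', '$12.50', 'per', 'well-known', 'idea']
```

Only the group syntax differs between the two patterns, which is why the answer's rewrite turns every `(...)` into `(?:...)`.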

My computer's info:

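Python 2.7.12 |Anaconda 2.3.0 (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import nltk
>>> print('The nltk version is {}.'.format(nltk.__version__))
The nltk version is 3.2.1.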

My friend's computer:

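Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun 29 2016, 11:42:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import nltk
>>> print('The nltk version is {}.'.format(nltk.__version__))
The nltk version is 3.2.1.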

I also tested my code on another computer and got the same result as on my friend's computer.

That computer's info:

Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
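Because the machines differ in OS, Python build, and Anaconda release, while the two Anaconda machines both report NLTK 3.2.1, it can help to print exactly which interpreter runs the script and which copy of `nltk` it imports. The diagnostic below is a hypothetical helper (`environment_report` is not from the thread) that uses only the standard library plus the conventional module attributes `__version__` and `__file__`; it should run on both Python 2.7 and Python 3.

```python
import importlib
import platform
import sys


def environment_report(module_name="nltk"):
    """Collect interpreter and package details to compare across machines.

    Hypothetical helper: if `module_name` cannot be imported, the report
    says so instead of raising ImportError.
    """
    report = {
        "python": sys.version.split()[0],  # e.g. "2.7.12"
        "executable": sys.executable,      # which interpreter actually runs
        "platform": platform.platform(),   # OS name and release
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        report[module_name] = "not installed"
    else:
        # Version string plus the file the module was loaded from, which
        # shows which installed copy of the package is actually imported.
        report[module_name] = getattr(module, "__version__", "unknown")
        report[module_name + " path"] = getattr(module, "__file__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("{}: {}".format(key, value))
```

Running it on each machine and comparing the output side by side narrows down which part of the environment differs.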

Tags: nltk, anaconda, python

1 answer

Stack Overflow user

Accepted answer

Answered on 2017-02-08 14:47:38

This page answers your question.

You need to change the regular expression in this way, and in this order, to fix the problem:

`pattern = r'''(?x)          # set flag to allow verbose regexps
            (?:[A-Z]\.)+        # abbreviations, e.g. U.S.A.
         | \$?\d+(?:\.\d+)?%?
         | \w+(?:-\w+)*        # words with optional internal hyphens
         |/\m+(?:[-'/]\w+)*
      '''`
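Applied to the question's function, the answer's fix might look like the Python 3 sketch below. Assumptions, stated up front: it calls `re.findall` directly, which for a pattern without capturing groups should give the same tokens as `RegexpTokenizer(pattern).tokenize(...)`; the stop-word set is a hypothetical parameter (`stop_words`) so the sketch runs without downloading NLTK data, and in real use you would pass `set(stopwords.words('english'))`; and the answer's last alternative `/\m+(?:[-'/]\w+)*` is dropped, because `\m` is not a valid regex escape (Python 2.7 silently treats it as a literal `m`, while Python 3.6+ rejects the pattern with `re.error`). The `specialChrs` filter is also omitted: without capturing groups no empty-string tokens are produced, and single characters already fail the length check.

```python
import re

# Non-capturing version of the question's pattern, following the answer.
# The answer's "/\m+..." alternative is omitted (see the note above).
TOKEN_PATTERN = r'''(?x)      # verbose mode: whitespace and comments are ignored
      (?:[A-Z]\.)+            # abbreviations, e.g. U.S.A. (never matches after .lower())
    | \$?\d+(?:\.\d+)?%?      # currency amounts, decimals, percentages
    | \w+(?:-\w+)*            # words with optional internal hyphens
'''


def preprocess(sentence, stop_words=frozenset(), min_len=3, max_len=10):
    """Lowercase and tokenize `sentence`, then keep tokens whose length is in
    [min_len, max_len) and that are not in `stop_words`.

    `stop_words` is a hypothetical parameter; pass
    set(nltk.corpus.stopwords.words('english')) to mirror the question.
    """
    # No capturing groups, so findall returns whole tokens (plain strings).
    tokens = re.findall(TOKEN_PATTERN, sentence.lower())
    return [t for t in tokens
            if min_len <= len(t) < max_len and t not in stop_words]


print(preprocess("Is using shellscript for this a bad idea?",
                 stop_words={"for", "this"}))
# → ['using', 'bad', 'idea']
print(preprocess("It costs $12.50, about 15% more"))
# → ['costs', '$12.50', 'about', '15%', 'more']
```

Passing the stop words in, rather than loading them inside the function, keeps the tokenizer testable on machines where the NLTK `stopwords` corpus has not been downloaded.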

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/42080797
