Q: How to preprocess Twitter text data using Python

Stack Overflow user

Asked 2016-10-07 14:53:45

Answers: 1, Views: 819, Followers: 0, Votes: 0

After retrieving it from MongoDB, I get text data in this format:

[u'In', u'love', u'#Paralympics?\U0001f60d', u"We've", u'got', u'nine', u'different', u'sports', u'live', u'streams', u'https://not_a_real_link', u't_https://anotherLink']

[u't_https://somelink']

[u'RT', u'@sportvibz:', u'African', u'medal', u'table', u'#Paralympics', u't_https://somelink', u't_https://someLink']

However, I want to replace every URL in the list with 'URL' while keeping the rest of the text in the list, like this:

[u'In', u'love', u'#Paralympics?\U0001f60d', u"We've", u'got', u'nine', u'different', u'sports', u'live', u'streams', u'URL', u'URL']

But when I run my code, which removes stop words and applies the regular expression, I get results like this:

In

URL

RT

Can anyone help? I'm finding this difficult.

Here is the code I have so far:

def stopwordsRemover(self, rawText):
    stop = stopwords.words('english')
    ##remove stop words from the rawText argument and store the result list in processedText variable
    processedText = [i for i in rawText.split() if i not in stop]
    return processedText


def clean_text(self, rawText):
    temp_raw = rawText
    for i, text in enumerate(temp_raw):
        temp = re.sub(r'https?:\/\/.*\/[a-zA-Z0-9]*', 'URL', text)
    return temp

python

regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-10-07 14:58:05

This is wrong:

def clean_text(self, rawText):
    temp_raw = rawText
    for i, text in enumerate(temp_raw):
        temp = re.sub(r'https?:\/\/.*\/[a-zA-Z0-9]*', 'URL', text)
    return temp

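You return the last substituted string instead of a list, when the result should replace your rawText input list. (I must admit I'm puzzled that you seem to get the first item, but I'm still confident in the explanation.)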
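To make the problem concrete, here is a minimal sketch that reproduces the loop from the question outside of its class. The function name `clean_text_buggy` and the shortened token list are assumptions made for illustration only.

```python
import re

def clean_text_buggy(raw_text):
    # Same loop as in the question: `temp` is overwritten on every pass,
    # so only the result for the last element survives.
    for text in raw_text:
        temp = re.sub(r'https?:\/\/.*\/[a-zA-Z0-9]*', 'URL', text)
    return temp

tokens = ['RT', '@sportvibz:', 'African', 'medal', 'table']
result = clean_text_buggy(tokens)
print(type(result).__name__, repr(result))  # prints: str 'table'
```

The caller receives a single string rather than a list, and the results for all earlier tokens are discarded.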

Do this instead:

def clean_text(self, rawText):
    temp = list()
    for text in rawText:
        temp.append(re.sub(r'https?:\/\/.*\/\w*', 'URL', text))  # simpler regex with \w
    return temp

Or, as a list comprehension:

def clean_text(self, rawText):
    return [re.sub(r'https?:\/\/.*\/\w*', 'URL', text) for text in rawText]

You can also work in place, modifying rawText directly:

def clean_text(self, rawText):
    rawText[:] = [re.sub(r'https?:\/\/.*\/\w*', 'URL', text) for text in rawText]
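Two details are worth checking against the sample data from the question. First, the pattern above requires a `/` somewhere after `https://`, so a token such as `https://not_a_real_link` is not matched at all. Second, `re.sub` replaces only the matched part, so a token like `t_https://anotherLink` would keep its `t_` prefix even if the pattern did match. The sketch below is one possible variant, not the answer's method: it replaces the whole token whenever it contains a URL, and uses slice assignment so that the caller's list is updated in place. The names `URL_PATTERN` and `replace_urls_in_place` are hypothetical, and the looser pattern is an assumption about what should count as a URL.

```python
import re

# Assumption: any token containing http:// or https:// followed by
# non-whitespace characters counts as a URL.
URL_PATTERN = re.compile(r'https?://\S+')

def replace_urls_in_place(tokens):
    """Replace every token that contains a URL with 'URL', mutating `tokens`."""
    # Slice assignment rewrites the existing list object instead of
    # rebinding the local name, so the caller sees the change.
    tokens[:] = ['URL' if URL_PATTERN.search(tok) else tok for tok in tokens]

tweet = ['In', 'love', '#Paralympics?\U0001f60d', "We've", 'got', 'nine',
         'different', 'sports', 'live', 'streams',
         'https://not_a_real_link', 't_https://anotherLink']
alias = tweet  # second reference to the same list object
replace_urls_in_place(tweet)
print(alias[-2:])  # prints: ['URL', 'URL']
```

Because the function mutates its argument and returns `None`, it should be called for its side effect rather than assigned to a variable.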

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/39920183
