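This is a bit strange. I have many Excel files, each with more than 500,000 data points. For some files I can remove stop words and lemmatize without problems, but most files fail with the following error:
txt = "".join([c for c in txt if c not in string.punctuation])
TypeError: 'float' object is not iterable
Thanks for any help.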
import nltk
wn = nltk.WordNetLemmatizer()
ps = nltk.PorterStemmer()
dir(wn)
import pandas as pd
import re
import string
from nltk.corpus import stopwords
pd.set_option("display.max_colwidth", 200)
stop_word = set(stopwords.words("english"))
excel_path = (r"C:xxxx-BTC 21-30-4.xlsx")
data = pd.read_excel(excel_path)
data.columns = ["#", "id", "date", "Name", "text_4"]

# clean text
def clean_text(txt):
    txt = "".join([c for c in txt if c not in string.punctuation])
    tokens = re.split(r'\W+', txt)
    txt = [word for word in tokens if word not in stop_word]
    return txt

data['text_5'] = data['text_4'].apply(lambda x: clean_text(x))
# print(data.head())

# lemmatization
def lemmatization(tolken_txt):
    text = [wn.lemmatize(word) for word in tolken_txt]
    return text

data["text_6"] = data['text_5'].apply(lambda x: lemmatization(x))
print(data.head())

Posted on 2022-06-28 13:40:07
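Hi, if you are sure the data['text_4'] column contains no floats, it probably contains missing values. pandas reads an empty cell as NaN, which is itself a float and is not iterable. Try replacing or filling them with "". Good luck.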
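
The suggestion above could be sketched roughly as follows. This is only an illustration, not the asker's exact pipeline: `STOP_WORDS` is a hypothetical, tiny stand-in for the NLTK stop word list so the snippet runs without downloading a corpus, and `samples` is made-up data standing in for the `text_4` column. Unlike the code in the question, this version also drops the empty tokens that `re.split` can produce.

```python
import re
import string

# Hypothetical stand-in for NLTK's English stop word list, kept tiny so the
# sketch runs without downloading any corpus.
STOP_WORDS = {"the", "is", "a", "to", "and"}


def clean_text(txt):
    """Strip punctuation, tokenize, and drop stop words; tolerate missing values."""
    # An empty Excel cell is read by pandas as NaN, which is a float and
    # cannot be iterated; treat any non-string value as empty text.
    if not isinstance(txt, str):
        return []
    txt = "".join(c for c in txt if c not in string.punctuation)
    tokens = re.split(r"\W+", txt)
    # Drop stop words and any empty strings produced by the split.
    return [word for word in tokens if word and word not in STOP_WORDS]


# Made-up values standing in for the text_4 column, including a missing cell.
samples = ["BTC is going to the moon!", float("nan"), "Buy, hold, and wait."]
cleaned = [clean_text(value) for value in samples]
print(cleaned)
```

With pandas, an alternative is to handle the missing cells before applying the original function, for example `data['text_4'] = data['text_4'].fillna("")` to fill them with empty strings, or `data = data.dropna(subset=['text_4'])` to drop those rows.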
https://stackoverflow.com/questions/72785326