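I am trying to do feature extraction and build a model for a Twitter sentiment analysis project. However, I get the following error and was wondering if anyone could help.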
Error:
ValueError: np.nan is an invalid document, expected byte or unicode string.
My code:
import re
import pickle
import numpy as np
import pandas as pd
# nltk
from nltk.stem import WordNetLemmatizer
# sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_csv("updated_tweet_info.csv")
train,test = train_test_split(df, test_size = 0.2, random_state = 42)
train_clean_tweet=[]
for tweet in train['tweet']:
    train_clean_tweet.append(tweet)
test_clean_tweet=[]
for tweet in test['tweet']:
    test_clean_tweet.append(tweet)
v = CountVectorizer(analyzer = "word")
train_features= v.fit_transform(train_clean_tweet)
test_features=v.transform(test_clean_tweet)
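# NOTE: the target column is not shown here; "label" is a placeholder name for it.
lr = RandomForestRegressor(n_estimators=200)
# Fit on the vectorized features and the target, not on the raw DataFrame.
lr.fit(train_features, train["label"])
pred = lr.predict(test_features)
# Compare the true test targets with the predictions.
accuracy = r2_score(test["label"], pred)

Posted on 2020-09-01 01:48:09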
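The error means the tweet column contains missing values (NaN), which CountVectorizer cannot treat as text. You can try replacing NaN with a space, which should eliminate the error: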
df = df.fillna(' ')  # reassign to df so the rest of the code uses the cleaned frame
https://stackoverflow.com/questions/63675323
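To make the fix concrete, here is a minimal sketch of two ways to handle the missing tweets before vectorizing. It assumes the text lives in a column named tweet, as in the question. The toy DataFrame is made up for illustration and stands in for updated_tweet_info.csv. Filling only the tweet column, rather than the whole frame, is a choice made here so that numeric columns are not turned into strings.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for updated_tweet_info.csv (hypothetical data, not from the question).
df = pd.DataFrame({"tweet": ["good day", np.nan, "bad day", None]})

# Count the missing tweets first, to confirm they are the cause of the error.
missing_before = int(df["tweet"].isna().sum())

# Option 1: replace missing tweets with an empty string so every row is kept.
filled = df["tweet"].fillna("").astype(str)

# Option 2: drop the rows whose tweet is missing.
dropped = df.dropna(subset=["tweet"])

# With no NaN left, CountVectorizer accepts the documents.
vectorizer = CountVectorizer(analyzer="word")
features = vectorizer.fit_transform(filled)

print(missing_before, features.shape, len(dropped))
```

Which option fits better probably depends on the data: filling keeps the rows aligned with the target column, while dropping avoids training on empty documents. Either way, it seems safest to clean the frame before calling train_test_split so that both splits are handled the same way.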