I'm new to machine learning and I'm stuck on this error:
could not convert string to float: ' 8,400,000,000'
What should I do?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn import linear_model
df = pd.read_csv("housePrice.csv")
print(df.isna().sum())
print(df.head())
print(df.describe())
x = df[["Area","Room","Parking","Warehouse"]]
np.reshape(x , (3479, 4))
y = df.Price
print(x.shape)
print(y.shape)
print(df.info())
filler = df.fillna(method="ffill")
filler = df.fillna(method="bfill")
train_x, test_x, train_y, test_y = train_test_split(x , y ,random_state=0, test_size=0.3)
dt = DecisionTreeRegressor()
dt.fit(train_x , train_y)
pred_y = dt.predict(test_x)
print("MAE:", mean_absolute_error(test_y, pred_y))
Posted on 2022-07-25 18:08:55
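According to the error message, could not convert string to float: ' 8,400,000,000', one of the columns in your DataFrame contains the string value ' 8,400,000,000'. The error is raised because all data passed to DecisionTreeRegressor must be numeric, so every str-typed column has to be converted to numbers first.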
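To find out which column and which values are causing this, here is a minimal sketch. It assumes a small hypothetical DataFrame standing in for housePrice.csv (the values below are invented for illustration, not taken from the real file). DataFrame.select_dtypes lists the non-numeric columns, and pd.to_numeric(..., errors="coerce") turns unparseable values into NaN so they can be picked out:

```python
import pandas as pd

# Hypothetical stand-in for housePrice.csv: "Area" mixes plain numbers with a
# comma-formatted string, so pandas stores the whole column as object dtype.
df = pd.DataFrame({
    "Area": ["63", "60", " 8,400,000,000", "79"],
    "Room": [1, 1, 2, 2],
})

# Columns that are not numeric will make DecisionTreeRegressor.fit() fail.
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)  # ['Area']

# For each such column, show the values that pd.to_numeric cannot parse.
for col in non_numeric_cols:
    parsed = pd.to_numeric(df[col], errors="coerce")
    bad = df.loc[parsed.isna() & df[col].notna(), col]
    print(col, bad.tolist())  # Area [' 8,400,000,000']
```

On the real data, the same loop over df[["Area", "Room", "Parking", "Warehouse"]] and df.Price should show where the comma-formatted strings live.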
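For the specific value in this error message, you need to remove the commas and the leading space from ' 8,400,000,000' so that it becomes a plain number string that can be converted to float. (The quotes around the value in the error message are just how Python displays the string; an apostrophe only needs to be stripped if one is actually stored in the data.)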
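A quick way to see this on a single value, using only built-in Python: float() rejects the comma-formatted string, and succeeds once the commas and the surrounding whitespace are removed. The variable names are illustrative only:

```python
raw = " 8,400,000,000"  # the value from the error message, without the display quotes

try:
    float(raw)
except ValueError as exc:
    # Python wraps the offending string in quotes in the message; they are not part of the data.
    print(exc)  # could not convert string to float: ' 8,400,000,000'

# Remove the thousands separators and the surrounding whitespace, then convert.
cleaned = raw.replace(",", "").strip()
value = float(cleaned)
print(value)  # 8400000000.0
```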
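One way to do this for the whole DataFrame column is with the .str.replace string method. Note that a plain df[COLUMN1].replace(",", '') only replaces cells whose entire value is ",", so it would leave the commas inside ' 8,400,000,000' untouched:
# COLUMN1 is a placeholder for the name of the offending column
df[COLUMN1] = df[COLUMN1].str.replace("'", '', regex=False) # Remove apostrophes, if any
df[COLUMN1] = df[COLUMN1].str.replace(",", '', regex=False) # Remove comma thousands separators
df[COLUMN1] = df[COLUMN1].str.replace(" ", '', regex=False) # Remove whitespace
df[COLUMN1] = df[COLUMN1].astype(float) # Finally, convert the column to float type
Here is another SO entry about replacing these characters in a pandas column: replacing quotes, commas, apostrophes w/ regex - python/pandas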
https://stackoverflow.com/questions/73089181
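In the spirit of the regex approach in that entry, here is a minimal sketch on a hypothetical Series (values invented for illustration). It contrasts Series.replace, which matches whole cell values, with .str.replace, which works on substrings, and removes apostrophes, commas and whitespace in one regex pass before converting to float:

```python
import pandas as pd

# Hypothetical values, not taken from housePrice.csv.
s = pd.Series([" 8,400,000,000", "1,200", "75"])

# Series.replace matches whole cell values, so a lone "," pattern changes nothing here.
unchanged = s.replace(",", "")
print(unchanged.tolist())  # [' 8,400,000,000', '1,200', '75']

# .str.replace works on substrings; one regex strips apostrophes, commas and whitespace.
cleaned = s.str.replace(r"[',\s]", "", regex=True)
numbers = cleaned.astype(float)
print(numbers.tolist())  # [8400000000.0, 1200.0, 75.0]
```

The single character class keeps the cleaning to one pass; the three separate .str.replace calls above do the same job and may be easier to read.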