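I am trying to load a sklearn.dataset, and one column is missing according to the keys (target_names, target & DESCR). I have tried various ways to include the last column, but I keep getting errors.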
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
# The keys are 'target_names', 'data', 'target', 'DESCR', 'feature_names'
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
print(data.describe())
With the code above I only get 30 columns, but I need 31. What is the best way to load a scikit-learn dataset into a pandas DataFrame?
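The 30-versus-31 discrepancy comes from how the dataset object stores its contents: cancer.data holds only the 30 feature columns, while the labels live in the separate 1-D array cancer.target. A minimal check of the shapes (assuming Python 3 and a current scikit-learn) makes this visible:

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Features and labels are stored in separate arrays on the returned Bunch.
n_samples, n_features = cancer.data.shape
print(n_samples, n_features)         # 569 30
print(cancer.target.shape)           # (569,)
print(cancer.target_names.tolist())  # ['malignant', 'benign']
```

So any DataFrame built from cancer.data alone has 30 columns; the 31st has to be joined in from cancer.target.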
Posted on 2017-07-17 07:31:36
Another option, this time as a one-liner, for creating a DataFrame that contains both the features and the target variable:
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
df = pd.DataFrame(np.c_[cancer['data'], cancer['target']],
                  columns=np.append(cancer['feature_names'], ['target']))
Posted on 2017-06-03 05:46:12
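If you want a target column, you have to add it yourself, because it is not part of cancer.data. cancer.target holds the 0/1 values, and cancer.target_names holds the corresponding labels. I hope the following is what you are looking for: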
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
print(data.describe())
data = data.assign(target=pd.Series(cancer.target))
print(data.describe())
# In case you want labels instead of numbers.
data.replace(to_replace={'target': {0: cancer.target_names[0]}}, inplace=True)
data.replace(to_replace={'target': {1: cancer.target_names[1]}}, inplace=True)
print(data.shape)  # data.describe() won't show the "target" column here because its values are now strings.
Posted on 2017-06-03 05:59:16
This also works, again using pd.Series:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
data['Target'] = pd.Series(data=cancer.target, index=data.index)
print(data.keys())
print(data.shape)
https://stackoverflow.com/questions/44340445
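For readers on newer versions: scikit-learn 0.23 and later accept as_frame=True in the dataset loaders, and the returned Bunch then carries a frame attribute with the features and the target already combined in one DataFrame. A short sketch, assuming such a version and pandas are installed (the "label" column is an illustrative addition, not something the loader creates):

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True (scikit-learn >= 0.23) makes the loader return pandas objects.
cancer = load_breast_cancer(as_frame=True)
df = cancer.frame  # 30 feature columns plus a 'target' column

print(df.shape)  # (569, 31)

# Optional, hypothetical extra column: map the 0/1 codes to their names.
df["label"] = df["target"].map(dict(enumerate(cancer.target_names)))
```

Unlike the np.c_ one-liner, which stacks everything into a single float array and therefore turns target into 0.0/1.0, this keeps target as an integer column.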