我有一个包含多列的数据格式。当我执行以下代码时,它将第一列的标题分配给第二列,从而使第一列不可访问。
COLUMN_NAMES = ['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean', 'area_mean',
'smoothness_mean', 'compactness_mean', 'concavity_mean',
'concave_points_mean', 'symmetry_mean', 'fractal_dimension_mean',
'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
'compactness_se', 'concavity_se', 'concave_points_se', 'symmetry_se',
'fractal_dimension_se', 'radius_worst', 'texture_worst',
'perimeter_worst', 'area_worst', 'smoothness_worst',
'compactness_worst', 'concavity_worst', 'concave_points_worst',
'symmetry_worst']
TUMOR_TYPE = ['M', 'B']
path_to_file = list(files.upload().keys())[0]
data = pd.read_csv(path_to_file, names=COLUMN_NAMES, header=0)
print(data)
id diagnosis ... concave_points_worst symmetry_worst
842302 M 17.99 ... 0.4601 0.11890
842517 M 20.57 ... 0.2750 0.08902
84300903 M 19.69 ... 0.3613 0.08758id标记应该与第一列相关联,但它与第二列相关联,从而导致最后一列标题被删除。
发布于 2020-10-09 17:48:31
pd.read_csv将使您的第一列成为索引,而不是像列表中其他列那样的列。
您可以将其更新为:
path_to_file = list(files.upload().keys())[0]
data = pd.read_csv(path_to_file, names=COLUMN_NAMES, header=0,index_col = False)以确保第一列不被视为索引。
https://stackoverflow.com/questions/64284804
复制相似问题