I am trying to import several CSV files into a single DataFrame, and I get the following error when I try to add the third DataFrame.
AssertionError: cannot create BlockManager._ref_locs because block [ObjectBlock: [CompletionDate, Categories, DateEntered_x, <lots more columns here>...], dtype=object)] does not have _ref_locs set
The code is:
import pandas

project = pandas.read_csv('dbo_Project.csv')
project_energy = pandas.read_csv('dbo_ProjectEnergy.csv')
building_description = pandas.read_csv('dbo_BuildingDescription.csv')
# project_energy_data is used below, but the line that reads it is not shown in the post
part_merged = pandas.merge(project, project_energy,
                           on='ProjectID',
                           how='outer')
part_merged = pandas.merge(part_merged, project_energy_data,
                           on='ProjectEnergyID',
                           how='outer')
part_merged = pandas.merge(part_merged, building_description,
                           on='ProjectEnergyID',
                           how='outer')
How should I join these DataFrames to avoid this problem?
Edit, in response to Stefan Jansen's answer:
The new code, up to the point where the new error occurs, is as follows:
project = pandas.read_csv('dbo_Project.csv')
project_energy = pandas.read_csv('dbo_ProjectEnergy.csv')
part_merged = pandas.concat([project, project_energy],
                            axis=1,
                            join='outer')
# Note: set_index returns a new DataFrame; the result is not assigned here
part_merged.set_index(['ProjectEnergyID'])
part_merged = pandas.concat([part_merged,
                             project_energy_data],
                            axis=1,
                            join='outer')
Answered 2013-08-16 16:06:38
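A very simple answer.
The problem was duplicate columns. The columns causing the problem were not important, so I simply dropped them before merging.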
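To see which columns clash before deciding what to drop, the column sets of the two frames can be compared. The following is only a sketch: the two small frames and the helper name `clashing_columns` are hypothetical stand-ins, since the real CSV contents are not shown. For context, the `DateEntered_x` in the error message is the kind of name `pandas.merge` produces when both frames carry a non-key column with the same name (the default suffixes are `_x` and `_y`).

```python
import pandas as pd

# Hypothetical stand-ins for two of the tables; the real CSVs are not shown.
project = pd.DataFrame({
    "ProjectID": [1, 2],
    "DateEntered": ["2013-01-01", "2013-01-02"],
    "Name": ["a", "b"],
})
project_energy = pd.DataFrame({
    "ProjectID": [1, 2],
    "ProjectEnergyID": [10, 20],
    "DateEntered": ["2013-02-01", "2013-02-02"],
})


def clashing_columns(left, right, key):
    """Return the column names present in both frames, excluding the join key."""
    return sorted((set(left.columns) & set(right.columns)) - {key})


clashes = clashing_columns(project, project_energy, "ProjectID")
print(clashes)

# Drop the clashing columns from one side, then merge on the key.
merged = pd.merge(project, project_energy.drop(columns=clashes),
                  on="ProjectID", how="outer")
print(list(merged.columns))
```

Dropping from only one side keeps a single copy of each clashing column; if neither copy is needed, drop from both frames instead.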
def remove_clashes(df):
    # Drop the unimportant columns that clash between tables, if present
    unwanted_cols = ['DataCompleteness', 'DeletedFlag', 'DateEntered', 'EnteredBy',
                     'LastModified', 'MandatoryDataInput', 'ModifiedBy']
    return df.drop([col for col in unwanted_cols if col in df.columns], axis=1)
Answered 2013-08-11 16:10:51
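I prefer to use pandas.concat() for multiple frames. There is also an 'outer' option - see the docs.
This works well if the columns you want to join on are the index (you can make them the index with DataFrame.set_index(), possibly preceded by .reset_index()).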
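A minimal sketch of this approach, assuming two small hypothetical frames that share a `ProjectEnergyID` key (the column names `Energy` and `Floors` are made up for illustration). Note that `set_index()` returns a new DataFrame, so its result has to be assigned or used directly.

```python
import pandas as pd

# Hypothetical frames sharing a key column; not the asker's real data.
left = pd.DataFrame({"ProjectEnergyID": [10, 20], "Energy": [1.5, 2.5]})
right = pd.DataFrame({"ProjectEnergyID": [20, 30], "Floors": [3, 4]})

# Move the key into the index of each frame so concat can align rows on it.
frames = [df.set_index("ProjectEnergyID") for df in (left, right)]

# axis=1 places the frames side by side; join='outer' keeps every key
# that appears in either frame and fills the gaps with NaN.
combined = pd.concat(frames, axis=1, join="outer")
print(combined)
```

Without the shared index, `concat(axis=1)` aligns rows on the default integer index, so rows are paired by position rather than by key.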
https://stackoverflow.com/questions/18173753