I am trying to merge the following two DataFrames with on='SICcode':
df.head(5)
SICcode Catcode Category SICname MultSIC
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat X
1 112 A1600 Other commodities (incl rice, peanuts) Rice X
2 115 A1500 Wheat, corn, soybeans and cash grain Corn X
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains X
df.columns.tolist()
['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
merged.head()
2012 NAICS Code 2002to2007 NAICS SICcode
0 111110 111110 116
1 111120 111120 119
2 111130 111130 119
3 111140 111140 111
4 111150 111150 115
merged.columns.tolist()
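['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']
When I try to merge them with the following code:
merged = pd.merge(merged, df, how='left', on='SICcode')
I get KeyError: 'SICcode'. I also tried setting the dtype of one of the DataFrames, but that raised a key error as well.
If anyone has any ideas about this, or needs more information, please let me know.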
Posted on 2016-04-22 19:05:46
Pay attention to the first column:
In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)
In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
In [29]: df['SICcode']
...
KeyError: 'SICcode'
In [30]: df['\ufeffSICcode'].head()
Out[30]:
0 111
1 112
2 115
3 116
4 119
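Name: SICcode, dtype: int64
The first column name starts with an invisible byte order mark (\ufeff), so it is not equal to 'SICcode'. As @unutbu said in his comment, adding encoding='utf-8_sig' to the pd.read_csv() call may help you solve this: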
In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')
In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
https://stackoverflow.com/questions/36801328
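For anyone who wants to try this without downloading the file, here is a minimal sketch of the same idea. The in-memory bytes, the `loaded` frame and the small `naics` frame are made-up stand-ins for the data in the question, not the real files. It shows two things: reading with the `utf-8-sig` codec (the hyphenated spelling of the same encoding), and, as an alternative when the frame is already loaded, stripping the mark from the column names before merging.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV on disk: UTF-8 bytes that begin with a BOM.
raw = "SICcode,Catcode,SICname\n111,A1500,Wheat\n115,A1500,Corn\n".encode("utf-8-sig")

# Plain UTF-8 decoding keeps the BOM as a character; utf-8-sig drops it.
assert raw.decode("utf-8").startswith("\ufeff")
assert raw.decode("utf-8-sig").startswith("SICcode")

# Reading with encoding="utf-8-sig" yields a clean first column name.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")

# Alternative for a frame that is already loaded with the BOM in its header:
# remove the character from the column names instead of re-reading the file.
loaded = pd.DataFrame({"\ufeffSICcode": [111, 115], "Catcode": ["A1500", "A1500"]})
loaded.columns = loaded.columns.str.replace("\ufeff", "", regex=False)

# Hypothetical slice of the NAICS table from the question.
naics = pd.DataFrame({"2012 NAICS Code": [111140, 111150], "SICcode": [111, 115]})

# With a clean key on both sides, the left merge finds 'SICcode'.
result = pd.merge(naics, df, how="left", on="SICcode")
print(result.columns.tolist())
```

Re-reading with the right encoding is usually the cleaner option, since it fixes the problem at the source; renaming the columns is handy when the file cannot easily be read again.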