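I currently have a dataframe like the following:
Idnumber  Ownership  Date
1         100        2006
2         >50        2006
1         80         2007
3         NaN        2006
The Ownership column is currently of type float. What I want is a groupby on Idnumber that returns the maximum Ownership value for each Idnumber. The problem is that this is not possible while characters such as < or ± are in there (error: unorderable types: >= (), str()).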
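I tried:
df['Ownership'] = df['Ownership'].astype(str)
df['Ownership'] = df['Ownership'].map(lambda x: x.strip('± = > + <'))
df['Ownership'] = df['Ownership'].astype(float).fillna(0.0)
df['Ownershipadjusted']= df['Ownership'].groupby([df['Idnumber'],df['Ownership']]).max()
This does not actually work, because converting the column back to float raises an error: could not convert string to float.
df['Ownership'] = df['Ownership'].apply(pd.to_numeric, errors='coerce')
also does not have the desired effect. Is there a more straightforward way to remove the symbols from the floats, or to make this conversion work?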
To avoid confusion, this is what I need:
Idnumber  Ownership  Date  Ownershipadjusted
1         100        2006  100
2         50         2006  50
1         80         2007  100
3         0          2006  0
Of course, the dataframe contains more than 4 observations.
Posted on 2016-02-17 14:59:50
Convert the dtype to str, pull out the digits with str.extract, then convert the dtype back to float:
In [215]:
df['Ownership'] = df['Ownership'].astype(str).str.extract(r'(\d+)', expand=False).astype(float)
df
Out[215]:
   Idnumber  Ownership  Date
0         1        100  2006
1         2         50  2006
2         1         80  2007
3         3        NaN  2006
Also, your groupby statement is wrong; you need this instead:
In [218]:
df['Ownershipadjusted'] = df.groupby(['Idnumber'])['Ownership'].transform('max')
df
Out[218]:
   Idnumber  Ownership  Date  Ownershipadjusted
0         1        100  2006                100
1         2         50  2006                 50
2         1         80  2007                100
3         3        NaN  2006                NaN
https://stackoverflow.com/questions/35460075
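The result above still shows NaN for Idnumber 3, whereas the question asks for 0 there. The following is a minimal end-to-end sketch of one way to get that output, not the answerer's exact code. It assumes the raw Ownership column mixes numbers with strings such as ">50", that missing values should count as 0 before the maximum is taken, and that a value like ">50" may be treated as exactly 50. The sample frame and the intermediate name `number` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question; Ownership mixes numbers and text.
df = pd.DataFrame({
    "Idnumber": [1, 2, 1, 3],
    "Ownership": [100, ">50", 80, None],
    "Date": [2006, 2006, 2007, 2006],
})

# Pull the first number (with an optional decimal part) out of each value.
# expand=False makes str.extract return a Series rather than a DataFrame.
number = df["Ownership"].astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Values without any digits did not match and are NaN; replace them with 0.0
# (assumption: missing ownership should count as zero, as in the question).
df["Ownership"] = pd.to_numeric(number, errors="coerce").fillna(0.0)

# transform('max') broadcasts each group's maximum back onto every row,
# so the result lines up with the original index.
df["Ownershipadjusted"] = df.groupby("Idnumber")["Ownership"].transform("max")

print(df)
```

Using `transform` rather than a plain `.max()` matters here: `.max()` returns one row per Idnumber, while `transform('max')` returns one value per original row, which is what a new column needs. The regex accepts a decimal part so that values such as 12.5 are not truncated to 12; drop `(?:\.\d+)?` if the data only ever holds whole numbers.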