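I'm having trouble using the groupby function with the pandas module. I get the error: DataError: No numeric types to aggregate
I'm not sure what I'm doing wrong; the DataFrame does contain numeric data.
Here is my code: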
lte_columns = ['Period start','Period end','zone','usid','site id','rank','Total LCQI Impact','LTE BLOCK Impact','LTE DROP Impact','LTE TPUT Impact','engineer notes']
#lte_df = pd.DataFrame(dtype=float)
lte_df = pd.DataFrame(dtype=float)
## iterate over the CQI impact file, separate LTE from UMTS and perform lookup for each Technology/USID
testFile = "sample_CSCT_CQI_IMPACT_Greater Midwest_20160305_20160311.xls"
df = pd.read_excel(testFile,sheetname="Sheet1")
weekBegin = df['Date'].min()
weekEnd = df['Date'].max()
## update new dataFrames while iterating over input dataframe
for idx, row in df.iterrows():
    usid = row['USID']
    region, zone = row['District & Zone'].split('-')
    if usid in lte_lookup:
        site_id = lte_lookup[usid][1]
    else:
        site_id = "N/A"
    lte = pd.Series([weekBegin,weekEnd,zone,usid,site_id,'0','0','0','0','0','0'])
    lte_df = lte_df.append(lte,ignore_index=True)
lte_df.columns = lte_columns
grps = lte_df.groupby(['usid'])
avgs = grps.mean()
avgs.to_excel("pandas_out.xlsx",merge_cells=False)
print "done"

Here is a sample of lte_df:
>>> print lte_df
Period start Period end zone usid site id rank Total LCQI Impact LTE BLOCK Impact LTE DROP Impact LTE TPUT Impact engineer notes
0 03/05/2016 03/11/2016 69E 56788.0 MOL02607 0 0 0 0 0 0
1 03/05/2016 03/11/2016 70F 58438.0 KSL05065 0 0 0 0 0 0
2 03/05/2016 03/11/2016 69A 120595.0 MOL00531W 0 0 0 0 0 0
3 03/05/2016 03/11/2016 70D 75566.0 KSL04272 0 0 0 0 0 0
4 03/05/2016 03/11/2016 70F 58454.0 KSL05106 0 0 0 0 0 0
5 03/05/2016 03/11/2016 70E 41793.0 KSL04151 0 0 0 0 0 0
6 03/05/2016 03/11/2016 70C 9500.0 KSL06382 0 0 0 0 0 0
7 03/05/2016 03/11/2016 69A 56586.0 MOL01143 0 0 0 0 0 0
>>> lte_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6565 entries, 0 to 6564
Data columns (total 11 columns):
Period start 6565 non-null object
Period end 6565 non-null object
zone 6565 non-null object
usid 6565 non-null float64
site id 6565 non-null object
rank 6565 non-null object
Total LCQI Impact 6565 non-null object
LTE BLOCK Impact 6565 non-null object
LTE DROP Impact 6565 non-null object
LTE TPUT Impact 6565 non-null object
engineer notes 6565 non-null object
dtypes: float64(1), object(10)
memory usage: 615.5+ KB
>>>

Posted on 2016-09-11 00:23:12
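With the data currently in the DataFrame, the groupby does not work because your code tries to compute the mean of the columns but cannot, since they are not floats. Even the zeros in the other columns are strings, so pandas stores those columns as object. The only float column is usid, and that is the column you are grouping by, which leaves nothing numeric to aggregate.
So this will not work:

grps = lte_df.groupby(['usid'])
avgs = grps.mean()

But, for example,

grps = lte_df[['Period start', 'usid']].groupby(['Period start'])
avgs = grps.mean()

will work, because it groups by one column and the only remaining column is a float, so it returns something. I know this is not what you are trying to do, but it is an example of how it works.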
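If the goal is the per-usid averages from the question, one possible approach is to convert the string columns to numbers before grouping. This is only a sketch under assumptions: the small DataFrame and its values are made up for illustration, and numeric_cols is a hypothetical list of the columns you actually want averaged. pd.to_numeric with errors='coerce' turns anything that cannot be parsed into NaN, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical miniature of lte_df: the values are strings, so pandas
# stores these columns with dtype object, as in the info() output above.
lte_df = pd.DataFrame({
    'usid': [56788.0, 56788.0, 58438.0],
    'zone': ['69E', '69E', '70F'],
    'rank': ['1', '3', '2'],
    'Total LCQI Impact': ['0.5', '1.5', '2.0'],
})

# Columns that are meant to be numeric (an assumption made for this sketch).
numeric_cols = ['rank', 'Total LCQI Impact']

# Convert strings to numbers; values that cannot be parsed become NaN.
lte_df[numeric_cols] = lte_df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Aggregate only the numeric columns so text columns such as zone are left out.
avgs = lte_df.groupby('usid')[numeric_cols].mean()
print(avgs)
```

Selecting the numeric columns explicitly before calling mean() keeps the text columns out of the aggregation, so the result does not depend on how a given pandas version treats non-numeric columns.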
https://stackoverflow.com/questions/36189459