I have a dataset of cars. It contains each car's manufacturer (make), model name, year of manufacture, and current market price.
|Make|Model|mfgYear|price|
|---|---|---|---|
|Audi|A4|2007|3429999|
|Audi|A5|2008|2900000|
|Audi|A5|2009|3000000|
|Audi|A4|2011|4000000|
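...

For each make/model combination (that is, each make-model group), I want to know the average annual drop in price. For example, if the group is make -> Ford, model -> Focus, I want to know how fast the market price falls as the car gets older.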
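As a concrete illustration of the rate being asked about, here is a minimal sketch that uses only the two A4 rows from the table above. It takes the newer car's price minus the older car's price and divides by the gap in years. The lowercase column names (`make`, `model`) follow the asker's code below; the variable names are illustrative only.

```python
import pandas as pd

# The four sample rows from the question's table
df = pd.DataFrame({
    'make': ['Audi', 'Audi', 'Audi', 'Audi'],
    'model': ['A4', 'A5', 'A5', 'A4'],
    'mfgYear': [2007, 2008, 2009, 2011],
    'price': [3429999, 2900000, 3000000, 4000000],
})

a4 = df[(df['make'] == 'Audi') & (df['model'] == 'A4')]
newest = a4.loc[a4['mfgYear'].idxmax()]  # the 2011 car
oldest = a4.loc[a4['mfgYear'].idxmin()]  # the 2007 car

# Price difference per year of age between the two cars
per_year = (newest['price'] - oldest['price']) / (newest['mfgYear'] - oldest['mfgYear'])
print(per_year)
```

With more than two cars per group, one such difference exists for every older car, and the question becomes how to average them.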
The code below groups the data and prints the first few groups (the first two are shown):
import pandas as pd

gb = df.groupby(['make', 'model'])
# print only the first three (name, group) pairs
for (name, group), i in zip(gb, range(3)):
    print(name)
    print(group)
('Audi', 'A3')
|make |model |mfgYear | price
19 |Audi | A3 | 2014 |3300000
('Audi', 'A4')
|make| model | mfgYear | price
20 |Audi| A4 | 2014 |3100000
406 |Audi| A4 | 2012 |1799000

Any help would be appreciated. I think this problem might fall under cluster analysis, but I'm not sure.
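Because the groups are already fixed by make and model, this looks more like a per-group aggregation than clustering. One alternative measure, not the one defined in the goal below, is the least-squares slope of price against mfgYear within each group, which `numpy.polyfit` can fit. A minimal sketch, assuming the lowercase column names from the code above and the six sample rows used in the answer further down (`price_slope` is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

# Six sample rows (the same values as in the answer below, lowercase column names)
df = pd.DataFrame({
    'make': ['Audi'] * 6,
    'model': ['A4', 'A5', 'A5', 'A4', 'A5', 'A4'],
    'mfgYear': [2007, 2008, 2009, 2011, 2007, 2010],
    'price': [3429999, 2900000, 3000000, 4000000, 2500000, 3200000],
})

def price_slope(group):
    """Hypothetical helper: least-squares slope of price vs. mfgYear."""
    if group['mfgYear'].nunique() < 2:
        return np.nan  # a line needs at least two distinct years
    slope, _intercept = np.polyfit(group['mfgYear'], group['price'], 1)
    return slope

# One slope per (make, model); tuple keys become a MultiIndex
slopes = {name: price_slope(group) for name, group in df.groupby(['make', 'model'])}
result = pd.Series(slopes)
print(result)
```

A positive slope means newer cars cost more, so the price falls by roughly that amount per extra year of age. Unlike the goal's formula, the fit uses every car, not just comparisons against the newest year.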
What I want to end up with:

|Make|Model|averageAnnualDepreciation|
|---|---|---|
|Audi|A4|<average of (priceCorrespondingToMostRecentYear - price)/(mostRecentYear - year)>|
|Audi|A5|<average of (priceCorrespondingToMostRecentYear - price)/(mostRecentYear - year)>|
...

Posted on 2015-11-01 17:32:13
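Written as a formula (the symbols are introduced here for illustration): for one make/model group, let $Y$ be the most recent mfgYear, $P_Y$ the price in that year, and $(y_i, p_i)$, $i = 1, \dots, n$, the years and prices of the $n$ older cars. The goal column is then

```latex
D = \frac{1}{n} \sum_{i=1}^{n} \frac{P_Y - p_i}{Y - y_i}
```

For the A4 rows of the DataFrame that follows, $Y = 2011$ and $P_Y = 4000000$, so $D = \tfrac{1}{2}\left(\tfrac{4000000 - 3200000}{1} + \tfrac{4000000 - 3429999}{4}\right) = 471250.125$.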
For this DataFrame:

   Make Model  mfgYear    price
0  Audi    A4     2007  3429999
1  Audi    A5     2008  2900000
2  Audi    A5     2009  3000000
3  Audi    A4     2011  4000000
4  Audi    A5     2007  2500000
5  Audi    A4     2010  3200000

I group by make and model:

gb = df.groupby(['Make', 'Model'])

Now I can apply a function:
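def avg(group):
    year = group['mfgYear']
    price = group['price']
    # the most recent year in this group and its price
    last_year = year.max()
    last_price = price[year == last_year]
    # all older cars in the group
    other_prices = price[year != last_year]
    other_years = year[year != last_year]
    # mean of (newest price - older price) / (difference in years)
    down = ((last_price.values - other_prices) /
            (last_year - other_years)).sum() / len(other_years)
    return down

gb.apply(avg)

This gives the following result: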
Make  Model
Audi  A4       471250.125
      A5       175000.000
dtype: float64

This matches the A4 figure calculated by hand:
((4000000 - 3200000) + (4000000 - 3429999) / 4) / 2
471250.125

Posted on 2015-11-01 17:11:28
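Each of these two groups, named `group` in the for loop, is a DataFrame. For each group, that is, for each DataFrame, what I do is...

I build a separate DataFrame in which one column is "make", another is "model", and a third is "average annual depreciation". This really comes down to how to apply a function sequentially to the rows of a DataFrame.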
import pandas as pd

rows = []
gb = df.groupby(['make', 'model'])
for name, group in gb:
    print(name)
    # mean price of every year, newest year first
    gp1 = group.groupby('mfgYear')['price'].mean()
    gp1 = gp1.sort_index(ascending=False)
    depreciations = gp1.apply(<func for cal. depreciations>)  # function still to be written
    rows.append({'make': name[0], 'model': name[1],
                 'annualDepreciation': depreciations.mean()})
# DataFrame.append returns a new frame instead of modifying df_result
# (and was removed in pandas 2.0), so build the result once from the rows
df_result = pd.DataFrame(rows)

https://stackoverflow.com/questions/33448661
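A minimal sketch that completes the loop above under one assumption: the per-group depreciation is taken to be the goal's formula, applied to the mean price of each year (so several cars from the same year are averaged first). The helper name `annual_depreciation` is hypothetical, and the six sample rows are the ones from the answer, with lowercase column names as in the asker's code.

```python
import pandas as pd

# The six sample rows from the answer (lowercase column names, as in the asker's code)
df = pd.DataFrame({
    'make': ['Audi'] * 6,
    'model': ['A4', 'A5', 'A5', 'A4', 'A5', 'A4'],
    'mfgYear': [2007, 2008, 2009, 2011, 2007, 2010],
    'price': [3429999, 2900000, 3000000, 4000000, 2500000, 3200000],
})

def annual_depreciation(yearly_price):
    """Hypothetical helper: mean of (newest price - price) / (newest year - year).

    `yearly_price` holds one mean price per mfgYear, indexed by year
    and sorted with the newest year first.
    """
    newest_year = yearly_price.index[0]
    newest_price = yearly_price.iloc[0]
    older = yearly_price.iloc[1:]
    if older.empty:
        return float('nan')  # only one year in this group, nothing to compare
    years_older = newest_year - older.index.to_numpy()
    per_year = (newest_price - older) / years_older
    return per_year.mean()

rows = []
for (make, model), group in df.groupby(['make', 'model']):
    # mean price for each manufacturing year, newest year first
    yearly = group.groupby('mfgYear')['price'].mean().sort_index(ascending=False)
    rows.append({'make': make, 'model': model,
                 'averageAnnualDepreciation': annual_depreciation(yearly)})

df_result = pd.DataFrame(rows)
print(df_result)
```

On this data the sketch reproduces the answer's figures (471250.125 for A4, 175000.0 for A5). Averaging prices within each year first means it does not depend on there being a single car in the newest year; the answer's `avg` subtracts against `last_price.values`, which only lines up as intended when exactly one car has the newest year.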