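I have hourly data in a DataFrame df that looks like the sample below; its shape is (3418896, 9). I need to fit a Weibull distribution to the data, but I need the output (shape, loc, scale) grouped by 'plant_name', 'month' and 'year'.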
       plant_name  business_name  business_code  maint_region_name  wind_speed_ms             mos_time  dataset  month  year
0  MAPLE RIDGE II  UNITED STATES            USA               EAST          10.06  2021-09-22 13:00:00     ERA5      9  2021
1  MAPLE RIDGE II  UNITED STATES            USA               EAST          10.04  2021-09-22 12:00:00     ERA5      9  2021
2  MAPLE RIDGE II  UNITED STATES            USA               EAST           9.84  2021-09-22 11:00:00     ERA5      9  2021
3  MAPLE RIDGE II  UNITED STATES            USA               EAST          10.67  2021-09-22 10:00:00     ERA5      9  2021
4  MAPLE RIDGE II  UNITED STATES            USA               EAST          11.47  2021-09-22 09:00:00     ERA5      9  2021
I need one shape and one scale value for each plant_name, month and year in 'df'. I tried the code below, but it fits all the data at once, so I get a single shape and a single scale instead of one pair per plant_name, month and year. Here is my attempt:
from scipy.stats import weibull_min
shape, loc, scale = weibull_min.fit(ncData.groupby(['plant_name','month','year']).apply(lambda x:x['wind_speed_ms']), floc=0)
shape
Out[21]: 2.2556719467040596
scale
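Out[22]: 7.603953856897537
I don't know how to use a groupby on 'plant_name', 'month', 'year' so that the 'shape' and 'scale' parameters are returned per group. Thank you very much for taking the time to suggest something I can try.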
Posted on 2021-11-11 15:53:07
This should work:
import pandas as pd
from scipy.stats import weibull_min

# function applied to each ('plant_name', 'month', 'year') group
def fit_weibull(g):
    # get wind speed data from the group
    data = g['wind_speed_ms']
    # fit weibull_min to the group wind data
    params = weibull_min.fit(data)
    # return the fit parameters as a Series (each parameter becomes its own column)
    return pd.Series(params, index=['shape', 'loc', 'scale'])

fit_params = ncData.groupby(['plant_name', 'month', 'year']).apply(fit_weibull)

https://stackoverflow.com/questions/69930693
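The answer fits all three parameters, while the attempt in the question pins the location at zero with floc=0. Below is a minimal sketch of the same per-group idea with the location fixed, run on synthetic data so it can be tried without the original DataFrame. The helper name fit_weibull_fixed_loc, the demo frame, and the plant names PLANT_A and PLANT_B are made up for illustration and do not come from the question; the explicit loop over groups is just one way to collect the results, not the answerer's method.

```python
import numpy as np
import pandas as pd
from scipy.stats import weibull_min


def fit_weibull_fixed_loc(speeds):
    """Fit a Weibull with the location pinned at 0; return shape and scale."""
    shape, _loc, scale = weibull_min.fit(np.asarray(speeds), floc=0)
    return {"shape": shape, "scale": scale}


# Synthetic stand-in for the hourly data (assumption: 720 hours per group).
rng = np.random.default_rng(0)
frames = []
for plant, true_shape, true_scale in [("PLANT_A", 2.0, 8.0), ("PLANT_B", 2.5, 6.0)]:
    for month in (8, 9):
        speeds = weibull_min.rvs(true_shape, scale=true_scale, size=720, random_state=rng)
        frames.append(
            pd.DataFrame(
                {"plant_name": plant, "month": month, "year": 2021, "wind_speed_ms": speeds}
            )
        )
demo = pd.concat(frames, ignore_index=True)

# One fit per (plant_name, month, year) group, collected into one row each.
rows = [
    {"plant_name": plant, "month": month, "year": year, **fit_weibull_fixed_loc(group)}
    for (plant, month, year), group in demo.groupby(["plant_name", "month", "year"])["wind_speed_ms"]
]
fit_params = pd.DataFrame(rows)
print(fit_params)
```

With the real data, replacing demo with ncData should give one row per plant, month and year, with the fitted shape and scale as ordinary columns.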