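WindowSummarizer makes it possible to capture time-series features over specified rolling windows. I tried to adapt an example I found in the documentation, but the feature does not seem to work with models that actually use exogenous features.
Below is a minimal working example based on the documentation: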
```python
import pandas as pd

from sktime.forecasting.base import ForecastingHorizon
from sktime.transformations.series.impute import Imputer
from sktime.datasets import load_airline, load_longley
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.window_summarizer import WindowSummarizer

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
fh = ForecastingHorizon(y_test.index, is_relative=False)

kwargs = {
    "lag_config": {
        "mean": ["mean", [[3, 0], [4, 0]]],
    }
}

# exogenous data plus the target column
Z_train = pd.concat([X_train, y_train], axis=1)
Z_test = pd.concat([X_test, y_test], axis=1)

pipe = ForecastingPipeline(
    steps=[
        ("ws", WindowSummarizer(**kwargs, n_jobs=1, target_cols=["GNP"])),
        ("imputer", Imputer("mean")),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train, Z_train)
y_pred = pipe_return.predict(fh=fh, X=Z_test)  # this works
```
If we switch the forecaster to one that actually uses the engineered features, things no longer go as smoothly:
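```python
pipe = ForecastingPipeline(
    steps=[
        ("ws", WindowSummarizer(**kwargs, n_jobs=1, target_cols=["GNP"])),
        ("imputer", Imputer("mean")),
        ("forecaster", AutoARIMA()),
    ]
)
pipe.fit(y_train, X=Z_train)
pipe.predict(fh=fh, X=Z_test)  # this throws an error
```
I suspect this has to do with the lack of continuity between Z_train and Z_test. The second issue is the Imputer. I don't think it works the way it should: after fitting, it should store the values it uses to fill empty fields.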
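Both suspicions can be illustrated with plain pandas, independent of sktime. This is a hypothetical sketch: `lagged_rolling_mean` and `FitMeanImputer` are made-up helpers, not sktime APIs, and the shift-then-roll convention (mean of the `window` values ending `lag` steps back) is an assumption that may not match WindowSummarizer's own lag convention. Computed on the test slice alone, the first rows have no history and come out NaN; prepending the tail of the training data fills them. The imputer part shows the behaviour the question expects: fill values learned during fit and reused at transform time.

```python
import pandas as pd


def lagged_rolling_mean(s, window, lag=1):
    """Hypothetical helper: mean of the `window` values ending `lag` steps before each row."""
    return s.shift(lag).rolling(window).mean()


class FitMeanImputer:
    """Hypothetical imputer: learns column means in fit, reuses them in transform."""

    def fit(self, df):
        self.means_ = df.mean()
        return self

    def transform(self, df):
        return df.fillna(self.means_)


series = pd.Series([float(v) for v in range(1, 11)])
train, test = series.iloc[:6], series.iloc[6:]
window, lag = 3, 1

# Test slice alone: the first rows have no history, so their features are NaN.
alone = lagged_rolling_mean(test, window, lag)

# Prepend enough training rows to cover window and lag, then keep only the test rows.
history = pd.concat([train.iloc[-(window + lag):], test])
with_history = lagged_rolling_mean(history, window, lag).loc[test.index]

# Fill values are learned on the training features, not on the test slice.
imputer = FitMeanImputer().fit(lagged_rolling_mean(train, window, lag).to_frame("mean_3"))
filled = imputer.transform(alone.to_frame("mean_3"))

print(alone.tolist())             # first three entries are NaN, last is 8.0
print(with_history.tolist())      # [5.0, 6.0, 7.0, 8.0]
print(filled["mean_3"].tolist())  # [3.0, 3.0, 3.0, 8.0]
```

The imputed values (3.0) come from the training features, so they stay the same no matter how short the test slice is.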
```python
ws = pipe.steps_[0][1]
imp = pipe.steps_[1][1]
imp._transform(ws._transform(Z_test))
```
gives:
```
       GNP_mean_3_0  GNP_mean_4_0  GNPDEFL   UNEMP   ARMED       POP   TOTEMP
1959  501159.333333           NaN    112.6  3813.0  2552.0  123366.0  68655.0
1960  501159.333333           NaN    114.2  3931.0  2514.0  125368.0  69564.0
1961  501159.333333           NaN    115.7  4806.0  2572.0  127852.0  69331.0
1962  501159.333333           NaN    116.9  4007.0  2827.0  130081.0  70551.0
```
Posted on 2022-03-27 13:37:05
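New
The behaviour of WindowSummarizer has been modified in library version .10 and newer, so it should work without problems.
Old
I think I have a workaround. It is not the most elegant solution, but it gets the job done. I modified WindowSummarizer so that it stores the minimal window of X required to compute all aggregations, or all seen records of X (the default option).
Whenever .transform is applied, the summarizer tries to update the window and recompute these summaries (correctly!). For simplicity, I focus here only on the summarizer and a simpler dataset.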
```python
import pandas as pd

from sktime.transformations.series.window_summarizer import WindowSummarizer


def update_X(self, X):
    """Append the newly seen rows of X to the stored window."""
    if self.target_cols is None:
        cols = X.columns
    else:
        cols = self.target_cols
    X_window = pd.concat([self.X_window, X[cols]], axis=0)
    # on duplicate timestamps keep the first (already stored) value
    X_window = X_window.groupby(X_window.index).first()
    # would remember only the last min_window rows:
    # self.X_window = X_window.iloc[-self.min_window:]
    # remembers all rows:
    self.X_window = X_window


def window_size(windows):
    """Rows of history needed for one lag_config entry, e.g. [[3, 1], [4, 1]] -> 5."""
    is_list_of_lists = all(isinstance(i, list) for i in windows)
    if is_list_of_lists:
        return max(map(sum, windows))
    return sum(windows)


class WS(WindowSummarizer):
    def __init__(
        self,
        lag_config,
        n_jobs=-1,
        target_cols=None,
        truncate=None,
    ):
        self.lag_config = lag_config
        self.n_jobs = n_jobs
        self.target_cols = target_cols
        self.truncate = truncate
        self._converter_store_X = dict()
        # minimal window required to calculate the window summaries in lag_config
        self.min_window = max(window_size(x[1]) for x in lag_config.values())
        # empty data frame for the data window
        self.X_window = pd.DataFrame()
        # skip WindowSummarizer.__init__ and initialise the base transformer
        super(WindowSummarizer, self).__init__()

    def _fit(self, X, y=None):
        update_X(self, X)
        super()._fit(X, y)
        return self

    def _transform(self, X, y=None):
        # prepend the stored history so the rolling windows are complete
        X_window = pd.concat([self.X_window, X], axis=0)
        X_window = X_window.groupby(X_window.index).first()
        X_transformed = super()._transform(X_window, y)
        update_X(self, X)
        # return only the rows that were passed in
        return X_transformed.loc[X.index]
```
Here is a small test:
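```python
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y.iloc[:10])
fh = ForecastingHorizon(y_test.index, is_relative=False)
kwargs = {
    "lag_config": {
        "mean": ["mean", [[3, 1], [4, 1]]],
    }
}
ws = WS(**kwargs, n_jobs=1)
ws.fit(pd.DataFrame(y_train), y_train)
ws.transform(pd.DataFrame(y_test))
```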
```
         Number of airline passengers_mean_3_1  Number of airline passengers_mean_4_1
1949-08                             128.333333                                 129.25
1949-09                             134.666667                                 133.25
1949-10                             143.666667                                 138.00
```
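The same idea can also be sketched without subclassing sktime, as a plain-pandas class that keeps the last `min_window` observations and prepends them on every transform. This is a hypothetical illustration, not the answer's code or an sktime API: `HistoryRollingMean`, `rolling_features` and the `(window, lag)` spec format are assumptions, and the lag convention is not guaranteed to match WindowSummarizer's. Feeding the test period in two pieces should reproduce exactly the features computed on the full series, which is the continuity property the workaround is after.

```python
import pandas as pd


def rolling_features(s, specs):
    """Lagged rolling means; each (window, lag) pair gives one column."""
    return pd.DataFrame(
        {f"mean_{w}_{l}": s.shift(l).rolling(w).mean() for w, l in specs}
    )


class HistoryRollingMean:
    """Hypothetical stateful summarizer in plain pandas (not an sktime API).

    It remembers the last `min_window` observations so that transform() on a
    later slice sees the history that the lagged windows need.
    """

    def __init__(self, specs):
        self.specs = specs
        # same bound as window_size() in the answer: window + lag rows suffice
        self.min_window = max(w + l for w, l in specs)
        self.history_ = None

    def _merge(self, s):
        combined = pd.concat([self.history_, s])
        # on duplicate timestamps keep the stored value
        return combined[~combined.index.duplicated(keep="first")].sort_index()

    def fit(self, s):
        self.history_ = s.sort_index().iloc[-self.min_window:]
        return self

    def transform(self, s):
        combined = self._merge(s)
        out = rolling_features(combined, self.specs).loc[s.index]
        # update the stored window, similar to update_X in the answer
        self.history_ = combined.iloc[-self.min_window:]
        return out


values = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0, 136.0, 119.0])
specs = [(3, 1), (4, 1)]
train, test = values.iloc[:7], values.iloc[7:]

summ = HistoryRollingMean(specs).fit(train)
first = summ.transform(test.iloc[:1])  # one step ahead
rest = summ.transform(test.iloc[1:])   # continues from the updated history
streamed = pd.concat([first, rest])

reference = rolling_features(values, specs).loc[test.index]
print(streamed)
```

Keeping only `min_window` rows (rather than all rows, the answer's default) bounds memory; the trade-off is that a `specs` with longer windows needs a fresh instance and refit.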
https://stackoverflow.com/questions/71561875