
Forecasting with WindowSummarizer and exogenous features

Stack Overflow user
Asked 2022-03-21 17:42:20
1 answer, 132 views, 0 followers, 0 votes

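WindowSummarizer lets you capture time series features over specified rolling windows. I tried to modify an example found in the documentation. The feature does not seem to work with models that actually use exogenous features.

Here is a minimal working example based on the documentation: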

import pandas as pd  # needed for pd.concat below

from sktime.forecasting.base import ForecastingHorizon
from sktime.transformations.series.impute import Imputer
from sktime.datasets import load_airline, load_longley
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.window_summarizer import WindowSummarizer
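y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
# fh was undefined in the original snippet; forecast the test period
fh = ForecastingHorizon(y_test.index, is_relative=False)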

kwargs = {
    "lag_config": {
        "mean": ["mean", [[3, 0], [4, 0]]],
    }
}
Z_train = pd.concat([X_train, y_train], axis=1)
Z_test = pd.concat([X_test, y_test], axis=1)
pipe = ForecastingPipeline(
    steps=[
        ("ws", WindowSummarizer(**kwargs, n_jobs=1, target_cols=["GNP"])),
        ("imputer",Imputer('mean')),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train, Z_train)
y_pred = pipe_return.predict(fh=fh, X=Z_test) # this works

If we swap the forecaster for one that actually uses the engineered features, things no longer go so smoothly:

pipe = ForecastingPipeline(
    steps=[
        ("ws", WindowSummarizer(**kwargs, n_jobs=1, target_cols=["GNP"])),
        ("imputer",Imputer('mean')),
        ("forecaster", AutoARIMA()),
    ]
)
pipe.fit(y_train, X=Z_train)
pipe.predict(fh=fh,X = Z_test) # this throws an error

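I suspect this is related to the lack of continuity between Z_train and Z_test. The second issue is the Imputer. I think it does not work the way it should: after fitting, it should store the values it uses to fill empty fields.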

ws = pipe.steps_[0][1]
imp = pipe.steps_[1][1]
imp._transform(ws._transform(Z_test)) 

gives:

       GNP_mean_3_0  GNP_mean_4_0  GNPDEFL   UNEMP   ARMED       POP   TOTEMP
1959  501159.333333           NaN    112.6  3813.0  2552.0  123366.0  68655.0
1960  501159.333333           NaN    114.2  3931.0  2514.0  125368.0  69564.0
1961  501159.333333           NaN    115.7  4806.0  2572.0  127852.0  69331.0
1962  501159.333333           NaN    116.9  4007.0  2827.0  130081.0  70551.0
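
The table above is consistent with the fill values being computed from the data passed to the transform step: GNP_mean_3_0 appears to have one observed value that is copied into every row, and GNP_mean_4_0 has none, so it stays NaN. A minimal sketch of the behaviour the question asks for, where fill values are learned during fit and reused during transform, could look like this. `FittedMeanImputer` is a hypothetical helper written only for illustration (it is not part of sktime), and the numbers are made up.

```python
import numpy as np
import pandas as pd


class FittedMeanImputer:
    """Hypothetical helper: learn column means in fit, reuse them in transform."""

    def fit(self, X):
        # Store one fill value per column, computed on the training data only.
        self.fill_values_ = X.mean()
        return self

    def transform(self, X):
        # Fill gaps with the stored training means, not with means of X itself.
        return X.fillna(self.fill_values_)


# Made-up numbers, for illustration only.
train = pd.DataFrame({"feat_a": [1.0, 2.0, 3.0], "feat_b": [10.0, np.nan, 30.0]})
test = pd.DataFrame({"feat_a": [np.nan, 5.0], "feat_b": [np.nan, np.nan]})

filled = FittedMeanImputer().fit(train).transform(test)
print(filled)
```

With this design a test column that is entirely NaN still gets filled, because the fill value comes from the training data rather than from the rows being transformed.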

1 Answer

Stack Overflow user

Answered 2022-03-27 13:37:05

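Version .10 and later of the library have changed the behavior of WindowSummarizer. It should work without problems there.

I think I have a workaround. It is not the most elegant solution, but it gets the job done. I modified WindowSummarizer so that it stores either the minimal window of X needed to compute all the aggregates, or all rows of X seen so far (the default option).

Whenever .transform is applied, the summarizer tries to update the window and recompute (correctly!) the summaries. For simplicity, I focus here only on the summarizer and a simpler dataset.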

import pandas as pd
from sktime.transformations.series.window_summarizer import WindowSummarizer

def update_X(self,X):
    if self.target_cols is None:
        cols = X.columns
    else:
        cols = self.target_cols
    X_window = self.X_window
    X_window = pd.concat([X_window,X[cols]],axis=0)
    X_window = X_window.groupby(X_window.index).first()
    # would remember only last #min_window rows
    # self.X_window = X_window.iloc[-self.min_window:]
    # would remember all rows
    self.X_window = X_window

def window_size(windows):
    # windows is either a single pair of numbers or a list of such pairs;
    # the number of rows needed for a pair is taken to be the sum of its entries.
    if all(isinstance(i, list) for i in windows):
        return max(map(sum, windows))
    return sum(windows)
        
class WS(WindowSummarizer):
    def __init__(
        self,
        lag_config,
        n_jobs=-1,
        target_cols=None,
        truncate=None,
        ):

        self.lag_config = lag_config
        self.n_jobs = n_jobs
        self.target_cols = target_cols
        self.truncate = truncate
        self._converter_store_X = dict()
        
        # calculates the minimal window required to calculate the window summaries in lag_config
        self.min_window = max([window_size(x[1]) for key,x in lag_config.items()])
        # empty data frame for data window
        self.X_window = pd.DataFrame()
        
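        # Call the parent initializer on this instance; super(WindowSummarizer)
        # without self builds an unbound proxy and never initializes the object.
        super().__init__(
            lag_config=lag_config,
            n_jobs=n_jobs,
            target_cols=target_cols,
            truncate=truncate,
        )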
        
    def _fit(self, X, y=None):
        update_X(self,X)
        super()._fit(X, y)
        return self
        
    def _transform(self, X, y=None):
        X_window = pd.concat([self.X_window,X],axis=0)
        X_window = X_window.groupby(X_window.index).first()
        X_transformed = super()._transform(X_window, y)
        update_X(self,X)
        return X_transformed.loc[X.index]
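
The merging step used in update_X and _transform relies on pd.concat followed by groupby(...).first(), which leaves one row per index label. A small sketch with made-up numbers shows what happens when newly passed rows overlap the stored window; `stored` and `incoming` are illustrative names only.

```python
import pandas as pd

# Rows already remembered from earlier calls (made-up values).
stored = pd.DataFrame({"y": [1.0, 2.0]}, index=[0, 1])
# Newly passed rows; label 1 overlaps with the stored window.
incoming = pd.DataFrame({"y": [20.0, 3.0]}, index=[1, 2])

# Same pattern as in update_X: stack, then keep one row per index label.
merged = pd.concat([stored, incoming], axis=0)
merged = merged.groupby(merged.index).first()
print(merged)
```

For a duplicated label the value that comes first in the concatenation (here the stored one) is kept, so rows already in the window are not overwritten by later calls.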

Here is a small test:

y = load_airline()
y_train, y_test = temporal_train_test_split(y.iloc[:10])
fh = ForecastingHorizon(y_test.index, is_relative=False)

kwargs = {
    "lag_config": {
        "mean": ["mean", [[3, 1], [4, 1]]],
    }
}

ws = WS(**kwargs, n_jobs=1)
ws.fit(pd.DataFrame(y_train),y_train)
ws.transform(pd.DataFrame(y_test))

         Number of airline passengers_mean_3_1  Number of airline passengers_mean_4_1
1949-08                             128.333333                                 129.25
1949-09                             134.666667                                 133.25
1949-10                             143.666667                                 138.00
Votes: 0
Original page content provided by Stack Overflow.
Source: https://stackoverflow.com/questions/71561875
