Q: LSTM multi-feature regression data preparation

Stack Overflow user

Asked 2019-08-10 09:04:43

Answers: 1 · Views: 102 · Followers: 0 · Votes: 3

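I am building an LSTM model with multiple features and a single target value. This is a regression problem. I suspect that the way I prepare the data for the LSTM is wrong, mainly because the model only learns the mean of the target value.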

I wrote the following code to prepare the data for the LSTM:

# df is a pandas data frame that contains the feature columns (f1 to f5) and the target value named 'target'
# all columns of the df are time series data (including the 'target')
# seq_length is the sequence length 
def prepare_data_multiple_feature(df):
    X = []
    y = []

    for x in range(len(df)):
        start_id = x
        end_id = x + seq_length
        one_data_point = []
        if end_id + 1 <= len(df):
            # prepare X
            for col in ['f1', 'f2', 'f3', 'f4', 'f5']:
                one_data_point.append(np.array(df[col].values[start_id:end_id]))
            X.append(np.array(one_data_point))
            # prepare y
            y.append(np.array(df['target'].values[end_id ])) 

    assert len(y) == len(X)
    return X, y

I then reshape the data as follows:

X, y = prepare_data_multiple_feature(df)
X = X.reshape((len(X), seq_length, 5)) #5 is the number of features, i.e., f1 to f5

Are my data preparation and my reshaping correct?

time-series

lstm

1 Answer

Stack Overflow user

Answered 2019-08-18 17:05:33

As @isp-zax mentioned, please provide a reprex so that we can reproduce the results and see where the problem is.

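Also, you can use for col in df.columns instead of listing all the column names, and (a minor optimization) the outer loop should be for x in range(len(df) - seq_length); otherwise the loop runs seq_length extra iterations at the end without processing any data. In addition, df.values[a:b] does not include the element at index b, so if you want the window in X to include the last row, end_id can be equal to len(df), i.e. you can use if end_id <= len(df): as the inner condition (the prepare-and-append step).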
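The two points above (half-open slicing and the loop bound) can be checked on a toy array. This is only a sketch: `values` is a hypothetical stand-in for one column of the data frame, and `seq_length = 3` is an arbitrary choice.

```python
import numpy as np

values = np.arange(10)  # hypothetical stand-in for one data frame column
seq_length = 3          # arbitrary window length for illustration

# Half-open slicing: values[a:b] stops before index b.
window = values[2:5]

# Start indices accepted by the question's condition (end_id + 1 <= len).
valid_starts = [
    x for x in range(len(values))
    if x + seq_length + 1 <= len(values)
]

# The same start indices, produced directly by the tighter loop bound.
tight_starts = list(range(len(values) - seq_length))
```

Here `valid_starts` and `tight_starts` contain the same indices, so the tighter range simply skips the iterations that the `if` would have rejected.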

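Beyond that, I think the code would be easier to read if you sliced the data frame across rows and columns at the same time instead of using one_data_point. That is, to select seq_length rows without the (last) target column, simply do: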

df.values[start_id:end_id, :-1]
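A minimal sketch of the windowing loop built around that slice follows. It assumes the target is the last column of the data frame and that the target for each window is the value at index end_id, as in the question. The function name `make_windows` and the toy data frame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def make_windows(df, seq_length):
    """Return X of shape (n_windows, seq_length, n_features) and y of shape (n_windows,).

    Assumes the target is the last column of df.
    """
    values = df.to_numpy()
    X, y = [], []
    for start_id in range(len(df) - seq_length):
        end_id = start_id + seq_length
        X.append(values[start_id:end_id, :-1])  # rows of the window, features only
        y.append(values[end_id, -1])            # target at the step after the window
    return np.array(X), np.array(y)


# Toy data: f1..f5 are distinguishable by their tens digit.
n_rows = 8
df = pd.DataFrame({f"f{i}": np.arange(n_rows) + 10 * i for i in range(1, 6)})
df["target"] = np.arange(n_rows) * 100

X, y = make_windows(df, seq_length=3)

# For comparison: stacking one column at a time, as in the question,
# gives a (n_features, seq_length) block per window.
per_column = np.array([df[c].to_numpy()[0:3] for c in ["f1", "f2", "f3", "f4", "f5"]])
transposed = per_column.T           # swaps the axes, keeps each time step together
reshaped = per_column.reshape(3, 5)  # only re-chunks the flat data
```

With this slicing, each window already has the (seq_length, n_features) layout, so no reshape is needed afterwards. The comparison at the end suggests one thing to check in the question's code: a per-column block has shape (5, seq_length), and reshaping it to (seq_length, 5) does not give the same array as transposing it, so features from different time steps may end up mixed.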

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/57438564
