Classification with sktime
from sklearn.model_selection import train_test_split
X = AUDCHF_h1_model[['Open','High','Low','Close','Volume','VWMA',
'Minute','Hour','Day','Week','Month','Year']].values
y = AUDCHF_h1_model[['is_beg_leg']].values
X_train,X_test,y_train,y_test = train_test_split(
X, y, test_size=0.2)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
# (53250, 12) (53250, 1) (13313, 12) (13313, 1)
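The printed shapes match how `train_test_split` sizes a float `test_size`. It rounds the test count up, gives the remaining rows to training, and splits X and y row for row, so X_train and y_train line up at 53250 rows. The check below assumes the dataset has 53250 + 13313 = 66563 rows. That total is inferred from the output; the question does not state it.

```python
from math import ceil

# Total rows inferred from the printed shapes (train + test).
n_samples = 53250 + 13313

# For a float test_size, scikit-learn computes the test count with ceil()
# and assigns the remaining rows to the training set.
test_size = 0.2
n_test = ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_samples, n_train, n_test)  # 66563 53250 13313
```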
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.interval_based import TimeSeriesForestClassifier
#from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
I get:
ValueError: Mismatch in number of cases. Number in X = 639000, y = 53250
But X_train.shape is (53250, 12) and y_train.shape is (53250, 1).
Does anyone know why?
Posted on 2022-09-13 19:34:59
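I can't say for certain from the information you've provided, but I suspect the problem is the ColumnConcatenator in your pipeline. It stacks all the columns of X to create a single new univariate time series (53250 * 12 = 639000 rows). That concatenated series is then passed to TimeSeriesForestClassifier with a different shape from the original input, so the classifier sees 639000 cases in X against 53250 labels in y. Depending on your use case, you can either remove the "concatenate" step or provide target values for the newly created univariate time series.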
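The row arithmetic in the answer can be reproduced with plain NumPy. This is a sketch under stated assumptions. The arrays are random stand-ins with the question's shapes. `X_train.T.reshape(-1, 1)` only mimics the column stacking and is not ColumnConcatenator's implementation. The 3D reshape at the end assumes each row is meant to be one instance, with its 12 values treated as a short univariate series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins with the question's shapes; the values are random.
n_instances, n_columns = 53250, 12
X_train = rng.random((n_instances, n_columns))
y_train = rng.integers(0, 2, size=(n_instances, 1))

# Stacking all 12 columns end to end, as the answer describes, gives one
# long univariate series of 53250 * 12 values (illustration only; this is
# not ColumnConcatenator's actual code).
stacked = X_train.T.reshape(-1, 1)
print(stacked.shape)  # (639000, 1)

# Read as one case per value, X now has 639000 cases but y still has 53250 labels.
print(stacked.shape[0], y_train.shape[0])  # 639000 53250

# Assumed alternative without the "concatenate" step: keep one case per row by
# treating each row as a length-12 univariate series in sktime's 3D numpy layout
# (n_instances, n_channels, n_timepoints), and flatten y to a 1D label vector.
X_train_3d = X_train.reshape(n_instances, 1, n_columns)
y_train_1d = y_train.ravel()
print(X_train_3d.shape, y_train_1d.shape)  # (53250, 1, 12) (53250,)
```

Under those assumptions, `X_train_3d` and `y_train_1d` could be passed to `TimeSeriesForestClassifier(n_estimators=100).fit(...)` directly, with no ColumnConcatenator in front. `X_test` would need the same reshape before calling `score`.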
https://stackoverflow.com/questions/73475273