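Given the example below from the scikit-learn examples, which uses a FeatureUnion inside a pipeline, how can I get the dimensions of the whole feature matrix after the pipeline has run?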
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# SubjectBodyExtractor, ItemSelector and TextStats are custom transformers
# defined in the scikit-learn example this snippet comes from.
pipeline = Pipeline([
    # Extract the subject & body
    ('subjectbody', SubjectBodyExtractor()),
    # Use FeatureUnion to combine the features from subject and body
    ('union', FeatureUnion(
        transformer_list=[
            # Pipeline for pulling features from the post's subject line
            ('subject', Pipeline([
                ('selector', ItemSelector(key='subject')),
                ('tfidf', TfidfVectorizer(min_df=50)),
            ])),
            # Pipeline for standard bag-of-words model for body
            ('body_bow', Pipeline([
                ('selector', ItemSelector(key='body')),
                ('tfidf', TfidfVectorizer()),
                ('best', TruncatedSVD(n_components=50)),
            ])),
            # Pipeline for pulling ad hoc features from post's body
            ('body_stats', Pipeline([
                ('selector', ItemSelector(key='body')),
                ('stats', TextStats()),  # returns a list of dicts
                ('vect', DictVectorizer()),  # list of dicts -> feature matrix
            ])),
        ],
        # weight components in FeatureUnion
        transformer_weights={
            'subject': 0.8,
            'body_bow': 0.5,
            'body_stats': 1.0,
        },
    )),
    # Use a SVC classifier on the combined features
    ('svc', SVC(kernel='linear')),
])

Posted on 2018-07-10 16:41:33
A FeatureUnion only changes the columns of the data, so the number of rows stays the same.
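As a minimal sketch of that behavior (the random data and the two transformers here are made up for illustration and are not part of the original pipeline), a FeatureUnion stacks the outputs of its transformers side by side:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 6 samples, 4 raw features.
rng = np.random.RandomState(0)
X = rng.rand(6, 4)

union = FeatureUnion([
    ("scaled", StandardScaler()),   # keeps all 4 columns
    ("pca", PCA(n_components=2)),   # contributes 2 more columns
])
X_union = union.fit_transform(X)

# Rows are unchanged; columns are the sum of each transformer's output width.
print(X.shape, X_union.shape)
```

Here the union should yield 4 + 2 = 6 columns while still having 6 rows.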
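Now, to get the number of columns after the pipeline has run, there are several options:
1) Your current pipeline has an SVC as its final estimator. It does not change the shape of the data; it only fits on it. So, once the pipeline has been fitted, you can use the SVC's attributes to get the number of features passed to it by the previous step.
According to the documentation, you can use:
support_vectors_ : array-like, shape = [n_SV, n_features]
The second dimension is the n_features that were fed to the SVC. You can access it with:
pipeline.named_steps['svc'].support_vectors_.shape
2) (Simpler) You can make a copy of the pipeline that leaves out the last step (svc) and then call fit_transform() on it.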
pipeline = Pipeline([
    # Extract the subject & body
    ('subjectbody', SubjectBodyExtractor()),
    # Use FeatureUnion to combine the features from subject and body
    ('union', FeatureUnion(
        transformer_list=[
            # Pipeline for pulling features from the post's subject line
            ('subject', Pipeline([
                ('selector', ItemSelector(key='subject')),
                ('tfidf', TfidfVectorizer(min_df=50)),
            ])),
            # Pipeline for standard bag-of-words model for body
            ('body_bow', Pipeline([
                ('selector', ItemSelector(key='body')),
                ('tfidf', TfidfVectorizer()),
                ('best', TruncatedSVD(n_components=50)),
            ])),
            # Pipeline for pulling ad hoc features from post's body
            ('body_stats', Pipeline([
                ('selector', ItemSelector(key='body')),
                ('stats', TextStats()),  # returns a list of dicts
                ('vect', DictVectorizer()),  # list of dicts -> feature matrix
            ])),
        ],
        # weight components in FeatureUnion
        transformer_weights={
            'subject': 0.8,
            'body_bow': 0.5,
            'body_stats': 1.0,
        },
    )),
])
Then:
X_transformed = pipeline.fit_transform(X)
print(X_transformed.shape)
https://stackoverflow.com/questions/51258466
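The two approaches from the answer can be checked against each other on a small stand-in pipeline. This is only a sketch: the documents, labels, and transformer names below are hypothetical, and slicing with `[:-1]` assumes a scikit-learn version that supports slicing a Pipeline. Slicing reuses the already fitted steps, so the feature union does not have to be written out or fitted a second time.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus and labels, only for illustration.
docs = ["good movie", "bad movie", "great film",
        "awful film", "good film", "bad plot"]
y = [1, 0, 1, 0, 1, 0]

toy_pipeline = Pipeline([
    ("union", FeatureUnion([
        ("word_tfidf", TfidfVectorizer()),
        ("char_counts", CountVectorizer(analyzer="char")),
    ])),
    ("svc", SVC(kernel="linear")),
])
toy_pipeline.fit(docs, y)

# Approach 1: read the feature count from the fitted SVC.
n_from_svc = toy_pipeline.named_steps["svc"].support_vectors_.shape[1]

# Approach 2: run every step except the classifier and inspect the output.
X_features = toy_pipeline[:-1].transform(docs)
n_from_union = X_features.shape[1]

print(X_features.shape, n_from_svc, n_from_union)
```

Both counts should agree, since the SVC is fitted on exactly the matrix that the preceding steps produce.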