Question: Getting feature names from FeatureUnion + Pipeline

Stack Overflow user

Asked 2017-02-27 14:44:20

2 answers, 11.2K views, 0 followers, 19 votes

I am using a FeatureUnion to join features found from the title and description of events:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

# TextSelector is the asker's own custom transformer (its definition is not shown)
union = FeatureUnion(
    transformer_list=[
        # Pipeline for pulling features from the event's title
        ('title', Pipeline([
            ('selector', TextSelector(key='title')),
            ('count', CountVectorizer(stop_words='english')),
        ])),

        # Pipeline for standard bag-of-words model for description
        ('description', Pipeline([
            ('selector', TextSelector(key='description_snippet')),
            ('count', TfidfVectorizer(stop_words='english')),
        ])),
    ],

    transformer_weights={
        'title': 1.0,
        'description': 0.2,
    },
)

However, calling union.get_feature_names() gives an error: "Transformer title (type Pipeline) does not provide get_feature_names." I'd like to see some of the features that my different Vectorizers generate. How do I do this?

Tags: python-3.x, scikit-learn, nlp, feature-extraction

2 Answers

Stack Overflow user

Answered 2017-08-10 07:58:11

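That's because you are using a custom transformer called TextSelector. Did you implement get_feature_names in TextSelector?

You will have to implement this method in your custom transformer if you want this to work.

Here is a concrete example: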

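# NOTE: load_boston and get_feature_names() were removed in scikit-learn 1.2;
# this example targets the older releases that were current when it was written.
from sklearn.datasets import load_boston
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

dat = load_boston()
X = pd.DataFrame(dat['data'], columns=dat['feature_names'])
y = dat['target']

# define first custom transformer
class first_transform(TransformerMixin, BaseEstimator):
    def fit(self, df, y=None):
        # remember the column names seen during fitting
        self.columns_ = df.columns.tolist()
        return self

    def transform(self, df):
        return df

    def get_feature_names(self):
        return self.columns_


# define second custom transformer (identical, to show the concatenation)
class second_transform(TransformerMixin, BaseEstimator):
    def fit(self, df, y=None):
        self.columns_ = df.columns.tolist()
        return self

    def transform(self, df):
        return df

    def get_feature_names(self):
        return self.columns_


pipe = Pipeline([
       ('features', FeatureUnion([
                    ('custom_transform_first', first_transform()),
                    ('custom_transform_second', second_transform())
                ])
        )])

# fit first so that each transformer has recorded its column names
pipe.fit(X, y)

>>> pipe.named_steps['features'].get_feature_names()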
['custom_transform_first__CRIM',
 'custom_transform_first__ZN',
 'custom_transform_first__INDUS',
 'custom_transform_first__CHAS',
 'custom_transform_first__NOX',
 'custom_transform_first__RM',
 'custom_transform_first__AGE',
 'custom_transform_first__DIS',
 'custom_transform_first__RAD',
 'custom_transform_first__TAX',
 'custom_transform_first__PTRATIO',
 'custom_transform_first__B',
 'custom_transform_first__LSTAT',
 'custom_transform_second__CRIM',
 'custom_transform_second__ZN',
 'custom_transform_second__INDUS',
 'custom_transform_second__CHAS',
 'custom_transform_second__NOX',
 'custom_transform_second__RM',
 'custom_transform_second__AGE',
 'custom_transform_second__DIS',
 'custom_transform_second__RAD',
 'custom_transform_second__TAX',
 'custom_transform_second__PTRATIO',
 'custom_transform_second__B',
 'custom_transform_second__LSTAT']

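Keep in mind that FeatureUnion concatenates the lists returned by each transformer's get_feature_names, prefixing every name with the transformer's name. That is why you get an error when one or more of your transformers does not have this method.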
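To make the concatenation concrete, here is a minimal sketch with two vectorizers placed directly in a FeatureUnion (no nested pipelines). The two sample documents and the step names "count" and "tfidf" are made up for illustration. Because get_feature_names() was later replaced by get_feature_names_out(), the sketch uses whichever one the installed scikit-learn provides.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Toy documents, invented for this illustration
docs = ["jazz concert tonight", "python meetup downtown"]

union = FeatureUnion([
    ("count", CountVectorizer()),
    ("tfidf", TfidfVectorizer()),
])
union.fit(docs)

# Older releases expose get_feature_names(), newer ones get_feature_names_out()
if hasattr(union, "get_feature_names_out"):
    names = list(union.get_feature_names_out())
else:
    names = union.get_feature_names()

# Each name is "<transformer name>__<feature>", first for "count", then for "tfidf"
print(names)
```

Both vectorizers here implement the name accessor themselves, which is why this call succeeds while the nested-pipeline version in the question does not.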

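However, I can see that this alone will not solve your problem, because the Pipeline object has no get_feature_names method and you have nested pipelines (pipelines inside the FeatureUnion). So you have two options: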

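1. Subclass Pipeline and add the get_feature_names method yourself, so that it gets the feature names from the last transformer in the chain.

2. Extract the feature names from each transformer yourself, which requires you to pull those transformers out of the pipeline and call get_feature_names on them.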
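A rough sketch of option 1, under some assumptions: TextSelector below is a hypothetical stand-in for the asker's class (the original is not shown), NamedPipeline and vocabulary_names are names invented here, and the sample events are made up. The subclass defines both get_feature_names and get_feature_names_out so that it can work with older and newer scikit-learn releases.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class TextSelector(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the asker's selector: one text field per record."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [record[self.key] for record in X]


def vocabulary_names(step):
    """Feature names of a fitted step, using whichever accessor exists."""
    if hasattr(step, "get_feature_names_out"):
        return list(step.get_feature_names_out())
    return list(step.get_feature_names())


class NamedPipeline(Pipeline):
    """Pipeline that reports the feature names of its last step."""

    def get_feature_names(self):
        return vocabulary_names(self.steps[-1][1])

    def get_feature_names_out(self, input_features=None):
        return np.asarray(self.get_feature_names(), dtype=object)


# Made-up events with the two keys used in the question
events = [
    {"title": "Jazz concert tonight",
     "description_snippet": "Live band performing classics"},
    {"title": "Python meetup downtown",
     "description_snippet": "Talks on machine learning pipelines"},
]

union = FeatureUnion(
    transformer_list=[
        ("title", NamedPipeline([
            ("selector", TextSelector(key="title")),
            ("count", CountVectorizer(stop_words="english")),
        ])),
        ("description", NamedPipeline([
            ("selector", TextSelector(key="description_snippet")),
            ("count", TfidfVectorizer(stop_words="english")),
        ])),
    ],
    transformer_weights={"title": 1.0, "description": 0.2},
)
union.fit(events)

if hasattr(union, "get_feature_names_out"):
    names = list(union.get_feature_names_out())
else:
    names = union.get_feature_names()
print(names)
```

The subclass only looks at the last step, so it suits pipelines that end in a vectorizer. A pipeline whose later steps drop or reorder columns would need something more careful.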

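Also, keep in mind that many of sklearn's built-in transformers do not operate on DataFrames but pass numpy arrays around, so be careful if you are going to chain many transformers together. Still, I think this gives you enough information to understand what is happening.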
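A small illustration of that caveat, assuming pandas is installed and scikit-learn is in its default configuration. The column names "a" and "b" are made up. StandardScaler accepts a DataFrame but hands back a plain numpy array, so a get_feature_names that relies on df.columns would fail on anything placed after it.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame, invented for this illustration
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

scaled = StandardScaler().fit_transform(df)

# The column labels are gone: the result is an ndarray, not a DataFrame
print(type(scaled).__name__)
print(hasattr(scaled, "columns"))
```

Recording the column names during fit, rather than reading them from the transformed data, is one way to stay independent of this. Newer scikit-learn releases also offer a set_output option for pandas output, which is not covered here.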

One more thing: take a look at sklearn-pandas. I haven't used it myself, but it might provide a solution for you.

Votes: 12

Stack Overflow user

Answered 2019-03-13 01:05:24

You can reach your different Vectorizers as nested steps like this (thanks to edesz):

pipevect= dict(pipeline.named_steps['union'].transformer_list).get('title').named_steps['count']

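That gives you the vectorizer instance (the CountVectorizer for 'title' in the question's union; use 'description' to get the TfidfVectorizer), which you can pass into another function: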

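# Show_most_informative_features and MostIF are the answerer's own helper
# and setting (not shown here)
Show_most_informative_features(pipevect,
       pipeline.named_steps['classifier'], n=MostIF)
# or print the feature names directly
print(pipevect.get_feature_names())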
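The same idea can be applied to every branch of the union at once. The sketch below assumes that each branch is a Pipeline whose vectorizer step is named "count", as in the question. TextSelector is a hypothetical stand-in for the asker's class, union_feature_names and vectorizer_names are helper names invented here, and the events are made up.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class TextSelector(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the asker's selector: one text field per record."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [record[self.key] for record in X]


def vectorizer_names(vectorizer):
    """Feature names of a fitted vectorizer, using whichever accessor exists."""
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


def union_feature_names(union, step_name="count"):
    """Collect '<branch>__<feature>' names from each pipeline in a fitted union."""
    names = []
    for branch_name, branch in union.transformer_list:
        vectorizer = branch.named_steps[step_name]
        names.extend(
            f"{branch_name}__{feature}" for feature in vectorizer_names(vectorizer)
        )
    return names


# Made-up events with the two keys used in the question
events = [
    {"title": "Jazz concert tonight",
     "description_snippet": "Live band performing classics"},
    {"title": "Python meetup downtown",
     "description_snippet": "Talks on machine learning pipelines"},
]

union = FeatureUnion([
    ("title", Pipeline([
        ("selector", TextSelector(key="title")),
        ("count", CountVectorizer(stop_words="english")),
    ])),
    ("description", Pipeline([
        ("selector", TextSelector(key="description_snippet")),
        ("count", TfidfVectorizer(stop_words="english")),
    ])),
])
union.fit(events)

names = union_feature_names(union)
print(names)
```

The names come out in the same order as the columns of union.transform(events), since FeatureUnion stacks the branches in transformer_list order.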

Votes: 4

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/42479370
