文章/答案/技术大牛

发布

社区首页 >问答首页 >如何将深度特征合成应用于单个表

问如何将深度特征合成应用于单个表
EN

Stack Overflow用户

提问于 2018-05-03 10:12:13

回答 1查看 2.3K关注 0票数 10

经过处理后，我的数据是一个表，其中有几列是要素，一列是标签。我想使用featuretools.dfs来帮助我预测标签。是否可以直接执行此操作，或者我是否需要将单个表拆分为多个表？

featuretools

回答 1

Stack Overflow用户

发布于 2018-05-08 22:50:30

可以在单个表上运行DFS。举个例子，如果你有一个带有索引'index'的pandas dataframe df，你可以这样写：

import featuretools as ft
es = ft.EntitySet('Transactions')

es.entity_from_dataframe(dataframe=df,
                         entity_id='log',
                         index='index')

fm, features = ft.dfs(entityset=es, 
                      target_entity='log',
                      trans_primitives=['day', 'weekday', 'month'])

生成的特征矩阵将如下所示

In [1]: fm
Out[1]: 
             location  pies sold  WEEKDAY(date)  MONTH(date)  DAY(date)
index                                                                  
1         main street          3              4           12         29
2         main street          4              5           12         30
3         main street          5              6           12         31
4      arlington ave.         18              0            1          1
5      arlington ave.          1              1            1          2

这将对您的数据应用“转换”原语。您通常希望添加更多的实体来提供ft.dfs，以便使用聚合原语。您可以在我们的documentation中了解到不同之处。

一个标准的工作流程是通过一个有趣的分类来normalize你的单个实体。如果您的df是单个表

| index | location       | pies sold |   date |
|-------+----------------+-------+------------|
|     1 | main street    |     3 | 2017-12-29 |
|     2 | main street    |     4 | 2017-12-30 |
|     3 | main street    |     5 | 2017-12-31 |
|     4 | arlington ave. |    18 | 2018-01-01 |
|     5 | arlington ave. |     1 | 2018-01-02 |

您可能会对使用location进行规范化感兴趣

es.normalize_entity(base_entity_id='log',
                    new_entity_id='locations',
                    index='location')

您的新实体locations将包含该表

| location       | first_log_time |
|----------------+----------------|
| main street    |     2018-12-29 |
| arlington ave. |     2000-01-01 |

这使得像locations.SUM(log.pies sold)或locations.MEAN(log.pies sold)这样的功能可以根据位置对所有值进行相加或平均。您可以在以下示例中看到创建的这些要素

In [1]: import pandas as pd
   ...: import featuretools as ft
   ...: df = pd.DataFrame({'index': [1, 2, 3, 4, 5],
   ...:                    'location': ['main street',
   ...:                                 'main street',
   ...:                                 'main street',
   ...:                                 'arlington ave.',
   ...:                                 'arlington ave.'],
   ...:                    'pies sold': [3, 4, 5, 18, 1]})
   ...: df['date'] = pd.date_range('12/29/2017', periods=5, freq='D')
   ...: df
   ...: 

Out[1]: 
   index        location  pies sold       date
0      1     main street          3 2017-12-29
1      2     main street          4 2017-12-30
2      3     main street          5 2017-12-31
3      4  arlington ave.         18 2018-01-01
4      5  arlington ave.          1 2018-01-02

In [2]: es = ft.EntitySet('Transactions')
   ...: es.entity_from_dataframe(dataframe=df, entity_id='log', index='index', t
   ...: ime_index='date')
   ...: es.normalize_entity(base_entity_id='log', new_entity_id='locations', ind
   ...: ex='location')
   ...: 
Out[2]: 
Entityset: Transactions
  Entities:
    log [Rows: 5, Columns: 4]
    locations [Rows: 2, Columns: 2]
  Relationships:
    log.location -> locations.location

In [3]: fm, features = ft.dfs(entityset=es,
   ...:                       target_entity='log',
   ...:                       agg_primitives=['sum', 'mean'],
   ...:                       trans_primitives=['day'])
   ...: fm
   ...: 
Out[3]: 
             location  pies sold  DAY(date)  locations.DAY(first_log_time)  locations.MEAN(log.pies sold)  locations.SUM(log.pies sold)
index                                                                                                                                  
1         main street          3         29                             29                            4.0                            12
2         main street          4         30                             29                            4.0                            12
3         main street          5         31                             29                            4.0                            12
4      arlington ave.         18          1                              1                            9.5                            19
5      arlington ave.          1          2                              1                            9.5                            19

票数 14

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/50145953

复制

相似问题

问如何将深度特征合成应用于单个表
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何将深度特征合成应用于单个表EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何将深度特征合成应用于单个表
EN