Q: Interactions between dummy variables in Python

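Stack Overflow user

Asked 2017-03-23 07:19:27

3 answers, 4.1K views, 0 followers, 2 votes

I'm trying to understand how to address columns after using get_dummies. For example, suppose I have three categorical variables. The first has two levels, the second has five levels, and the third has two levels.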

import pandas as pd

df = pd.DataFrame({"a": ["Yes", "Yes", "No", "No", "No", "Yes", "Yes"],
                   "b": ["a", "b", "c", "d", "e", "a", "c"],
                   "c": ["1", "2", "2", "1", "2", "1", "1"]})

I created dummy variables for all three so that I can use them in an sklearn regression in Python.

df1 = pd.get_dummies(df,drop_first=True)

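Now I want to create two sets of interactions (products): b with c, and b with a.

How can I create the product of each pair of dummy variables without writing out their specific names, as in: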

df1['a_yes_b'] = df1['a_Yes']*df1['b_b']
df1['a_yes_c'] = df1['a_Yes']*df1['b_c']
df1['a_yes_d'] = df1['a_Yes']*df1['b_d']
df1['a_yes_e'] = df1['a_Yes']*df1['b_e']

df1['c_2_b'] = df1['c_2']*df1['b_b']
df1['c_2_c'] = df1['c_2']*df1['b_c']
df1['c_2_d'] = df1['c_2']*df1['b_d']
df1['c_2_e'] = df1['c_2']*df1['b_e']

Thanks.

python

pandas

data-science

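3 Answers

Stack Overflow user

Accepted answer

Posted 2017-03-23 07:45:15

You can create the new columns with loops. To select the columns for each variable, filter the column names with boolean indexing and str.startswith.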

a = df1.columns[df1.columns.str.startswith('a')]
b = df1.columns[df1.columns.str.startswith('b')]
c = df1.columns[df1.columns.str.startswith('c')]

for col1 in b:
    for col2 in a:
        df1[col2 + '_' + col1.split('_')[1]] = df1[col1].mul(df1[col2])

for col1 in b:
    for col2 in c:
        df1[col2 + '_' + col1.split('_')[1]] = df1[col1].mul(df1[col2])
print (df1)

   a_Yes  b_b  b_c  b_d  b_e  c_2  a_Yes_b  a_Yes_c  a_Yes_d  a_Yes_e  c_2_b  \
0      1    0    0    0    0    0        0        0        0        0      0   
1      1    1    0    0    0    1        1        0        0        0      1   
2      0    0    1    0    0    1        0        0        0        0      0   
3      0    0    0    1    0    0        0        0        0        0      0   
4      0    0    0    0    1    1        0        0        0        0      0   
5      1    0    0    0    0    0        0        0        0        0      0   
6      1    0    1    0    0    0        0        1        0        0      0   

   c_2_c  c_2_d  c_2_e  
0      0      0      0  
1      0      0      0  
2      1      0      0  
3      0      0      0  
4      0      0      1  
5      0      0      0  
6      0      0      0

However, if a and c each have only one dummy column (true in this sample, and perhaps in the real data), you can use filter, mul, squeeze and concat:

a = df1.filter(regex='^a')
b = df1.filter(regex='^b')
c = df1.filter(regex='^c')

dfa = b.mul(a.squeeze(), axis=0).rename(columns=lambda x: a.columns[0] + x[1:])
dfc = b.mul(c.squeeze(), axis=0).rename(columns=lambda x: c.columns[0] + x[1:])

df1 = pd.concat([df1, dfa, dfc], axis=1)
print (df1)
   a_Yes  b_b  b_c  b_d  b_e  c_2  a_Yes_b  a_Yes_c  a_Yes_d  a_Yes_e  c_2_b  \
0      1    0    0    0    0    0        0        0        0        0      0   
1      1    1    0    0    0    1        1        0        0        0      1   
2      0    0    1    0    0    1        0        0        0        0      0   
3      0    0    0    1    0    0        0        0        0        0      0   
4      0    0    0    0    1    1        0        0        0        0      0   
5      1    0    0    0    0    0        0        0        0        0      0   
6      1    0    1    0    0    0        0        1        0        0      0   

   c_2_c  c_2_d  c_2_e  
0      0      0      0  
1      0      0      0  
2      1      0      0  
3      0      0      0  
4      0      0      1  
5      0      0      0  
6      0      0      0
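
The loop approach above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the original answer: the function name `add_interactions` and its `pairs` argument are hypothetical, and it assumes dummy columns are named `<variable>_<level>` as get_dummies produces by default. It also passes `dtype=int` to get_dummies, because pandas 2.0 and later return boolean dummy columns by default, which would print as True/False rather than 0/1.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["Yes", "Yes", "No", "No", "No", "Yes", "Yes"],
    "b": ["a", "b", "c", "d", "e", "a", "c"],
    "c": ["1", "2", "2", "1", "2", "1", "1"],
})

# dtype=int keeps 0/1 columns regardless of the pandas version
df1 = pd.get_dummies(df, drop_first=True, dtype=int)


def add_interactions(frame, pairs):
    """Hypothetical helper: append product columns for each (left, right)
    pair of variable prefixes, e.g. ("a", "b")."""
    out = frame.copy()
    for left, right in pairs:
        # Column groups are read from the input frame, so interaction
        # columns added for an earlier pair are never picked up again.
        left_cols = [c for c in frame.columns if c.startswith(left + "_")]
        right_cols = [c for c in frame.columns if c.startswith(right + "_")]
        for lcol in left_cols:
            for rcol in right_cols:
                level = rcol.split("_", 1)[1]  # "b_c" -> "c"
                out[lcol + "_" + level] = frame[lcol] * frame[rcol]
    return out


result = add_interactions(df1, [("a", "b"), ("c", "b")])
print(result.columns.tolist())
```

Matching on the prefix plus an underscore, rather than on the first letter alone, avoids picking up unrelated columns whose names merely begin with the same character.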

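3 votes

Stack Overflow user

Posted 2017-03-23 07:49:03

You can convert the DataFrame columns to NumPy arrays and then multiply them accordingly. The following linked question describes ways to do the conversion: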

Convert Select Columns in Pandas Dataframe to Numpy Array
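
The answer gives no code, so here is one possible reading of it, offered as a sketch rather than as the answerer's own method. It assumes the same sample data as the question and relies on NumPy broadcasting: a single `a_Yes` column of shape (n, 1) is multiplied against all `b_*` columns of shape (n, 4) at once. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["Yes", "Yes", "No", "No", "No", "Yes", "Yes"],
    "b": ["a", "b", "c", "d", "e", "a", "c"],
    "c": ["1", "2", "2", "1", "2", "1", "1"],
})
df1 = pd.get_dummies(df, drop_first=True, dtype=int)

b_cols = [c for c in df1.columns if c.startswith("b_")]

a_vals = df1[["a_Yes"]].to_numpy()  # shape (7, 1): double brackets keep 2-D
b_vals = df1[b_cols].to_numpy()     # shape (7, 4)

# Broadcasting stretches the single a_Yes column across every b column
products = a_vals * b_vals

names = ["a_Yes_" + c.split("_", 1)[1] for c in b_cols]
interactions = pd.DataFrame(products, columns=names, index=df1.index)

df2 = pd.concat([df1, interactions], axis=1)
print(df2)
```

Reusing `df1.index` when rebuilding the DataFrame keeps the rows aligned in the final concat, which matters if the original index is not the default 0..n-1 range.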

0 votes

Stack Overflow user

Posted 2021-02-21 21:51:44

This solves your problem:

import itertools

import pandas as pd

def get_design_with_pair_interaction(data, group_pair):
    """ Get the design matrix with the pairwise interactions
    
    Parameters
    ----------
    data (pandas.DataFrame):
       Pandas data frame with the two variables to build the design matrix of their two main effects and their interaction
    group_pair (iterator):
       List with the name of the two variables (name of the columns) to build the design matrix of their two main effects and their interaction
    
    Returns
    -------
    x_new (pandas.DataFrame):
       Pandas data frame with the design matrix of their two main effects and their interaction
    
    """
    x = pd.get_dummies(data[group_pair])
    interactions_lst = list(
        itertools.combinations(
            x.columns.tolist(),
            2,
        ),
    ) 
    x_new = x.copy()
    for level_1, level_2 in interactions_lst:
        if level_1.split('_')[0] == level_2.split('_')[0]:
            continue
        x_new = pd.concat(
            [
                x_new,
                x[level_1] * x[level_2]
            ],
            axis=1,
        )
        x_new = x_new.rename(
            columns = {
                0: (level_1 + '_' + level_2)
            }
        )
    return x_new
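
To show what this kind of function returns on the question's data, here is a compact variant with a usage example. It is a sketch under stated assumptions, not the answerer's code: `pair_interactions` is a hypothetical name, it assigns each product column directly instead of using concat and rename, and it passes `dtype=int` so the result holds 0/1 values on any pandas version. Like the original, it assumes variable names contain no underscore, since the prefix before the first `_` identifies the variable.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "a": ["Yes", "Yes", "No", "No", "No", "Yes", "Yes"],
    "b": ["a", "b", "c", "d", "e", "a", "c"],
    "c": ["1", "2", "2", "1", "2", "1", "1"],
})


def pair_interactions(data, group_pair):
    """Hypothetical compact variant: main-effect dummies for two variables
    plus the product of every cross-variable pair of levels."""
    x = pd.get_dummies(data[group_pair], dtype=int)
    out = x.copy()
    for left, right in itertools.combinations(x.columns, 2):
        # Skip pairs of levels that belong to the same variable
        if left.split("_")[0] == right.split("_")[0]:
            continue
        out[left + "_" + right] = x[left] * x[right]
    return out


design = pair_interactions(df, ["a", "b"])
print(design.columns.tolist())
```

Note that no level is dropped here (there is no `drop_first=True`), so the result contains a column for every level and every cross-level pair. With an intercept in a linear regression such a full design is collinear, so dropping a reference level, as the question does, may still be needed.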

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/42969545
