
How to compare across each factor (combinations of 1, 2, 3 factors)

Stack Overflow user
Asked 2020-10-02 08:51:46
1 answer · 28 views · 0 followers · 0 votes

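Could you help me automate calculating the share of our clients across combinations of 1, 2, and 3 factors?

I have a dataset of clients and their features. Every client has a label:

  • 1 - "ours"
  • 0 - "not_ours"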

clients ours car        house         boat     plane         bike
client1 1     1         0             1         1             1
client2 0     0         0             1         1             1
client3 1     0         0             0         1             1
client4 1     1         0             1         1             1
client5 0     0         0             1         1             1
client6 1     0         0             0         1             1
clientN 0     0         1             0         1             1

I want to run three experiments:

  1. Find the count and share of our clients within each value of a single factor. Ideal result:

factor_value  1                                  0
              count_of_ours  share_of_ours (%)   count_of_ours  share_of_ours (%)
car           2              100%                2              40%
house         0              0%                  4              67%
boat          2              50%                 2              67%
plane         4              67%                 0              0%
bike          4              67%                 0              0%

Here share_of_ours is the share of our clients within that factor value. For example, for car = 0 our share is 40%: 5 clients have no car, and 2 of them are ours.

  2. The same calculation, but for every combination of two factors:

car+house, car+boat, car+plane, car+bike, house+boat, house+plane, house+bike, boat+plane, boat+bike, plane+bike

  3. The same calculation for all possible combinations of three factors:

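car+house+boat, car+house+plane, car+house+bike, car+boat+plane, car+boat+bike, car+plane+bike, house+boat+plane, house+boat+bike, house+plane+bike, boat+plane+bike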
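
The three experiments can be sketched with a single helper. This is only a minimal sketch, not code from the original post: the DataFrame is retyped from the sample table above, `combination_shares` and its output column names are hypothetical, and it assumes a client is "inside" a combination when every factor in it equals 1 and "outside" when every factor equals 0 (for a single factor these are simply value 1 and value 0).

```python
from itertools import combinations

import pandas as pd

# Sample data retyped from the table in the question
df = pd.DataFrame({
    "clients": ["client1", "client2", "client3", "client4",
                "client5", "client6", "clientN"],
    "ours":  [1, 0, 1, 1, 0, 1, 0],
    "car":   [1, 0, 0, 1, 0, 0, 0],
    "house": [0, 0, 0, 0, 0, 0, 1],
    "boat":  [1, 1, 0, 1, 1, 0, 0],
    "plane": [1, 1, 1, 1, 1, 1, 1],
    "bike":  [1, 1, 1, 1, 1, 1, 1],
})
factors = ["car", "house", "boat", "plane", "bike"]


def combination_shares(df, factors, k, label="ours"):
    """Hypothetical helper: counts and shares of `label == 1` clients
    for every combination of k factors."""
    rows = []
    for combo in combinations(factors, k):
        cols = list(combo)
        # Assumption: inside = all factors are 1, outside = all factors are 0
        inside = df[cols].eq(1).all(axis=1)
        outside = df[cols].eq(0).all(axis=1)
        row = {"combination": "+".join(combo)}
        for name, mask in (("inside", inside), ("outside", outside)):
            total = int(mask.sum())
            ours = int(df.loc[mask, label].sum())
            row[f"clients_{name}"] = total
            row[f"ours_{name}"] = ours
            # NaN when no client matches, to avoid dividing by zero
            row[f"share_{name}"] = ours / total if total else float("nan")
        rows.append(row)
    return pd.DataFrame(rows)


# One result table per experiment: 1, 2 and 3 factors
results = {k: combination_shares(df, factors, k) for k in (1, 2, 3)}
for k, table in results.items():
    print(f"--- combinations of {k} factor(s) ---")
    print(table.to_string(index=False))
```

For k = 1 the `car` row gives 2 of 2 clients inside (100%) and 2 of 5 outside (40%), which agrees with the car example in the question.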


1 Answer

Stack Overflow user

Answered 2020-10-06 07:24:22

Here are the steps (shown for the three-factor analysis):

import pandas as pd

# Assumes `df` (the client table) and `res3` (the table of two-factor
# combinations from the previous step, with columns factor1 and factor2)
# already exist.

# Build three-factor combinations by adding a third factor to each pair.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []
for a, b in zip(res3['factor1'], res3['factor2']):
    # factor columns start at position 5 in this answer's df
    for j in df.columns[5:]:
        # sort so that e.g. (a, c, b) and (a, b, c) become the same row
        f1, f2, f3 = sorted([a, b, j])
        rows.append({'factor1': f1, 'factor2': f2, 'factor3': f3})

xgb3 = pd.DataFrame(rows, columns=['factor1', 'factor2', 'factor3']).drop_duplicates()
# also drop rows with a repeated factor (e.g. a, a, b): all three must be distinct
xgb3 = xgb3[(xgb3['factor1'] != xgb3['factor3']) &
            (xgb3['factor2'] != xgb3['factor3']) &
            (xgb3['factor1'] != xgb3['factor2'])].reset_index(drop=True)

# Compute counts and shares for each combination.
# 'stl' is the 0/1 label column in this answer's data
# (presumably the equivalent of 'ours' in the question).
results = []
for f, s, x in xgb3.itertuples(index=False):
    inside = (df[f] == 1) & (df[s] == 1) & (df[x] == 1)   # clients with all three factors
    outside = (df[f] == 0) & (df[s] == 0) & (df[x] == 0)  # clients with none of the three
    m = df.loc[inside, 'stl'].count()
    n = df.loc[inside, 'stl'].sum()
    o = df.loc[outside, 'stl'].count()
    p = df.loc[outside, 'stl'].sum()
    results.append({'factor1': f,
                    'factor2': s,
                    'factor3': x,
                    'quantity_inside_combination': m,
                    'quantity_of_ours_inside_combination': n,
                    # NaN when no client matches, to avoid dividing by zero
                    '%_of_ours_inside_combination': n / m if m else float('nan'),
                    'quantity_outside_combination': o,
                    'quantity_of_ours_outside_combination': p,
                    '%_of_ours_outside_combination': p / o if o else float('nan')})

result3 = pd.DataFrame(results)
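
The sort, drop-duplicates, and distinctness filter above exist only to reduce "pair plus a third factor" to unique unordered triples. As a hedged aside, `itertools.combinations` from the standard library yields each unordered triple exactly once, so those steps could be skipped. The sketch below assumes the five factor names from the question and that the two-factor table holds every pair; it checks that both routes give the same set of triples.

```python
from itertools import combinations

# Factor names taken from the question (an assumption about the real data)
factors = ["car", "house", "boat", "plane", "bike"]

# Route 1: extend every pair by a third factor, sort, and keep only
# triples whose three members are distinct (mirrors the steps above).
triples_dedup = set()
for a, b in combinations(factors, 2):
    for j in factors:
        triple = tuple(sorted([a, b, j]))
        if len(set(triple)) == 3:
            triples_dedup.add(triple)

# Route 2: ask itertools for the unordered triples directly.
triples_direct = {tuple(sorted(c)) for c in combinations(factors, 3)}

assert triples_dedup == triples_direct
print(len(triples_direct))  # 10 triples for 5 factors
```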
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64168718
