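Can you help me automate calculating the share of our clients within 1, 2, and 3 factors?
I have a dataset of clients and features. Every client has a label (the ours column):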
clients ours car house boat plane bike
client1 1 1 0 1 1 1
client2 0 0 0 1 1 1
client3 1 0 0 0 1 1
client4 1 1 0 1 1 1
client5 0 0 0 1 1 1
client6 1 0 0 0 1 1
clientN 0 0 1 0 1 1

I want to run three experiments:
factor_value  1      1                  0      0
              count  share of ours (%)  count  share of ours (%)
car           2      100%               2      40%
house         0      0%                 4      67%
boat          2      50%                2      67%
plane         4      67%                0      0%
bike          4      67%                0      0%
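where share of ours = the share of our clients within a factor value. For example, for car = 0 the share of ours is 40%, because 5 clients have no car and 2 of them are ours.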
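The single-factor definition above can be sketched in pandas as follows. This is only an illustration: the toy data is copied from the table in the question, and the helper name single_factor_share is an assumption, not something from the original post.

```python
import pandas as pd

# Toy data copied from the question's table (clientN is treated as one ordinary row).
df = pd.DataFrame(
    {
        "ours":  [1, 0, 1, 1, 0, 1, 0],
        "car":   [1, 0, 0, 1, 0, 0, 0],
        "house": [0, 0, 0, 0, 0, 0, 1],
        "boat":  [1, 1, 0, 1, 1, 0, 0],
        "plane": [1, 1, 1, 1, 1, 1, 1],
        "bike":  [1, 1, 1, 1, 1, 1, 1],
    },
    index=["client1", "client2", "client3", "client4", "client5", "client6", "clientN"],
)


def single_factor_share(df, factor, label="ours"):
    """Hypothetical helper: for factor value 1 and 0, return
    (number of clients, number of ours, share of ours)."""
    out = {}
    for value in (1, 0):
        group = df.loc[df[factor] == value, label]
        total = int(group.count())
        ours = int(group.sum())
        # No clients with this value: the share is undefined, so report NaN.
        share = ours / total if total else float("nan")
        out[value] = (total, ours, share)
    return out


factors = ["car", "house", "boat", "plane", "bike"]
shares = {f: single_factor_share(df, f) for f in factors}
print(shares["car"])  # {1: (2, 2, 1.0), 0: (5, 2, 0.4)}
```

For car = 0 this gives 2 of 5 clients, i.e. the 40% from the worked example.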
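Combinations of two factors: car+house, car+boat, car+plane, car+bike, house+boat, house+plane, house+bike, boat+plane, boat+bike, plane+bike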
Combinations of three factors: car+house+boat, car+house+plane, car+house+bike, car+boat+plane, car+boat+bike, car+plane+bike, and so on
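Lists like these do not have to be typed by hand. A minimal sketch using itertools.combinations from the standard library, assuming the five factor names from the table in the question:

```python
from itertools import combinations

# Factor names taken from the question's table.
factors = ["car", "house", "boat", "plane", "bike"]

# Every unordered pair and triple of distinct factors.
pairs = list(combinations(factors, 2))
triples = list(combinations(factors, 3))

print(len(pairs), len(triples))  # 10 10
print(pairs[0], triples[0])      # ('car', 'house') ('car', 'house', 'boat')
```

Because combinations never repeats an element and never emits the same set in a different order, no sorting or duplicate removal is needed afterwards.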

Posted on 2020-10-06 07:24:22
Here are the steps (shown for the analysis of three factors):
import pandas as pd

# Container for the 3-factor combinations.
# Row 0 is a placeholder that only fixes the column names.
xgb3 = pd.DataFrame([('i', 'j', 'k')], columns=['factor1', 'factor2', 'factor3'], index=[0])
# Start from the previous table with combinations of 2 factors (res3)
for i in range(len(res3)):
    a = res3['factor1'].iloc[i]
    b = res3['factor2'].iloc[i]
    # Add each candidate third factor (here the factor columns start at position 5 of df)
    for j in df.iloc[:, 5:].columns.values:
        # Sort so that permutations (e.g. a,c,b and a,b,c) become identical rows
        to_sort = sorted([a, b, j])
        new_row3 = {'factor1': to_sort[0], 'factor2': to_sort[1], 'factor3': to_sort[2]}
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        xgb3 = pd.concat([xgb3, pd.DataFrame([new_row3])], ignore_index=True)
xgb3 = xgb3.drop_duplicates()
# Also drop rows that repeat a factor (e.g. a,a,b): all items in a row must be unique
xgb3 = xgb3[(xgb3['factor1'] != xgb3['factor3']) &
            (xgb3['factor2'] != xgb3['factor3']) &
            (xgb3['factor1'] != xgb3['factor2'])].reset_index(drop=True)
# Container for the results. Row 0 is a placeholder that fixes the column names.
result3 = pd.DataFrame([('', '', '', 0, 0, .1, 0, 0, .1)],
                       columns=['factor1', 'factor2', 'factor3',
                                'quantity_inside_combination',
                                'quantity_of_ours_inside_combination',
                                '%_of_ours_inside_combination',
                                'quantity_outside_combination',
                                'quantity_of_ours_outside_combination',
                                '%_of_ours_outside_combination'],
                       index=[0])
for i in range(1, len(xgb3)):  # start at 1 to skip the placeholder row of xgb3
    f = xgb3['factor1'].iloc[i]
    s = xgb3['factor2'].iloc[i]
    x = xgb3['factor3'].iloc[i]
    # 'stl' is the label column of df (presumably 0/1, like 'ours' in the question)
    inside = (df[f] == 1) & (df[s] == 1) & (df[x] == 1)    # clients with all three factors
    outside = (df[f] == 0) & (df[s] == 0) & (df[x] == 0)   # clients with none of them
    m = df.loc[inside, 'stl'].count()
    n = df.loc[inside, 'stl'].sum()
    o = df.loc[outside, 'stl'].count()
    p = df.loc[outside, 'stl'].sum()
    new_row3 = {'factor1': f,
                'factor2': s,
                'factor3': x,
                'quantity_inside_combination': m,
                'quantity_of_ours_inside_combination': n,
                # NaN when the group is empty, so the share is never a division by zero
                '%_of_ours_inside_combination': n / m if m else float('nan'),
                'quantity_outside_combination': o,
                'quantity_of_ours_outside_combination': p,
                '%_of_ours_outside_combination': p / o if o else float('nan'),
                }
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    result3 = pd.concat([result3, pd.DataFrame([new_row3])], ignore_index=True)

https://stackoverflow.com/questions/64168718
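
The same calculation can be written more compactly for any number of factors. The following is a sketch, not the answer's own method: the function name combination_shares is hypothetical, the label column is assumed to be called ours as in the question (the answer's data calls it stl), and the toy data is copied from the question's table. The output column names follow the ones used in the answer.

```python
from itertools import combinations

import pandas as pd


def combination_shares(df, factors, r, label="ours"):
    """Hypothetical helper: share of 'ours' inside (all factors == 1) and
    outside (all factors == 0) every combination of r factors."""
    rows = []
    for combo in combinations(factors, r):
        cols = list(combo)
        masks = {
            "inside": df[cols].eq(1).all(axis=1),   # client has every factor
            "outside": df[cols].eq(0).all(axis=1),  # client has none of them
        }
        row = {f"factor{i + 1}": name for i, name in enumerate(combo)}
        for where, mask in masks.items():
            selected = df.loc[mask, label]
            total = int(selected.count())
            ours = int(selected.sum())
            row[f"quantity_{where}_combination"] = total
            row[f"quantity_of_ours_{where}_combination"] = ours
            # Empty group: the share is undefined, so store NaN.
            row[f"%_of_ours_{where}_combination"] = ours / total if total else float("nan")
        rows.append(row)
    # Build the frame once instead of growing it row by row.
    return pd.DataFrame(rows)


# Toy data copied from the question's table.
df = pd.DataFrame({
    "ours":  [1, 0, 1, 1, 0, 1, 0],
    "car":   [1, 0, 0, 1, 0, 0, 0],
    "house": [0, 0, 0, 0, 0, 0, 1],
    "boat":  [1, 1, 0, 1, 1, 0, 0],
    "plane": [1, 1, 1, 1, 1, 1, 1],
    "bike":  [1, 1, 1, 1, 1, 1, 1],
})
factors = ["car", "house", "boat", "plane", "bike"]

result2 = combination_shares(df, factors, r=2)
result3 = combination_shares(df, factors, r=3)
print(len(result2), len(result3))  # 10 10
```

Collecting the rows in a list and building the DataFrame once avoids the placeholder first row and the repeated concatenation, and calling the function with r=1, 2 or 3 covers all three experiments.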