文章/答案/技术大牛

发布

社区首页 >问答首页 >在计算忠诚度后，在数据中添加新客户

问在计算忠诚度后，在数据中添加新客户
EN

Stack Overflow用户

提问于 2022-03-25 13:39:44

回答 2查看 42关注 0票数 0

我有一个数据集df，其中包括关于老客户的信息和可能是新的或旧的客户列表。我想给新客户分配一个忠诚度值，即那些还没有在dataset df中评估的客户。请看下面我想要达到的目标的一个例子。假设我有一个潜在的新客户列表(例如，customers=['cust1', 'cust76'])，他们的忠诚度信息可能在df数据集中缺失(这里称为existing_cust_df)。

existing_cust_df=pd.read(path.csv)

数据集只有两个字段:一个是客户的ID，另一个是忠诚度值(称为value)。

Customer         value 
cust2             13
cust3             14
cust6             35
cust7             21
cust1             24

我想确定新客户的忠诚度(在这个例子中，只有cust76，因为cust1已经在数据集中了，我们有关于它的价值的信息--即。24)基于包含在existing_cust_df数据集中的现有客户的价值。

为了确定新客户的价值，我使用了一个不同的数据集，名为extra_df，其中包括新客户和可能用于确定其忠诚度价值的潜在客户列表。extra_df数据集如下所示：

   Customer         Related_customer
    cust76             cust32
    cust76             cust2
    cust76             cust1

然后，可以使用extra_df查看可能包含在existing_cust_df中的相关客户，以便从那里获取他们的值，并通过平均这些值分配给新客户。在我提供的示例中，cust76在existing_cust_df中缺失，所以我查看extra_df，在与它相关的三个客户中，只有两个(cust2和cust1)在existing_cust_df中，因此它们的值。通过平均它们的值(13和24)，我希望将这个值(18.5)分配给新客户(cust76)，并将其ID (cust76)和计算值(18.5)附加到existing_cust_df中。所以在最后我会：

Customer         value 
cust2             13
cust3             14
cust6             35
cust7             21
cust1             24
cust76            18.5

我希望上面的步骤是有意义的，如果没有，请告诉我。请告诉我如何运行此检查，查看列表中的客户进入existing_cust_df，然后在extra_df中(如果不包括)计算其平均值并最终更新existing_cust_df中的值？我知道要在现有的dataframe中追加一行，我应该这样做：

new_row = {'Customer': cust76, 'value': 18.5}
#append row to the dataframe
existing_cust_df = existing_cust_df.append(new_row, ignore_index=False)

但在这种情况下，困难在于查看不在existing_cust_df中的列表中的客户，然后，在extra_df中，为了包括他们和他们的值，一旦计算出平均值(否则，分配一个空值也会更好)。

任何帮助都会很好。

pandas

python

回答 2

Stack Overflow用户

回答已采纳

发布于 2022-03-25 14:34:44

让我们首先构建我们的数据框架：

csvfile = StringIO(
"""Customer\tvalue
cust2\t13
cust3\t14
cust6\t35
cust7\t21
cust1\t24""")
existing_cust_df = pd.read_csv(csvfile, sep = '\t', engine='python')

csvfile = StringIO("""
Customer\tRelated_customer
cust76\tcust32
cust76\tcust2
cust76\tcust1""")
extra_df = pd.read_csv(csvfile, sep = '\t', engine='python')

我们希望为extra_df中的每个输入循环，让我们通过：

dg = extra_df.groupby('Customer')

然后，我们可以在extra_cust_df：True中为每个客户构建一个掩码，如果related_customer在existing_cust_df中。

dg.Related_customer.apply(lambda x : existing_cust_df.Customer.isin(x.to_numpy()))
Out[123]: 
Customer   
cust76    0     True
          1    False
          2    False
          3    False
          4     True
Name: Related_customer, dtype: bool

然后，我们可以在existing_cust_df.value上应用这个掩码，然后将其平均化：

dg.Related_customer.apply(lambda x : existing_cust_df.query("Customer.isin(@x.to_numpy())"))

Out[125]: 
           Customer  value
Customer                  
cust76   0    cust2     13
         4    cust1     24


dg.Related_customer.apply(lambda x : existing_cust_df.query("Customer.isin(@x.to_numpy())").value.mean())

Out[126]: 
Customer
cust76    18.5
Name: Related_customer, dtype: float64

最后，我们可以将其附加到您现有的df中。

票数 1

Stack Overflow用户

发布于 2022-03-25 14:51:13

这里有一个方法来做你的问题所要求的：

import pandas as pd
import numpy as np
# Objective:
#    for each customer in customers not already in existing_cust_df, 
#    add a record to existing_cust_df with value equal to 
#    the average of the values from existing_cust_df 
#    of Related_customer records in extra_df 
#    where extra_df.Customer == customer and Related_customer is in existing_cust_df.

existing_cust_df = pd.DataFrame({'Customer':['cust2','cust3','cust6','cust7','cust1'], 'value':[13,14,35,21,24]})
print("existing_cust_df:")
print(existing_cust_df)

extra_df = pd.DataFrame({'Customer':['cust76','cust76','cust76', 'cust77', 'cust77'], 'Related_customer':['cust32','cust2','cust1', 'cust2', 'cust6']})
print("extra_df:")
print(extra_df)

customers=['cust1', 'cust76', 'cust77']
new_cust_df = pd.DataFrame({'Customer': [c for c in customers if c not in set(existing_cust_df['Customer'])]})
print("new_cust_df:")
print(new_cust_df)

# create a record for each new_cust containing list of related_cust names 
#    with matching customer in extra_df where related_cust is a customer in existing_cust_df
existing_related_customer_values = \
    pd.merge( \
        pd.merge(extra_df, new_cust_df, on= 'Customer'), \
        existing_cust_df, left_on='Related_customer', right_on='Customer', suffixes=(None, '_y')) \
    [['Customer', 'Related_customer', 'value']]
print("existing_related_customer_values\n (JOIN from new_cust_df and extra_df on Customer [KEEP],\n then JOIN on extra_df.Related_customer and existing_cust_df.Customer [KEEP value])")
print(existing_related_customer_values)

# find the mean of values for existing Related_customer records for new customers
new_cust_df['value'] = new_cust_df.apply(lambda x: np.mean( \
    existing_related_customer_values[existing_related_customer_values['Customer'] == x['Customer']]['value']), axis=1)
print("new_cust_df (UPDATED with value column):")
print(new_cust_df)

existing_cust_df = pd.concat([existing_cust_df, new_cust_df], ignore_index=True)
print("existing_cust_df (UPDATED with new customer rows):")
print(existing_cust_df)

输出：

existing_cust_df:
  Customer  value
0    cust2     13
1    cust3     14
2    cust6     35
3    cust7     21
4    cust1     24
extra_df:
  Customer Related_customer
0   cust76           cust32
1   cust76            cust2
2   cust76            cust1
3   cust77            cust2
4   cust77            cust6
new_cust_df:
  Customer
0   cust76
1   cust77
existing_related_customer_values
 (JOIN from new_cust_df and extra_df on Customer [KEEP],
 then JOIN on extra_df.Related_customer and existing_cust_df.Customer [KEEP value])
  Customer Related_customer  value
0   cust76            cust2     13
1   cust77            cust2     13
2   cust76            cust1     24
3   cust77            cust6     35
new_cust_df (UPDATED with value column):
  Customer  value
0   cust76   18.5
1   cust77   24.0
existing_cust_df (UPDATED with new customer rows):
  Customer  value
0    cust2   13.0
1    cust3   14.0
2    cust6   35.0
3    cust7   21.0
4    cust1   24.0
5   cust76   18.5
6   cust77   24.0

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/71617891

复制

相似问题

问在计算忠诚度后，在数据中添加新客户
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在计算忠诚度后，在数据中添加新客户EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在计算忠诚度后，在数据中添加新客户
EN