首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何编写一个函数来查找已经消失的客户端、回旋镖、新的等等?

如何编写一个函数来查找已经消失的客户端、回旋镖、新的等等?
EN

Stack Overflow用户
提问于 2022-09-06 14:05:56
回答 1查看 78关注 0票数 1

我试图想出一种动态的方法来检查字符串的存在,并报告几个不同的结果: gone_client、回旋镖、new_client。

  1. ,如果我是address_id和my_date,模式是威瑞森,威瑞森,康卡斯特,康卡斯特,客户离开威瑞森,去了另一家公司。

  1. 如果客户从威瑞森到康卡斯特,然后再回到威瑞森,这是一个回旋镖。

  1. --如果上个月威瑞森的客户是新客户,这是一个新客户端,但如果客户端在过去和上个月的某个时间点使用Verizon,那么这是一个回头客。

将熊猫作为pd导入numpy作为npdata存储在字典details ={ 'address_id':111,111,111,111,111,111,222,222,222,222,222,222,333,333,333,333,333,333,444,444,444,444,444,444,555,555,555,555,555,555,777,777,777,'my_company':'Comcast','Verizon','Other','Other','Comcast','Comcast',‘频谱’,'Verizon',‘频谱’,“频谱”、“威瑞森”、“频谱”、“威瑞森”、“其他”、“威瑞森”、“康卡斯特”、“康卡斯特”、“我的约会”:“2022-01-24”、“2022-02-21”、“2022-03-28”、“2022-04-25”、“2022-05-23”、“2022-06-27”,“2022-01-24”、“2022-02-21”、“2022-03-28”、“2022-04-25”、“2022-05-23”、“2022-06-27”、“2022-01-24”、“2022-02-21”、“2022-03-28”、“2022-04-25”、“2022-05-23”、“2022-06-27”、“2022-01-24”、“2022-02-21”,“2022-03-28”、“2022-04-25”、“2022-05-23”、“2022-06-27”、“2022-01-24”、“2022-02-21”、“2022-03-28”、“2022-04-25”、“2022-05-23”、“2022-06-27”、“2022-01-24”、“2022-02-21”、“2022-03-28”}=pd.DataFrame(详细信息) df

我不能完全正确地理解逻辑,但我认为这是一些关于lines...but的东西

代码语言:javascript
复制
def f(ser):
    if 'Verizon' not in ser.unique():
        return False
    if 'Verizon' in ser.unique():
        return True
    return False

df['gone_client'] = df.groupby('address_id')['my_company'].transform(f)
df.head()

也许是这样的..。

代码语言:javascript
复制
# substring to be searched
sub ='Verizon'
 
# creating and passing series to new column
df["Indexes"]= df["my_company"].str.find(sub)
df

不过,我不知道如何处理这些日期,只有通过检查日期,您才能知道客户端是否已经离开、翻回、新的,或者可能是一些不适合于水桶的奇怪模式,比如“其他”。预期的结果将是这样(最后一种情况是一种又一次离开的回旋镖)。我不会期望有很多这样的场景,但是肯定会有一些非常独特的情况,所以“其他”对于这些记录来说是可以的。

代码语言:javascript
复制
    address_id      my_company  my_date     status
0   111             Comcast     1/24/2022   Left
1   111             Verizon     2/21/2022   Left
2   111             Other       3/28/2022   Left
3   111             Other       4/25/2022   Left
4   111             Comcast     5/23/2022   Left
5   111             Comcast     6/27/2022   Left
6   222             Spectrum    1/24/2022   Never Client
7   222             Spectrum    2/21/2022   Never Client
8   222             Spectrum    3/28/2022   Never Client
9   222             Spectrum    4/25/2022   Never Client
10  222             Spectrum    5/23/2022   Never Client
11  222             Spectrum    6/27/2022   Never Client
12  333             Verizon     1/24/2022   Never Left
13  333             Verizon     2/21/2022   Never Left
14  333             Verizon     3/28/2022   Never Left
15  333             Verizon     4/25/2022   Never Left
16  333             Verizon     5/23/2022   Never Left
17  333             Verizon     6/27/2022   Never Left
18  444             Spectrum    1/24/2022   Left
19  444             Spectrum    2/21/2022   Left
20  444             Spectrum    3/28/2022   Left
21  444             Spectrum    4/25/2022   Left
22  444             Verizon     5/23/2022   Left
23  444             Spectrum    6/27/2022   Left
24  555             Verizon     1/24/2022   Boomerang
25  555             Spectrum    2/21/2022   Boomerang
26  555             Spectrum    3/28/2022   Boomerang
27  555             Spectrum    4/25/2022   Boomerang
28  555             Verizon     5/23/2022   Boomerang
29  555             Verizon     6/27/2022   Boomerang
30  777             Comcast     1/24/2022   New Client
31  777             Comcast     2/21/2022   New Client
32  777             Verizon     1/24/2022   New Client
33  888             Verizon     2/21/2022   Other
34  888             Comcast     3/28/2022   Other
35  888             Verizon     4/25/2022   Other
36  888             Comcast     5/23/2022   Other
37  888             Comcast     6/27/2022   Other
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-09-06 21:16:18

这里有一个包含一些详细逻辑的解决方案,您可以随意使用。这听起来不像是你对你最后的逻辑很确定,但希望这能给你足够的时间去玩。

此解决方案基于address_id对数据进行分组。然后,对于每个单独的组,我们可以检查哪个my_company特性。我们可以利用这一点,再加上一个timedelta,来说明地址是否在Verizon,而不是Verizon,永远不属于Verizon,现在是Verizon,在过去的30天里又回来了等等。

这个答案不是由Verizon赞助的。其他手机提供商也存在。

代码语言:javascript
复制
import pandas as pd
import datetime

# data stored in dictionary
details = {
    'address_id': [111,111,111,111,111,111,222,222,222,222,222,222,333,333,333,333,333,333,444,444,444,444,444,444,555,555,555,555,555,555,777,777,777],
    'my_company':['Comcast','Verizon','Other','Other','Comcast','Comcast','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Verizon','Verizon','Verizon','Verizon','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Spectrum','Spectrum','Spectrum','Spectrum','Spectrum','Verizon','Other','Verizon','Comcast','Comcast'],
    'my_date':['2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28','2022-04-25','2022-05-23','2022-06-27','2022-01-24','2022-02-21','2022-03-28']
}

df = pd.DataFrame(details)

df['my_date'] = pd.to_datetime(df['my_date']) 

address_groups = df.groupby(['address_id'])

frame_list = []

current_date = datetime.datetime.now()

for group, frame in address_groups:

    # Create a list and set of each company used by a given address-id:
    company_list = frame['my_company'].values.tolist()
    company_set = set(company_list)

    # Exclusively Verizon
    if ('Verizon' in company_set) and (len(company_set) == 1):
        frame['status'] = 'Verizon Diehard'

    # Never Verizon
    if ('Verizon' not in company_set):
        frame['status'] = 'Verizon Never'

    # Verizon at some point but not currently
    if ('Verizon' in company_set) and (company_list[-1] != 'Verizon'):

        v_frame = frame[frame['my_company'] == 'Verizon']
        last_verizon_date = v_frame['my_date'].iloc[-1]
        last_verizon_date = datetime.datetime.strptime(last_verizon_date, '%Y-%m-%d')
        if (current_date - last_verizon_date) < pd.Timedelta("30 days"):
            frame['status'] = 'Not curretly Verizon, but was in last 30 days'
        else:
            frame['status'] = 'Not curretly Verizon, but was so more than 30 days ago'
 
    # Verizon currently but was a boomerang
    if (company_list[-1] == 'Verizon') and (len(company_set) >= 2):

        non_v_frame = frame[frame['my_company'] != 'Verizon']
        last_non_v_date = non_v_frame['my_date'].iloc[-1]
        last_non_v_date = datetime.datetime.strptime(last_non_v_date, '%Y-%m-%d')

        if (current_date - last_non_v_date) < pd.Timedelta("30 days"):
            frame['status'] = 'Boomerang back to Verizon in last 30 days'
        else:
            frame['status'] = 'Boomerang back more than 30 days ago'


    frame_list.append(frame)

final_df = pd.concat(frame_list)
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73623406

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档