
Tracking changes from group to group in a groupby

Stack Overflow user
Asked 2018-12-04 19:29:26
Answers: 1, Views: 48, Followers: 0, Votes: 0

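I have the DataFrame below. For each person, it records the employer through which they were attached to each project (a person can be attached to a single project through more than one employer). The Year column gives the year, and its last digit gives the order of the project within that year (project 20122 was carried out after project 20121).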

Employee_id = [7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752]
Project_id = [28253288, 28648301, 28800042, 29113983, 29126250, 29364924, 29678870, 29691896, 29691235, 29691235]
Employer_id = [60031437, 60031437, 60033114, 115272656, 110625857, 60031437, 60031437, 60031437, 61273455, 85972742]
Year = [20121, 20122, 20131, 20141, 20151, 20152, 20161, 20161, 20162, 20162]

import pandas as pd
data = pd.DataFrame({"Employee_id":Employee_id,"Project_id":Project_id,"Employer_id":Employer_id,"Year":Year})
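
The Year encoding described above can be unpacked with integer arithmetic. This is a minimal sketch that assumes the last digit is always the within-year order and the leading digits are the calendar year; the names `year_codes`, `calendar_year`, and `order_in_year` are illustrative and do not appear in the question.

```python
import pandas as pd

# Values copied from the Year list in the question.
year_codes = pd.Series([20121, 20122, 20131, 20141, 20151, 20152, 20161, 20161, 20162, 20162])

# Assumption: the last digit is the order within the year,
# and the remaining leading digits are the calendar year.
decoded = pd.DataFrame({
    "calendar_year": year_codes // 10,
    "order_in_year": year_codes % 10,
})
print(decoded.drop_duplicates().to_string(index=False))
```

Because the codes are plain integers with the year first, sorting on Year directly already puts the projects in chronological order.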

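My goal is to track how a person moves between organizations, as in the two right-hand columns of the expected-output table. Compared with the previous year, I want to know which organizations they left and which new organizations they started working for (regardless of whether they had also worked for them at some earlier point).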
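
The comparison described here amounts to two set differences between consecutive periods. A small illustration, using the employers recorded for periods 20161 and 20162 in the data above; the variable names are chosen only for this sketch.

```python
# Employers taken from the question's data for two consecutive periods.
previous_period = {60031437}            # period 20161
current_period = {61273455, 85972742}   # period 20162

# Employers present now but not before: newly joined.
joined = current_period - previous_period
# Employers present before but not now: left.
left = previous_period - current_period

print(sorted(joined))  # [61273455, 85972742]
print(sorted(left))    # [60031437]
```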


1 Answer

Stack Overflow user

Answered 2018-12-04 22:42:03

I found a solution to my own problem. It is probably not the most elegant, but it works.

import pandas as pd

Employee_id = [7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752, 7102825752]
Project_id = [28253288, 28648301, 28800042, 29113983, 29126250, 29364924, 29678870, 29691896, 29691235, 29691235]
Employer_id = [60031437, 60031437, 60033114, 115272656, 110625857, 60031437, 60031437, 60031437, 61273455, 85972742]
Year = [20121, 20122, 20131, 20141, 20151, 20152, 20161, 20161, 20162, 20162]
data = pd.DataFrame({"employee": Employee_id, "project": Project_id, "employer": Employer_id, "year": Year})

# Collect one row per (employee, project) holding the list of employers on that project.
rows = []
for employee in set(data["employee"]):
    for project in data.loc[data["employee"] == employee, "project"]:
        # Combine both conditions in a single boolean mask over the full frame.
        mask = (data["employee"] == employee) & (data["project"] == project)
        employer_list = data.loc[mask, "employer"].tolist()
        rows.append({"employee": employee, "project": project, "employer": employer_list})
# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows.
employee_employer_change_df = pd.DataFrame(rows)

# A project with several employers was visited once per row above; keep it once.
employee_employer_change_df = employee_employer_change_df.drop_duplicates(["employee", "project"], keep="first").reset_index(drop=True)
# Employers on the same employee's previous project (NaN for the first project).
employee_employer_change_df["previous_employer"] = employee_employer_change_df.groupby("employee")["employer"].shift(1)

previous_employer = employee_employer_change_df["previous_employer"].tolist()
current_employer = employee_employer_change_df["employer"].tolist()

new_employer_list = []
leaving_employer_list = []
for previous, current in zip(previous_employer, current_employer):
    if isinstance(previous, list):
        new_employer_list.append(list(set(current) - set(previous)))
        leaving_employer_list.append(list(set(previous) - set(current)))
    else:
        # shift(1) leaves NaN where the employee has no earlier project.
        new_employer_list.append(["first year"])
        leaving_employer_list.append(["first year"])

employee_employer_change_df["new_affiliation"] = new_employer_list
employee_employer_change_df["leaving_affiliation"] = leaving_employer_list
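
A shorter variant of the same idea, offered as a sketch and not as the answer's own method: `groupby(...).agg(list)` builds the per-project employer lists in one step, and a single pass per employee compares each project with the one before it. The names `per_project` and `changes` are introduced only for this example, and it assumes the rows of `data` are already in chronological order.

```python
import pandas as pd

data = pd.DataFrame({
    "employee": [7102825752] * 10,
    "project": [28253288, 28648301, 28800042, 29113983, 29126250,
                29364924, 29678870, 29691896, 29691235, 29691235],
    "employer": [60031437, 60031437, 60033114, 115272656, 110625857,
                 60031437, 60031437, 60031437, 61273455, 85972742],
    "year": [20121, 20122, 20131, 20141, 20151, 20152, 20161, 20161, 20162, 20162],
})

# One row per (employee, project); sort=False keeps the original row order.
per_project = (
    data.groupby(["employee", "project"], sort=False)["employer"]
    .agg(list)
    .reset_index()
)

rows = []
for employee, group in per_project.groupby("employee", sort=False):
    previous = None  # no earlier project yet for this employee
    for project, employers in zip(group["project"], group["employer"]):
        current = set(employers)
        if previous is None:
            new, leaving = ["first year"], ["first year"]
        else:
            new = sorted(current - previous)
            leaving = sorted(previous - current)
        rows.append({
            "employee": employee,
            "project": project,
            "employer": employers,
            "new_affiliation": new,
            "leaving_affiliation": leaving,
        })
        previous = current

changes = pd.DataFrame(rows)
print(changes[["project", "new_affiliation", "leaving_affiliation"]])
```

Like the code above, this compares consecutive projects. To compare consecutive periods instead, grouping on `["employee", "year"]` in the first step would be the natural change.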
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/53611995
