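I have a pandas DataFrame of membership records that contains some data-entry errors (see a few examples below). Some members are incorrectly recorded as "Joined" when they were actually "Renewal", and/or are listed as "Joined" more than once. I want to correct these errors by converting "Joined" to "Renewal", and vice versa, based on the year column.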
import numpy as np
import pandas as pd

# One row per member, one column per year
df = pd.DataFrame({2012: {0: 'Renewal', 1: 'Joined', 2: np.nan, 3: np.nan},
                   2013: {0: 'Renewal', 1: 'Renewal', 2: 'Joined', 3: np.nan},
                   2014: {0: 'Joined', 1: 'Joined', 2: 'Renewal', 3: 'Renewal'},
                   2015: {0: 'Renewal', 1: 'Renewal', 2: 'Joined', 3: 'Renewal'}})
df
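      2012     2013     2014     2015
0  Renewal  Renewal   Joined  Renewal
1   Joined  Renewal   Joined  Renewal
2      NaN   Joined  Renewal   Joined
3      NaN      NaN  Renewal  Renewal
What works, but is inelegant and time-consuming: using np.where and a loop to update rows, I can fix the data one column at a time. For example, to fix rows with a repeated "Joined":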
years = list(df.columns)
# If 2012 is 'Joined', any later 'Joined' in that row becomes 'Renewal'
for col in df[years[1:]]:
    df[col] = np.where(((df[years[0]] == 'Joined') & (df[col] == 'Joined')), 'Renewal', df[col])
# Same check with 2013 as the reference year
for col in df[years[2:]]:
    df[col] = np.where(((df[years[1]] == 'Joined') & (df[col] == 'Joined')), 'Renewal', df[col])
# Same check with 2014 as the reference year
for col in df[years[3:]]:
    df[col] = np.where(((df[years[2]] == 'Joined') & (df[col] == 'Joined')), 'Renewal', df[col])
df
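      2012     2013     2014     2015
0  Renewal  Renewal   Joined  Renewal
1   Joined  Renewal  Renewal  Renewal
2      NaN   Joined  Renewal  Renewal
3      NaN      NaN  Renewal  Renewal
This gets rid of any repeated "Joined", but I'm not skilled enough to handle all the columns at once. Right now I'm updating each successive column by hand, and there are 1 million records and roughly 20 consecutive years of data.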
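I understand there are other things that need correcting, but if I can work out how to loop and fix this problem as a first step, I suspect I can handle the other issues the same way.
Attempts to loop this: I have tried several variations of a more complex loop, but I either get no response and no error, or I accidentally overwrite all of the data.
For example,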
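for x in range(len(years)):
    for col in df[years[x+1]]:
        df[col] = np.where(((df[years[x]] == 'Joined') & (df[col] == 'Joined')), 'Renewal', df[col])
Is there a way to apply the update to all values relative to the first column and then move on to the next column? It doesn't have to be a loop; I just assumed that would be the solution.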
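Thanks for any advice or examples.
Posted on 2022-06-27 13:43:48
If you want to replace every "Joined" except the first one in each row with "Renewal", you can create a copy of the DataFrame with all "Joined" values replaced by "Renewal", find the index of the first occurrence of "Joined" in each row, and then put that value back into the copy.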
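# Create a copy frame with every 'Joined' turned into 'Renewal'
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
df2 = df.applymap(lambda x: 'Renewal' if x == 'Joined' else x)
# Pull first joined from original: column label of the first 'Joined' per row
first_joined = df.apply(lambda x: x=='Joined', axis=1).apply(pd.Series.idxmax, axis=1)
# Replace first "Joined" in the copy; rows without any 'Joined' are skipped
for item in first_joined.items():
    if df.loc[item] == 'Joined':
        df2.loc[item] = 'Joined'
Output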
# print(df2)
      2012     2013     2014     2015
0  Renewal  Renewal   Joined  Renewal
1   Joined  Renewal  Renewal  Renewal
2      NaN   Joined  Renewal  Renewal
3      NaN      NaN  Renewal  Renewal
https://stackoverflow.com/questions/72764336
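With around a million rows, the row-wise apply and the Python loop above may be slow. Below is a minimal vectorized sketch of the same idea, not the answer author's method: count the "Joined" values across each row with a cumulative sum and overwrite every "Joined" after the first. The function name keep_first_joined is hypothetical, and the sketch assumes the columns are already ordered by year from left to right.

```python
import numpy as np
import pandas as pd


def keep_first_joined(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which only the first 'Joined' of each row is kept.

    Hypothetical helper; assumes columns are sorted by year, left to right.
    """
    is_joined = frame.eq("Joined")            # NaN compares as False
    # Running count of 'Joined' per row across the year columns
    joined_so_far = is_joined.cumsum(axis=1)
    # A repeat is a 'Joined' cell that is at least the second one in its row
    is_repeat = is_joined & joined_so_far.gt(1)
    return frame.mask(is_repeat, "Renewal")


df = pd.DataFrame({2012: {0: 'Renewal', 1: 'Joined', 2: np.nan, 3: np.nan},
                   2013: {0: 'Renewal', 1: 'Renewal', 2: 'Joined', 3: np.nan},
                   2014: {0: 'Joined', 1: 'Joined', 2: 'Renewal', 3: 'Renewal'},
                   2015: {0: 'Renewal', 1: 'Renewal', 2: 'Joined', 3: 'Renewal'}})

fixed = keep_first_joined(df)
print(fixed)
```

On the sample frame this should give the same table as df2. It only handles repeated "Joined" values; rows whose first entry is "Renewal" (such as row 0) are left unchanged and would need a separate rule.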