I have a DataFrame like this:
Customer 1  Customer 2  Customer 3
A           B           C
B           C           D
C           D           E
D           E           F
E           F           G
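Customers keep coming into the store. I want to put the first 3 customers within an hour into one row, and as more customers arrive, keep taking the next group of 3 into the next row. However, I don't want to form strict hourly rows (1-2, 2-3, and so on).
What I want is this: if customers B and C are already in the first row, they should not be counted again in the second row. I want to drop rows with overlapping items and keep only unique ones, so my expected output would be: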
Customer 1  Customer 2  Customer 3
A           B           C
D           E           F
G
How can I do this? Please help. Thanks.
Posted on 2020-10-26 14:45:41
Here is my take on this. It isn't complete, but it should point you in the right direction; any edits are welcome.
First, let's set up the data:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={
    "Customer 1": ["A", "B", "C", "D", "E"],
    "Customer 2": ["B", "C", "D", "E", "F"],
    "Customer 3": ["C", "D", "E", "F", "G"],
})
Working in NumPy is easier, so let's create a variable holding the 2D NumPy array:
df_np = df.values
# np.unique flattens the array, removes duplicates and sorts the result;
# keep only the first 6 values so they can be reshaped into rows of 3
c = np.unique(df_np)[:6]
Now let's reshape it back into the original 3-column shape:
c = np.reshape(c, (-1, 3))
You can now rebuild the DataFrame:
pd.DataFrame(data=c, columns=df.columns)
I couldn't find a way to take care of G. As I said, this isn't a complete solution, so any edits are welcome.
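One way to also keep G, sketched under the assumption that customers should be grouped in order of first appearance (reading the rows left to right, top to bottom) rather than in the sorted order `np.unique` returns: collect the unique values with `pd.unique`, slice them into chunks as wide as the frame, and let the DataFrame constructor pad the short last row with missing values. The helper name `group_unique_customers` is hypothetical.

```python
import pandas as pd

def group_unique_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: regroup the unique customers into rows as wide as df."""
    width = len(df.columns)
    # Row-major flatten follows arrival order; pd.unique keeps first occurrences in order
    unique_vals = pd.unique(df.to_numpy().ravel())
    # Consecutive chunks of `width` values; the last chunk may be shorter
    chunks = [list(unique_vals[i:i + width]) for i in range(0, len(unique_vals), width)]
    # The DataFrame constructor pads short rows with missing values
    return pd.DataFrame(chunks, columns=df.columns)

df = pd.DataFrame(data={
    "Customer 1": ["A", "B", "C", "D", "E"],
    "Customer 2": ["B", "C", "D", "E", "F"],
    "Customer 3": ["C", "D", "E", "F", "G"],
})
print(group_unique_customers(df))
```

The last row holds G followed by two missing values, so no placeholder string is needed.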
Posted on 2020-10-26 15:48:07
Explanation:
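First, we get all the unique values across the rows. Then, as requested, we group the unique values 3 at a time, fill any remaining empty cells with the string "invalid", and convert the result back into a DataFrame.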
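The padding step amounts to rounding the number of unique values up to a multiple of 3. A quick check of two ways to compute that length (plain integers, no pandas needed) shows an edge case worth knowing about: a length computed as n + 3 - n % 3 adds a full extra row of padding when n is already a multiple of 3, while ceiling division does not. Both function names are hypothetical.

```python
def padded_len_plus_remainder(n: int, width: int = 3) -> int:
    # Always adds at least one padding cell, even when n % width == 0
    return n + width - n % width

def padded_len_ceil(n: int, width: int = 3) -> int:
    # Round n up to the next multiple of width via ceiling division
    return -(-n // width) * width

for n in (5, 6, 7):
    print(n, padded_len_plus_remainder(n), padded_len_ceil(n))
# 5 6 6
# 6 9 6
# 7 9 9
```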
import numpy as np
import pandas as pd
df = pd.DataFrame({"Customer 1" : ["A","B","C","D","E"],
                   "Customer 2" : ["B","C","D","E","F"],
                   "Customer 3" : ["C","D","E","F","G"]})
# Unique values in order of first appearance; ravel('K') flattens in memory order
unique_vals = pd.unique(df[['Customer 1', 'Customer 2', 'Customer 3']].values.ravel('K'))
# Round the length up to the next multiple of 3 (no extra padding row when it already divides evenly)
new_shape = -(-unique_vals.size // 3) * 3
# dtype=object so customer names longer than "invalid" are not truncated
new_df_source = np.full(new_shape, fill_value="invalid", dtype=object)
new_df_source[:unique_vals.size] = unique_vals
new_df_source = new_df_source.reshape(-1, 3)
output_df = pd.DataFrame(new_df_source, columns=df.columns)
Result:
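   Customer 1  Customer 2  Customer 3
0  A           B           C
1  D           E           F
2  G           invalid     invalid
Warning: rows in output_df may not appear in the input df at all, because we take the unique values and regroup them, although the relative order of the unique values is preserved.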
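Because the question describes customers arriving continuously, the same grouping can also be done incrementally, without building the whole frame first. A minimal sketch, assuming each customer ID is hashable and that a customer who has been placed in a group is never grouped again; the generator name `group_arrivals` is hypothetical. It yields a row as soon as 3 new customers have been collected and yields the leftover partial row at the end.

```python
from typing import Hashable, Iterable, Iterator, List

def group_arrivals(arrivals: Iterable[Hashable], size: int = 3) -> Iterator[List[Hashable]]:
    """Hypothetical helper: emit groups of `size` distinct customers in arrival order."""
    seen = set()     # customers already placed in some group
    current = []     # group currently being filled
    for customer in arrivals:
        if customer in seen:
            continue  # skip repeat sightings, e.g. B and C in the second input row
        seen.add(customer)
        current.append(customer)
        if len(current) == size:
            yield current
            current = []
    if current:
        yield current  # partial last group, e.g. ["G"]

rows = [["A", "B", "C"], ["B", "C", "D"], ["C", "D", "E"], ["D", "E", "F"], ["E", "F", "G"]]
stream = (c for row in rows for c in row)
print(list(group_arrivals(stream)))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```

Passing that list to `pd.DataFrame` with the three customer columns pads the last row with missing values instead of a placeholder string.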
https://stackoverflow.com/questions/64531902