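I'm new to pandas DataFrames. I want to merge rows, which may not be a standard merge/join situation. The problem: merge the rows below when the index columns (A and B) match but the data columns (D, E, F) are mutually exclusive, i.e. no data column is filled in more than one of the rows. Column C does not matter.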
A B C D E F
g h i - 1 -
g h q 2 - -
g i i - - 3
h q i 4 - -
g h q - - 5
g h t - - 6

The expected result is:
A B C D E F
g h * 2 1 5
g i i - - 3
h q i 4 - -
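g h t - - 6

In this example, the merged first row took 'g h q - - 5' rather than 'g h t - - 6'. Either choice is fine, as long as one of these two rows is merged and the other is left unchanged.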
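The merge condition can be written as a predicate on two rows. This is a minimal sketch, assuming empty cells are stored as NaN rather than the '-' shown in the tables; DATA_COLS and can_merge are illustrative names, not part of the question:

```python
import numpy as np
import pandas as pd

DATA_COLS = ['D', 'E', 'F']  # the mutually exclusive data columns


def can_merge(row_a: pd.Series, row_b: pd.Series) -> bool:
    """True if both rows share A and B and no data column is filled in both."""
    same_key = row_a['A'] == row_b['A'] and row_a['B'] == row_b['B']
    # Element-wise: True where the column holds a value in BOTH rows
    both_filled = row_a[DATA_COLS].notna() & row_b[DATA_COLS].notna()
    return bool(same_key and not both_filled.any())


# The example data, with NaN in place of '-' (assumption)
df = pd.DataFrame({
    'A': ['g', 'g', 'g', 'h', 'g', 'g'],
    'B': ['h', 'h', 'i', 'q', 'h', 'h'],
    'C': ['i', 'q', 'i', 'i', 'q', 't'],
    'D': [np.nan, 2, np.nan, 4, np.nan, np.nan],
    'E': [1, np.nan, np.nan, np.nan, np.nan, np.nan],
    'F': [np.nan, np.nan, 3, np.nan, 5, 6],
})

print(can_merge(df.loc[0], df.loc[1]))  # E vs D: no overlap
print(can_merge(df.loc[4], df.loc[5]))  # both fill F
print(can_merge(df.loc[0], df.loc[2]))  # B differs
```

Rows 4 and 5 both fill F, which is why only one of them can join the merged 'g h' row.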
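Here, three rows are merged. One option is to merge two rows first and treat the result as the new reference row for merging 'g h q - - 5' or 'g h t - - 6'; alternatively, if possible, find all mutually exclusive rows and merge them in one pass. In the real data, it is very rare for 'g h q - - 5' and 'g h t - - 6' to both exist.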
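The "merge two rows, then treat the result as the new reference" idea can be sketched as a single greedy pass. This is a hedged illustration, not the asker's code: it assumes NaN for empty cells, visits rows in their original order, and greedy_merge is a hypothetical helper name. Because rows are merged first come, first served, C keeps the value of the first row of each merged group.

```python
import numpy as np
import pandas as pd

DATA_COLS = ['D', 'E', 'F']  # mutually exclusive data columns (from the question)


def greedy_merge(df: pd.DataFrame) -> pd.DataFrame:
    """First-fit merge of rows that share (A, B) and have no overlapping D/E/F cells.

    Each row is folded into the first earlier 'reference' row with the same key
    and no clashing filled cell; otherwise it starts a new reference row.
    C comes from whichever row started the reference (first come).
    """
    refs = []  # one dict per output row, in order of creation
    for _, row in df.iterrows():
        for ref in refs:
            same_key = ref['A'] == row['A'] and ref['B'] == row['B']
            clash = any(pd.notna(ref[c]) and pd.notna(row[c]) for c in DATA_COLS)
            if same_key and not clash:
                for c in DATA_COLS:
                    if pd.isna(ref[c]):
                        ref[c] = row[c]  # fill only the empty cells of the reference
                break
        else:
            refs.append(row.to_dict())  # no compatible reference: start a new one
    return pd.DataFrame(refs, columns=df.columns)


df = pd.DataFrame({
    'A': ['g', 'g', 'g', 'h', 'g', 'g'],
    'B': ['h', 'h', 'i', 'q', 'h', 'h'],
    'C': ['i', 'q', 'i', 'i', 'q', 't'],
    'D': [np.nan, 2, np.nan, 4, np.nan, np.nan],
    'E': [1, np.nan, np.nan, np.nan, np.nan, np.nan],
    'F': [np.nan, np.nan, 3, np.nan, 5, 6],
})

merged = greedy_merge(df)
print(merged)  # rows 0, 1 and 4 collapse into one 'g h' row; 'g h t' stays separate
```

First-fit greedy can produce more merged rows than strictly necessary when conflicts pile up, which should matter little if conflicts are as rare as described.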
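In the merged row I wrote * for C to show that I don't care about it, but I would like it to hold one of the original C values, i.e. (i or q). For the merged row's C, whichever value comes first is fine.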
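When no (A, B) group contains a conflict, pandas' GroupBy.first already gives the "first come" behaviour for C, because it returns the first non-null value of each column within a group. A sketch under the same NaN assumption, using the example without the 'g h t - - 6' row; the conflict guard is an added check, not something the question asked for:

```python
import numpy as np
import pandas as pd

# Example data without the conflicting 'g h t - - 6' row; NaN replaces '-'
df = pd.DataFrame({
    'A': ['g', 'g', 'g', 'h', 'g'],
    'B': ['h', 'h', 'i', 'q', 'h'],
    'C': ['i', 'q', 'i', 'i', 'q'],
    'D': [np.nan, 2, np.nan, 4, np.nan],
    'E': [1, np.nan, np.nan, np.nan, np.nan],
    'F': [np.nan, np.nan, 3, np.nan, 5],
})

# Guard: first() would silently drop a second value in the same column,
# so require at most one filled cell per data column in every (A, B) group
counts = df.groupby(['A', 'B'])[['D', 'E', 'F']].count()
if (counts > 1).any().any():
    raise ValueError('conflicting rows found; first() would drop data')

# first non-null value per column and group; sort=False keeps first-appearance order
merged = df.groupby(['A', 'B'], sort=False, as_index=False).first()
print(merged)
```

With the 'g h t' row included, (g, h) has two F values, the guard raises, and a row-by-row approach is needed instead.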
Sorry for the trouble, and thank you very much for any help!
Posted on 2021-07-21 13:40:09
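Answering my own question. The actual DataFrame is slightly different, but I'll follow the example's DataFrame. For simplicity, the code for handling the 'g h t - - 6' row is omitted.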
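# Split rows by which data column is filled (assumes empty cells are NaN, not '-')
df_D = df[~df['D'].isnull()]
df_E = df[~df['E'].isnull()]
df_F = df[~df['F'].isnull()]
# Keep one D row per (A, B)
df_D = df_D.sort_values(['A']).drop_duplicates(['A','B'], keep='last')
# ...
df_D.set_index(['A','B'], inplace=True)
# Fill only the empty (NaN) cells of df_D from the other frames
df_D.update(df_E.set_index(['A','B']), overwrite=False)
df_D.update(df_F.set_index(['A','B']), overwrite=False)
df_D.reset_index(inplace=True)

Thanks,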
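DataFrame.update aligns the other frame on df_D's index and never adds rows, so a row whose (A, B) pair has no D value, such as 'g i i - - 3', is missing from the result of the snippet above. A hedged variant swaps in DataFrame.combine_first, which also fills only NaN cells but returns the union of both row indexes. Like the answer, it assumes NaN for empty cells and leaves out the 'g h t' row:

```python
import numpy as np
import pandas as pd

# Example data without 'g h t - - 6', NaN in place of '-' (assumptions)
df = pd.DataFrame({
    'A': ['g', 'g', 'g', 'h', 'g'],
    'B': ['h', 'h', 'i', 'q', 'h'],
    'C': ['i', 'q', 'i', 'i', 'q'],
    'D': [np.nan, 2, np.nan, 4, np.nan],
    'E': [1, np.nan, np.nan, np.nan, np.nan],
    'F': [np.nan, np.nan, 3, np.nan, 5],
})

key = ['A', 'B']
df_D = df[df['D'].notna()].drop_duplicates(key, keep='last').set_index(key)
df_E = df[df['E'].notna()].set_index(key)
df_F = df[df['F'].notna()].set_index(key)

# combine_first: keep df_D's values, fill its NaN cells from the other frame,
# and also keep rows that exist only in the other frame
result = df_D.combine_first(df_E).combine_first(df_F).reset_index()
print(result)
```

Here C for the merged (g, h) row comes from the D row ('q'), which the question also accepts.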
Posted on 2021-07-22 14:33:26
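Approach: in the duplicated (A, B) group, keep the first row that contains data, fill its empty ('-') cells from the other rows of the group, and drop the rows whose values were moved.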
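The first steps of the loop below (collecting the rows whose (A, B) pair repeats, then recording which of D, E, F are filled) also have vectorized pandas equivalents. A small sketch using DataFrame.duplicated with keep=False after replacing '-' with NaN; clean, dup_mask and filled are illustrative names. Unlike the loop, which hard-codes ('g','h'), this finds every repeated pair:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['g', 'g', 'g', 'h', 'g', 'g'],
                   'B': ['h', 'h', 'i', 'q', 'h', 'h'],
                   'C': ['i', 'q', 'i', 'i', 'q', 't'],
                   'D': ['-', '2', '-', '4', '-', '-'],
                   'E': ['1', '-', '-', '-', '-', '-'],
                   'F': ['-', '-', '3', '-', '5', '6']})

# Treat '-' as a real missing value so pandas' NA-aware methods apply
clean = df.replace('-', np.nan)

# keep=False marks every row whose (A, B) pair occurs more than once
dup_mask = clean.duplicated(subset=['A', 'B'], keep=False)
print(clean.index[dup_mask].tolist())  # the merge candidates

# True where a candidate row holds data in D, E or F
filled = clean.loc[dup_mask, ['D', 'E', 'F']].notna()
print(filled)
```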
import pandas as pd

Data_Frame = pd.DataFrame({'A':['g','g','g','h','g','g'],
                           'B':['h','h','i','q','h','h'],
                           'C':['i','q','i','i','q','t'],
                           'D':['-','2','-','4','-','-'],
                           'E':['1','-','-','-','-','-'],
                           'F':['-','-','3','-','5','6']})
Column_len = len(Data_Frame['A'])
exclusive_Data_Column = ['D','E','F']
Duplicate_Value_Location = []
Empty_location = []
Target_location = []
Elements = [(Data_Frame['A'][i], Data_Frame['B'][i]) for i in range(Column_len)]
for i, v in enumerate(Elements):  ## Finding location of duplicated value.
    if ('g','h') == v:
        Duplicate_Value_Location.append(i)
for i in exclusive_Data_Column:  ## Recording location of each data
    for x in Duplicate_Value_Location:
        if Data_Frame[i][x] == '-':
            Empty_location.append((i,x))   # empty cell as (column, row)
        else:
            Target_location.append((i,x))  # filled cell as (column, row)
Target_row = [i[1] for i in Target_location]
Target_column = [i[0] for i in Target_location]
Empty_column = [i[0] for i in Empty_location]
Empty_row = [i[1] for i in Empty_location]
Unmovable_cell = []
# The filled cell with the smallest row index marks the row that is kept
Nearest_Target_row_index = [i for i, v in enumerate(Target_row) if v == min(Target_row)][0]
Nearest_Target_row_value = Target_row[Nearest_Target_row_index]
Nearest_Target_column_value = Target_column[Nearest_Target_row_index]
Unmovable_cell.append((Nearest_Target_column_value, Nearest_Target_row_value))
# Empty cells of the kept row
Nearest_Empty_column_index = [i for i, v in enumerate(Empty_row) if v == Nearest_Target_row_value]
for column_index in Nearest_Empty_column_index:
    Empty_column_value = Empty_column[column_index]
    Empty_row_value = Empty_row[column_index]
    # Look for a filled cell in the same column that has not been used yet
    for Target_column_index, Target_column_value in enumerate(Target_column):
        Target_row_value = Target_row[Target_column_index]
        if Target_column_value == Empty_column_value and (Target_column_value, Target_row_value) not in Unmovable_cell:
            Merged_Target_column_value = Target_column[Target_column_index]
            Merged_Target_row_value = Target_row[Target_column_index]
            Unmovable_cell.append((Empty_column_value, Empty_row_value))
            # The moved value now lives in the kept row
            Target_column = [v for i, v in enumerate(Target_column) if i != Target_column_index]
            Target_column.append(Empty_column_value)
            Target_row = [v for i, v in enumerate(Target_row) if i != Target_column_index]
            Target_row.append(Empty_row_value)
            # Bookkeeping lists (not used further below)
            Empty_column_2 = [v for i, v in enumerate(Empty_column) if i != column_index]
            Empty_column_2.append(Merged_Target_column_value)
            Empty_row_2 = [v for i, v in enumerate(Empty_row) if i != column_index]
            Empty_row_2.append(Merged_Target_row_value)
            # .loc avoids chained assignment, which may not write back to Data_Frame
            Data_Frame.loc[Empty_row_value, Empty_column_value] = Data_Frame.loc[Target_row_value, Empty_column_value]
            Data_Frame = Data_Frame.drop(Target_row_value)
            break  # this empty cell is filled; move on to the next one
Data_Frame.loc[Nearest_Target_row_value, 'C'] = '*'
print(Data_Frame)

Output:
   A  B  C  D  E  F
0  g  h  *  2  1  5
2  g  i  i  -  -  3
3  h  q  i  4  -  -
5  g  h  t  -  -  6

https://stackoverflow.com/questions/68386396