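This is a bit complicated, but I basically need to align data from a second dataframe with the values in a first dataframe (DF). The first DF contains different variations of the correct names, and the second DF contains the correct names. The final product should have the original name, with the correct name in the column next to it.
The first DF has more values than the second DF. I don't think the whole process can be automated, because the first DF can have so many variations. I just want to match the exact matches, and I'll do the rest manually. I might also run an apply function to check whether removing the () produces a direct match. But basically, I need to line up the exact matches. If that can't be done for every value, matching the ones that can be matched would still be great, and I can fill in the rest manually. What is the best way to approach this? I must be using them wrong, but .concat and .merge haven't been very useful.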
data1 = {'Name': ['Fuel 1 1-2-1','Fuel(1) 1-2-1', 'Fuel(1) 1-2-2', 'Fuel 2 2-2-1', 'Storage 1 2-2-2','Storage (2) 1-2-2','Storage 2 1-2-2',
'Tank 2']}
Name
0 Fuel 1 1-2-1
1 Fuel(1) 1-2-1
2 Fuel(1) 1-2-2
3 Fuel 2 2-2-1
4 Storage 1 2-2-2
5 Storage (2) 1-2-2
6 Storage 2 1-2-2
7 Tank 2
data2 = {'Required Change': ['Fuel 1 1-2-1', "Fuel 1 1-2-2",
'Fuel 2-2-1','Storage 1 2-2-2','Storage 2 1-2-2', 'Tank 2', 'Tank 3']}
Required Change
0 Fuel 1 1-2-1
1 Fuel 1 1-2-2
2 Fuel 2-2-1
3 Storage 1 2-2-2
4 Storage 2 1-2-2
5 Tank 2
6 Tank 3
I need it to look like this:
data3 = {'Name': ['Fuel 1 1-2-1','Fuel(1) 1-2-1', 'Fuel(1) 1-2-2', 'Fuel 2 2-2-1', 'Storage 1 2-2-2','Storage (2) 1-2-2','Storage 2 1-2-2',
'Tank 2'],
'Required Change': ['Fuel 1 1-2-1', 'Fuel 1 1-2-1', "Fuel 1 1-2-2", 'Fuel 2-2-1','Storage 1 2-2-2','Storage 2 1-2-2',
'Storage 2 1-2-2', 'Tank 2']
}
   Name               Required Change
0  Fuel 1 1-2-1       Fuel 1 1-2-1
1  Fuel(1) 1-2-1      Fuel 1 1-2-1
2  Fuel(1) 1-2-2      Fuel 1 1-2-2
3  Fuel 2 2-2-1       Fuel 2-2-1
4  Storage 1 2-2-2    Storage 1 2-2-2
5  Storage (2) 1-2-2  Storage 2 1-2-2
6  Storage 2 1-2-2    Storage 2 1-2-2
7  Tank 2             Tank 2
Posted on 2022-05-19 22:40:21
If you want to clean up data1 and find the exact matches in data2, you could try something like this:
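import pandas as pd

# The question defines data1 and data2 as dicts; convert them to DataFrames first.
data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

# Replace "(" and ")" with a space, then collapse runs of spaces into one.
data1["Cleaned"] = (
    data1["Name"].str.replace(r"\(|\)", " ", regex=True)
    .str.replace(" {2,}", " ", regex=True)
)
# A left merge keeps every row of data1; rows without an exact match get NaN.
data1.merge(data2, how="left", left_on="Cleaned", right_on="Required Change")
Result: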
   Name               Cleaned          Required Change
0  Fuel 1 1-2-1       Fuel 1 1-2-1     Fuel 1 1-2-1
1  Fuel(1) 1-2-1      Fuel 1 1-2-1     Fuel 1 1-2-1
2  Fuel(1) 1-2-2      Fuel 1 1-2-2     Fuel 1 1-2-2
3  Fuel 2 2-2-1       Fuel 2 2-2-1     NaN              # Can't find match here
4  Storage 1 2-2-2    Storage 1 2-2-2  Storage 1 2-2-2
5  Storage (2) 1-2-2  Storage 2 1-2-2  Storage 2 1-2-2
6  Storage 2 1-2-2    Storage 2 1-2-2  Storage 2 1-2-2
7  Tank 2             Tank 2           Tank 2
This solution replaces ( and ) with a space and collapses runs of multiple spaces into one. Your real data may contain more variations, so you will need to adjust the regex patterns accordingly.
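Since the question mentions finishing the non-matching rows by hand, it may help to list those rows explicitly. The following is a minimal, self-contained sketch of one way to do that, using the `indicator=True` option of `merge`. The names `df1`, `df2`, `merged` and `unmatched` are illustrative assumptions, not from the original post, and the extra `.str.strip()` is an added precaution against leading or trailing spaces.

```python
import pandas as pd

# Sample data copied from the question; df1/df2 are illustrative names.
df1 = pd.DataFrame({"Name": [
    "Fuel 1 1-2-1", "Fuel(1) 1-2-1", "Fuel(1) 1-2-2", "Fuel 2 2-2-1",
    "Storage 1 2-2-2", "Storage (2) 1-2-2", "Storage 2 1-2-2", "Tank 2",
]})
df2 = pd.DataFrame({"Required Change": [
    "Fuel 1 1-2-1", "Fuel 1 1-2-2", "Fuel 2-2-1", "Storage 1 2-2-2",
    "Storage 2 1-2-2", "Tank 2", "Tank 3",
]})

# Normalise names: parentheses -> spaces, collapse runs of spaces, trim ends.
# (.str.strip() is an addition to the answer's cleaning steps.)
df1["Cleaned"] = (
    df1["Name"]
    .str.replace(r"[()]", " ", regex=True)
    .str.replace(r" {2,}", " ", regex=True)
    .str.strip()
)

# The left merge keeps every row of df1; indicator=True adds a "_merge"
# column that is "both" for matched rows and "left_only" for unmatched ones.
merged = df1.merge(
    df2, how="left", left_on="Cleaned", right_on="Required Change",
    indicator=True,
)

# Original names with no exact match, left for manual review.
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)  # ['Fuel 2 2-2-1']
```

With the sample data, only `Fuel 2 2-2-1` is left over, which matches the single NaN row in the result above. If the second DF ever contained duplicate names, the merge would duplicate rows of the first DF, so it may be worth de-duplicating it first.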
https://stackoverflow.com/questions/72311207