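I have table1, which can range from fewer than 100 rows to about 100,000 rows. It has 22 columns, one of which (Col4) is a description string that I want to search for certain terms.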
table1
+----------+--------+--------+--------+-------------+------+-------+-------+
| UniqueId | Cat1Id | Cat2Id | Cat3Id | Col4 | Col5 | Col21 | Col22 |
+----------+--------+--------+--------+-------------+------+-------+-------+
| 1 | 123 | | 33 | Description | | | |
| 2 | 11 | 22 | | Description | | | |
| 3 | 22 | 22 | 33 | Description | | | |
| 4 | 33 | | 455 | Description | | | |
| 5 | 111 | | | Description | | | |
| | | | | | | | |
+----------+--------+--------+--------+-------------+------+-------+-------+
table2 below contains the substrings I want to search for in Col4 above, matched by the Id columns. It can contain hundreds of rows.
table2
+--------+--------+--------+------+------+------+
| Cat1Id | Cat2Id | Cat3Id | Term | Val1 | Val2 |
+--------+--------+--------+------+------+------+
| 123 | | 33 | abc | 555 | 66 |
| 123 | | 33 | bca | 444 | 55 |
| 11 | 22 | | blah | 888 | 77 |
| 33 | | 455 | dddd | 999 | 77 |
| 33 | | 455 | aaaa | 777 | 22 |
| | | | | | |
+--------+--------+--------+------+------+------+
For each row in table1, I need to save Val1 and Val2. So in the end, what I want is:
table3 - final
+----------+--------+--------+--------+-------------+------+------+------+
| UniqueId | Cat1Id | Cat2Id | Cat3Id | Col4 | Term | Val1 | Val2 |
+----------+--------+--------+--------+-------------+------+------+------+
| 1 | 123 | | 33 | Description | abc | 555 | 66 |
| 1 | 123 | | 33 | Description | bca | 444 | 55 |
| 2 | 11 | 22 | | Description | blah | 888 | 77 |
| 4 | 33 | | 455 | Description | dddd | 999 | 77 |
| 4 | 33 | | 455 | Description | aaaa | 777 | 22 |
| | | | | | | | |
+----------+--------+--------+--------+-------------+------+------+------+
My plan is to do this:
Is there a faster way? If I have 100K rows, would this be slow or fine in your experience? A run time of a few minutes is acceptable to me.
Answered on 2019-03-27 15:37:27
Could this be what you are looking for? You can check Pandas Merging 101 for more options.
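import pandas as pd

# Left-join table1 (df1) with the value columns of table2 (df2) on the three category ids
mrg = pd.merge(df1, df2[['Cat1Id', 'Cat2Id', 'Cat3Id', 'Val1', 'Val2']],
               how='left', on=['Cat1Id', 'Cat2Id', 'Cat3Id'])
mrg.dropna()  # drop rows with NaN (e.g. table1 rows that found no match in table2)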
UniqueId Cat1Id Cat2Id Cat3Id Col4 Val1 Val2
0 1 123 33 Description 555 66
1 1 123 33 Description 444 55
2 2 11 22 Description 888 77
4 4 33 455 Description 999 77
5 4 33 455 Description 777 22
https://stackoverflow.com/questions/55363151
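Note that the merge above matches rows on the category ids only; it never checks whether Term actually occurs in Col4, which is what the question asks for. A minimal sketch of adding that check might look like the following. It assumes blank ids are stored as empty strings, and the Col4 texts are invented for illustration (the question only shows the placeholder "Description"). The names df1, df2, candidates and table3 are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical sample data modelled on table1/table2 from the question.
# Assumption: blank ids are empty strings; the Col4 texts are made up.
df1 = pd.DataFrame({
    "UniqueId": [1, 2, 3, 4, 5],
    "Cat1Id": ["123", "11", "22", "33", "111"],
    "Cat2Id": ["", "22", "22", "", ""],
    "Cat3Id": ["33", "", "33", "455", ""],
    "Col4": ["xx abc yy", "some blah text", "nothing", "dddd only", "other"],
})
df2 = pd.DataFrame({
    "Cat1Id": ["123", "123", "11", "33", "33"],
    "Cat2Id": ["", "", "22", "", ""],
    "Cat3Id": ["33", "33", "", "455", "455"],
    "Term": ["abc", "bca", "blah", "dddd", "aaaa"],
    "Val1": [555, 444, 888, 999, 777],
    "Val2": [66, 55, 77, 77, 22],
})

keys = ["Cat1Id", "Cat2Id", "Cat3Id"]

# Inner merge: pair each table1 row with every table2 term sharing its category ids.
candidates = df1.merge(df2, on=keys, how="inner")

# Keep only the pairs whose Term occurs in the description (plain substring test).
mask = [term in text for term, text in zip(candidates["Term"], candidates["Col4"])]
table3 = candidates[mask].reset_index(drop=True)
print(table3)
```

With this approach the substring test runs only on pairs that already share category ids, not on every combination of table1 and table2 rows. Whether that is fast enough for 100K rows would need to be measured on the real data.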