I have two datasets. The first one looks like this:
Tags
Insurance
Asset
Bank
Municipality
Government
Corporate
Gas
General US Public Finance
Real Estate
etc.

I want to assign these tags to the entries of the other dataset.
The second dataset looks like this:
UserTags
Real Estate Insurance
Corporate - Finance Company
Corporate - Energy / Utility / Commodities
Corporate - Non-Financial Other
Government Entity - Central Bank
Government Entity - Regulator
Government Entity - Municipality
Asset Bank

I want to use Python to match the two datasets so that the result looks like this:
UserTags                                    AssignedTags
Real Estate Insurance                       Real Estate
Real Estate Insurance                       Insurance
Corporate - Finance Company                 Corporate
Corporate - Energy / Utility / Commodities  Corporate
Corporate - Non-Financial Other             Corporate
Government Entity - Central Bank            Government
Government Entity - Central Bank            Bank
Government Entity - Regulator               Government
Government Entity - Municipality            Government
Government Entity - Municipality            Municipality
Asset Bank                                  Asset
Asset Bank                                  Bank
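So basically, the first user tag, "Real Estate Insurance", contains two tags, Real Estate and Insurance, so it appears twice, once for each tag. The same goes for "Government Entity - Municipality".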
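The matching described above can be sketched in plain Python, assuming that "contains" means the tag text appears verbatim (case-sensitive) inside the user tag. `TAGS` and `USER_TAGS` are just the sample rows from the question, and `assign_exact` is a hypothetical helper name. Sorting each user tag's matches by their position reproduces the row order of the desired table.

```python
# Sample rows copied from the question (the tags elided as "etc." are not included).
TAGS = ["Insurance", "Asset", "Bank", "Municipality", "Government",
        "Corporate", "Gas", "General US Public Finance", "Real Estate"]
USER_TAGS = [
    "Real Estate Insurance",
    "Corporate - Finance Company",
    "Corporate - Energy / Utility / Commodities",
    "Corporate - Non-Financial Other",
    "Government Entity - Central Bank",
    "Government Entity - Regulator",
    "Government Entity - Municipality",
    "Asset Bank",
]


def assign_exact(user_tags, tags):
    """Return one (user_tag, tag) pair for every tag found verbatim in a user tag."""
    pairs = []
    for user_tag in user_tags:
        # Case-sensitive substring test, the same as Python's `in` on strings.
        found = [tag for tag in tags if tag in user_tag]
        # Keep the tags in the order in which they appear inside the user tag.
        found.sort(key=user_tag.find)
        pairs.extend((user_tag, tag) for tag in found)
    return pairs


if __name__ == "__main__":
    for user_tag, tag in assign_exact(USER_TAGS, TAGS):
        print(f"{user_tag:<44}{tag}")
```

With pandas, the resulting pairs can be turned into the two-column table with `pd.DataFrame(pairs, columns=["UserTags", "AssignedTags"])`.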
How can I do this? Also, if there is no exact match, is it possible to assign a partially matching tag? For example:
Tag         AssignedTag
Municipal   Municipality

Thanks.
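For the partial-match case, one option (an assumption, not something stated in the question) is to fall back to the standard library's `difflib.get_close_matches` when no tag appears verbatim. It scores whole strings with `SequenceMatcher` similarity, so `cutoff=0.6` (difflib's default) is only a tunable guess: "Municipal" vs "Municipality" scores about 0.86, while unrelated strings score much lower. `assign_tags` is a hypothetical helper name.

```python
import difflib

TAGS = ["Insurance", "Asset", "Bank", "Municipality", "Government",
        "Corporate", "Gas", "General US Public Finance", "Real Estate"]


def assign_tags(user_tag, tags, cutoff=0.6):
    """Return the exact (substring) matches, or the closest fuzzy match if there are none."""
    exact = [tag for tag in tags if tag in user_tag]
    if exact:
        # Order the tags by where they occur inside the user tag.
        return sorted(exact, key=user_tag.find)
    # No verbatim match: take the single most similar tag above the cutoff, if any.
    return difflib.get_close_matches(user_tag, tags, n=1, cutoff=cutoff)


if __name__ == "__main__":
    for user_tag in ["Municipal", "Asset Bank", "Xyz"]:
        print(user_tag, "->", assign_tags(user_tag, TAGS))
```

Because the fuzzy step compares the whole user tag, a long user tag that only partly resembles a tag may fall below the cutoff; splitting it into words before matching is one possible refinement.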
Posted on 2019-05-23 21:03:05
Assuming both are pandas Series, I'll call the first series in the question `tags` and the second one `user_tags`:
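import pandas as pd

# For each tag, list the user tags that contain it as a literal substring
matched = tags.apply(lambda x: user_tags[user_tags.str.contains(x, regex=False)].tolist())
# One row per (user tag, assigned tag) pair; DataFrame.explode needs pandas >= 0.25
final_table = (pd.DataFrame({"UserTags": matched, "AssignedTags": tags})
               .explode("UserTags").dropna().reset_index(drop=True))
https://stackoverflow.com/questions/56282643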
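A caveat with plain substring matching (`in`, or `str.contains` with `regex=False`) is that a short tag also matches inside a longer word: a hypothetical user tag "Bankruptcy Trustee" would pick up "Bank". If that matters, one option is to require word boundaries with a regular expression. `matches_word` below is a hypothetical helper built on Python's `re` module.

```python
import re


def matches_word(tag, user_tag):
    """True if `tag` occurs in `user_tag` as whole word(s), not inside a longer word."""
    # re.escape guards against tags containing regex metacharacters such as "." or "+".
    pattern = r"\b" + re.escape(tag) + r"\b"
    return re.search(pattern, user_tag) is not None


if __name__ == "__main__":
    print(matches_word("Bank", "Government Entity - Central Bank"))  # True
    print(matches_word("Bank", "Bankruptcy Trustee"))                # False
    print(matches_word("Real Estate", "Real Estate Insurance"))      # True
```

With pandas, the same pattern string can be passed to `Series.str.contains`, whose `regex` argument defaults to `True`.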