我对python编程很陌生。我有两个数据df1包含标签(180 k行)和df2包含设备名称(1600行)。
df1:
Line TagName CLASS 187877 PT\_WOA .ZS01\_LA120\_T05.SB.S2384\_LesSwL 10 187878 PT\_WOA .ZS01\_RB2202\_T05.SB.S2385\_FLOK 10 187879 PT\_WOA .ZS01\_LA120\_T05.SB.\_CBAbsHy 10 187880 PT\_WOA .ZS01\_LA120\_T05.SB.S3110\_CBAPV 10 187881 PT\_WOA .ZS01\_LARB2204.SB.S3111\_CBRelHy 10
df2:
EquipmentNo EquipmentDescription Equipment
1311256 Lifting table LA120
1311257 Roller bed RB2200
1311258 Lifting table LT2202
1311259 Roller bed RB2202
1311260 Roller bed RB2204设备位于df1.TagName中的字符串之间。我需要匹配如果df2设备在df1 Tagname中,那么df2 (设备描述和设备编号)必须与df1匹配。
最终输出应该是
Line TagName quipmentdescription EquipmentNo
187877 PT_WOA .ZS01_LA120_T05.SB.S2384_LesSwL Lifting table 1311256
187878 PT_WOA .ZS01_RB2202_T05.SB.S2385_FLOK Roller bed 1311259
187879 PT_WOA .ZS01_LA120_T05.SB._CBAbsHy Lifting table 1311256
187880 PT_WOA .ZS01_LA120_T05.SB.S3110_CBAPV Lifting table 1311256
187881 PT_WOA .ZS01_LARB2204.SB.S3111_CBRelHy Roller bed 1311260我已经试过了
cols= df2['Equipment'].tolist()
Xs=[]
for i in cols:
Test = df1.loc[df1.TagName.str.contains(i)]
Test['Equip']=i
Xs.append(Test)然后基于“设备”将xs和df2合并
但我发现了这个错误
第一个参数必须是字符串或已编译的模式
发布于 2020-05-15 04:41:43
我会这样做:
Equipment.
indexes,其中对于df2中的每个Equipment都在df1中查找索引列表,其中df1.TagName通过为每一项创建一行,使用stack()和df1为平坦的df2创建一行,以获取所有您想要的信息。
from io import StringIO
import numpy as np
import pandas as pd
df1=StringIO("""Line;TagName;CLASS
187877;PT_WOA;.ZS01_LA120_T05.SB.S2384_LesSwL;10
187878;PT_WOA;.ZS01_RB2202_T05.SB.S2385_FLOK;10
187879;PT_WOA;.ZS01_LA120_T05.SB._CBAbsHy;10
187880;PT_WOA;.ZS01_LA120_T05.SB.S3110_CBAPV;10
187881;PT_WOA;.ZS01_LARB2204.SB.S3111_CBRelHy;10""")
df2=StringIO("""EquipmentNo;EquipmentDescription;Equipment
1311256;Lifting table;LA120
1311257;Roller bed;RB2200
1311258;Lifting table;LT2202
1311259;Roller bed;RB2202
1311260;Roller bed;RB2204""")
df1=pd.read_csv(df1,sep=";")
df2=pd.read_csv(df2,sep=";")
df2['indexes'] = df2['Equipment'].apply(lambda x: df1.index[df1.TagName.str.contains(str(x)).tolist()].tolist())
indexes = df2.apply(lambda x: pd.Series(x['indexes']),axis=1).stack().reset_index(level=1, drop=True)
indexes.name = 'indexes'
df2 = df2.drop('indexes', axis=1).join(indexes).dropna()
df2.index = df2['indexes']
matches = df2.join(df1, how='inner')
print(matches[['Line','TagName','EquipmentDescription','EquipmentNo']])产出:
Line TagName EquipmentDescription EquipmentNo
187877 PT_WOA .ZS01_LA120_T05.SB.S2384_LesSwL Lifting table 1311256
187879 PT_WOA .ZS01_LA120_T05.SB._CBAbsHy Lifting table 1311256
187880 PT_WOA .ZS01_LA120_T05.SB.S3110_CBAPV Lifting table 1311256
187878 PT_WOA .ZS01_RB2202_T05.SB.S2385_FLOK Roller bed 1311259
187881 PT_WOA .ZS01_LARB2204.SB.S3111_CBRelHy Roller bed 1311260发布于 2020-05-15 05:23:46
初始化提供的数据文件:
import numpy as np
import pandas as pd
df1 = pd.DataFrame([['PT_WOA', '.ZS01_LA120_T05.SB.S2384_LesSwL', 10],
['PT_WOA', '.ZS01_RB2202_T05.SB.S2385_FLOK', 10],
['PT_WOA', '.ZS01_LA120_T05.SB._CBAbsHy', 10],
['PT_WOA', '.ZS01_LA120_T05.SB.S3110_CBAPV', 10],
['PT_WOA', '.ZS01_LARB2204.SB.S3111_CBRelHy', 10]],
columns = ['Line', 'TagName', 'CLASS'],
index = [187877, 187878, 187879, 187880, 187881])
df2 = pd.DataFrame([[1311256, 'Lifting table', 'LA120'],
[1311257, 'Roller bed', 'RB2200'],
[1311258, 'Lifting table', 'LT2202'],
[1311259, 'Roller bed', 'RB2202'],
[1311260, 'Roller bed', 'RB2204']],
columns = ['EquipmentNo', 'EquipmentDescription', 'Equipment'])我建议如下:
# create a copy of df1, dropping the 'CLASS' column
df3 = df1.drop(columns=['CLASS'])
# add the columns 'EquipmentDescription' and 'Equipment' filled with numpy NaN's
df3['EquipmentDescription'] = np.nan
df3['EquipmentNo'] = np.nan
# for each row in df3, iterate over each row in df2
for index_df3, row_df3 in df3.iterrows():
for index_df2, row_df2 in df2.iterrows():
# check if 'Equipment' is in 'TagName'
if df2.loc[index_df2, 'Equipment'] in df3.loc[index_df3, 'TagName']:
# set 'EquipmentDescription' and 'EquipmentNo'
df3.loc[index_df3, 'EquipmentDescription'] = df2.loc[index_df2, 'EquipmentDescription']
df3.loc[index_df3, 'EquipmentNo'] = df2.loc[index_df2, 'EquipmentNo']
# conver the 'EquipmentNo' to type int
df3['EquipmentNo'] = df3['EquipmentNo'].astype(int)这将产生以下数据:
Line TagName EquipmentDescription EquipmentNo
187877 PT_WOA .ZS01_LA120_T05.SB.S2384_LesSwL Lifting table 1311256
187878 PT_WOA .ZS01_RB2202_T05.SB.S2385_FLOK Roller bed 1311259
187879 PT_WOA .ZS01_LA120_T05.SB._CBAbsHy Lifting table 1311256
187880 PT_WOA .ZS01_LA120_T05.SB.S3110_CBAPV Lifting table 1311256
187881 PT_WOA .ZS01_LARB2204.SB.S3111_CBRelHy Roller bed 1311260如果这有帮助的话请告诉我。
发布于 2020-05-15 06:06:18
df1和df2如下:df1
| | Line | TagName | CLASS |
|---:|:-------|:--------------------------------|--------:|
| 0 | PT_WOA | .ZS01_LA120_T05.SB.S2384_LesSwL | 10 |
| 1 | PT_WOA | .ZS01_RB2202_T05.SB.S2385_FLOK | 10 |
| 2 | PT_WOA | .ZS01_LA120_T05.SB._CBAbsHy | 10 |
| 3 | PT_WOA | .ZS01_LA120_T05.SB.S3110_CBAPV | 10 |
| 4 | PT_WOA | .ZS01_LARB2204.SB.S3111_CBRelHy | 10 |df2
| | EquipmentNo | EquipmentDescription | Equipment |
|---:|--------------:|:-----------------------|:------------|
| 0 | 1311256 | Lifting table | LA120 |
| 1 | 1311257 | Roller bed | RB2200 |
| 2 | 1311258 | Lifting table | LT2202 |
| 3 | 1311259 | Roller bed | RB2202 |
| 4 | 1311260 | Roller bed | RB2204 |df2中的
Equipment中找到唯一的equipmentequipment = df2.Equipment.unique().tolist()通过在equipment中找到匹配项,
df1中创建Equipment列df1['Equipment'] = df1['TagName'].apply(lambda x: ''.join([part for part in equipment if part in x]))如果不希望在code.中的Equipment列中添加.drop(columns=['Equipment']),请将.drop(columns=['Equipment'])添加到下一行
df_final = df1[['Line', 'TagName', 'Equipment']].merge(df2, on='Equipment')df_final
| | Line | TagName | Equipment | EquipmentNo | EquipmentDescription |
|---:|:-------|:--------------------------------|:------------|--------------:|:-----------------------|
| 0 | PT_WOA | .ZS01_LA120_T05.SB.S2384_LesSwL | LA120 | 1311256 | Lifting table |
| 1 | PT_WOA | .ZS01_LA120_T05.SB._CBAbsHy | LA120 | 1311256 | Lifting table |
| 2 | PT_WOA | .ZS01_LA120_T05.SB.S3110_CBAPV | LA120 | 1311256 | Lifting table |
| 3 | PT_WOA | .ZS01_RB2202_T05.SB.S2385_FLOK | RB2202 | 1311259 | Roller bed |
| 4 | PT_WOA | .ZS01_LARB2204.SB.S3111_CBRelHy | RB2204 | 1311260 | Roller bed |https://stackoverflow.com/questions/61811137
复制相似问题