I have a timeseries dataframe from a machine in which the values arrive under different tags, and some of the tags arrive in a different format.

| datetime | tagid | value |
|---------------------|--------|-------|
| 08-04-2021 11:30:58 | BNO_01 | 12849 |
| 08-04-2021 11:30:58 | BNO_02 | 12597 |
| 08-04-2021 11:30:58 | BNO_03 | 14390 |
| 08-04-2021 11:30:58 | MDL_01 | 21328 |
| 08-04-2021 11:30:58 | MDL_02 | 22304 |
| 08-04-2021 11:30:58 | SEQ_01 | 12340 |
| 08-04-2021 11:30:58 | SEQ_02 | 13622 |
| 08-04-2021 11:30:58 | STA | 724 |
| 08-04-2021 11:30:58 | STO | 735 |

I want to apply `df['value'] = df['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))`, but only to the tag rows above.

Example:

    MDL_01 = 21328 --> 'SP'
    MDL_02 = 22304 --> 'W '
    BNO_01 = 12849 --> '21'
    BNO_02 = 12597 --> '15'
    BNO_03 = 14390 --> '86'
    BNO = 'SPW 211586'
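
The conversion in the examples treats each value as a 16-bit word and reads its high and low bytes as ASCII characters. Below is a minimal sketch of that decoding, assuming the values fit in 16 bits. The helper name `decode_word` is hypothetical, and `divmod` is swapped in for `round(x / 256)` because rounding can push the high byte up by one when the low byte is 128 or more. For the sample values both variants give the same characters.

```python
def decode_word(x: int) -> str:
    """Split a 16-bit integer into two bytes and read each as an ASCII character."""
    # Floor division keeps the high byte exact even when the low byte is >= 128.
    high, low = divmod(x, 256)
    return chr(high) + chr(low)


# Sample values taken from the table in the question.
for tag, raw in [("MDL_01", 21328), ("MDL_02", 22304), ("BNO_01", 12849)]:
    print(tag, repr(decode_word(raw)))
```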

Desired output:

| datetime | tagid | value |
|---------------------|-------|------------|
| 08-04-2021 11:30:58 | BNO | SPW 211586 |
| 08-04-2021 11:30:58 | SEQ | 0456 |
| 08-04-2021 11:30:58 | STA | 724 |
| 08-04-2021 11:30:58 | STO | 735 |

Posted on 2021-04-13 05:52:29

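The idea is to first build a mask with `Series.str.startswith` and filter with boolean indexing, then strip the suffix from the matching rows with `split`, sort so that the MDL rows come before the BNO rows, aggregate the values with `join`, and finally `concat` the result with the original rows selected by `~mask`, i.e. the rows that did not match the condition.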
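
The sort-then-relabel step is the least obvious part, so here is a minimal sketch of it in isolation, assuming the values have already been decoded to strings. The frame `decoded` and the `kind="stable"` argument are additions for illustration: a stable sort keeps rows with the same `tagid` in their original order, which the default quicksort does not guarantee in general.

```python
import pandas as pd

# Already-decoded values for a single timestamp (illustrative data).
decoded = pd.DataFrame({
    "datetime": ["08-04-2021 11:30:58"] * 5,
    "tagid": ["BNO", "BNO", "BNO", "MDL", "MDL"],
    "value": ["21", "15", "86", "SP", "W "],
})

merged = (
    decoded.sort_values("tagid", ascending=False, kind="stable")  # MDL rows first
    .replace({"tagid": {"MDL": "BNO"}})  # relabel MDL as BNO in the tagid column only
    .groupby(["datetime", "tagid"])["value"]
    .agg("".join)  # concatenate in row order: MDL parts, then BNO parts
    .reset_index()
)
print(merged)
```

Because `groupby` keeps the row order within each group, the MDL characters end up in front of the BNO digits.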
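
The advantage of this solution is that non-matching rows are left unchanged: they are never aggregated, even if they are duplicated (e.g. STA appearing twice), and their values are not converted to strings.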
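
That pass-through behaviour can be seen by looking at the mask on its own. Below is a small sketch with a made-up frame containing a duplicated STA row; the data and the names `matched` and `untouched` are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "tagid": ["BNO_01", "MDL_01", "STA", "STA", "STO"],
    "value": [12849, 21328, 724, 724, 735],
})

# True for rows whose tagid starts with one of the prefixes to decode.
mask = df_demo["tagid"].str.startswith(("BNO", "MDL", "SEQ"))

matched = df_demo[mask]      # rows that would be decoded and aggregated
untouched = df_demo[~mask]   # rows passed through as-is, duplicates included

print(untouched)
print(untouched["value"].dtype)
```

Both STA rows stay in `untouched`, and the `value` column there keeps its integer dtype because nothing is ever written to it.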
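
    import pandas as pd

    df['datetime'] = pd.to_datetime(df['datetime'])

    # rows whose tagid starts with one of these prefixes get decoded
    vals = ['BNO','MDL','SEQ']
    mask = df['tagid'].str.startswith(tuple(vals))

    df1 = df[mask].copy()
    # high byte and low byte of each value as characters
    df1['value'] = df1['value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256))
    # BNO_01 -> BNO
    df1['tagid'] = df1['tagid'].str.split('_').str[0]

    # descending sort puts MDL before BNO, so 'SPW ' precedes '211586'
    df1 = (df1.sort_values('tagid', ascending=False)
              .replace({'MDL':'BNO'})
              .groupby(['datetime','tagid'])['value']
              .agg(''.join)
              .reset_index())

    # append the rows that did not match, unchanged
    df = pd.concat([df1, df[~mask]], ignore_index=True)
    print (df)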

                 datetime tagid       value
    0 2021-08-04 11:30:58   BNO  SPW 211586
    1 2021-08-04 11:30:58   SEQ        0456
    2 2021-08-04 11:30:58   STA         724
    3 2021-08-04 11:30:58   STO         735

Posted on 2021-04-13 06:02:05

First, convert the `value` entries to characters for the rows whose `tagid` contains `_`.
Then remove the `_` suffix from the `tagid` column.

    df['value'].update(df.loc[df['tagid'].str.contains('_'), 'value'].apply(lambda x: chr(round(x / 256)) + chr(x % 256)))
    df['tagid'] = df['tagid'].apply(lambda x: x.split('_')[0])

    # print(df)
                  datetime tagid value
    0  08-04-2021 11:30:58   BNO    21
    1  08-04-2021 11:30:58   BNO    15
    2  08-04-2021 11:30:58   BNO    86
    3  08-04-2021 11:30:58   MDL    SP
    4  08-04-2021 11:30:58   MDL    W
    5  08-04-2021 11:30:58   SEQ    04
    6  08-04-2021 11:30:58   SEQ    56
    7  08-04-2021 11:30:58   STA   724
    8  08-04-2021 11:30:58   STO   735

Next, use `groupby()` on the `datetime` and `tagid` columns and join the `value` column within each group with `''`.

    df_ = df.groupby(['datetime','tagid']).apply(lambda x: ''.join(map(str, x['value'].tolist()))).reset_index().rename({0: 'value'}, axis=1)

    print(df_)
                  datetime tagid   value
    0  08-04-2021 11:30:58   BNO  211586
    1  08-04-2021 11:30:58   MDL    SPW
    2  08-04-2021 11:30:58   SEQ    0456
    3  08-04-2021 11:30:58   STA     724
    4  08-04-2021 11:30:58   STO     735

Finally, combine the BNO and MDL rows and drop the MDL row.

    df_.loc[df_['tagid'] == 'BNO', 'value'] = df_.loc[df_['tagid'] == 'MDL', 'value'].iloc[0] + ' ' + df_.loc[df_['tagid'] == 'BNO', 'value'].iloc[0]
    df_ = df_[~(df_['tagid'] == 'MDL')]

    # print(df_)
                  datetime tagid       value
    0  08-04-2021 11:30:58   BNO  SPW 211586
    2  08-04-2021 11:30:58   SEQ        0456
    3  08-04-2021 11:30:58   STA         724
    4  08-04-2021 11:30:58   STO         735

https://stackoverflow.com/questions/67069214