This is my input:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 0.0, "192.168.1.1" ,"192.168.1.2", "UDP" , 64],
[2, 0.2, "192.168.1.1" ,"192.168.1.3", "UDP" , 64],
[3, 0.8, "192.168.1.1" ,"192.168.1.4", "UDP" , 64],
[4, 1.01, "192.168.1.1" ,"192.168.1.2", "ARP" , 64],
[5, 1.23, "192.168.1.1" ,"192.168.1.3", "UDP" , 64],
[6, 1.44, "192.168.1.1" ,"192.168.1.4", "UDP" , 64],
[7, 1.90, "192.168.1.1" ,"192.168.1.2", "ARP" , 64],
[8, 2.05, "192.168.1.1" ,"192.168.1.3", "UDP" , 64],
[9, 2.3, "192.168.1.1" ,"192.168.1.4", "UDP" , 64],
[10, 2.5, "192.168.1.1" ,"192.168.1.2", "UDP" , 64],
[11, 2.67, "192.168.1.1" ,"192.168.1.3", "ARP" , 64]]),
columns=['No.', 'Time','Source', 'Destination', 'Protocol', 'Length'],
index=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'])
This is the output:
    No.  Time       Source  Destination  Protocol  Length
0     1   0.0  192.168.1.1  192.168.1.2       UDP      64
1     2   0.2  192.168.1.1  192.168.1.3       UDP      64
2     3   0.8  192.168.1.1  192.168.1.4       UDP      64
3     4  1.01  192.168.1.1  192.168.1.2       ARP      64
4     5  1.23  192.168.1.1  192.168.1.3       UDP      64
5     6  1.44  192.168.1.1  192.168.1.4       UDP      64
6     7   1.9  192.168.1.1  192.168.1.2       ARP      64
7     8  2.05  192.168.1.1  192.168.1.3       UDP      64
8     9   2.3  192.168.1.1  192.168.1.4       UDP      64
9    10   2.5  192.168.1.1  192.168.1.2       UDP      64
10   11  2.67  192.168.1.1  192.168.1.3       ARP      64
Now I want to group the input by the protocol "ARP". Every time the ARP protocol appears in the data, a new sequence number should be assigned.
This is what I want:
Sequence  No.  Time       Source  Destination  Protocol  Length
            1   0.0  192.168.1.1  192.168.1.2       UDP      64
            2   0.2  192.168.1.1  192.168.1.3       UDP      64
            3   0.8  192.168.1.1  192.168.1.4       UDP      64
       1    4  1.01  192.168.1.1  192.168.1.2       ARP      64
            5  1.23  192.168.1.1  192.168.1.3       UDP      64
            6  1.44  192.168.1.1  192.168.1.4       UDP      64
       2    7   1.9  192.168.1.1  192.168.1.2       ARP      64
            8  2.05  192.168.1.1  192.168.1.3       UDP      64
            9   2.3  192.168.1.1  192.168.1.4       UDP      64
           10   2.5  192.168.1.1  192.168.1.2       UDP      64
       3   11  2.67  192.168.1.1  192.168.1.3       ARP      64
Posted on 2020-01-13 13:26:56
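# Boolean mask selecting the rows whose protocol is ARP
cond = df.Protocol == 'ARP'
# Number the ARP rows 1, 2, 3, ...; all other rows stay NaN
df.loc[cond, 'Sequence'] = df[cond].groupby('Protocol').cumcount() + 1
print(df)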
    No.  Time       Source  Destination  Protocol  Length  Sequence
0     1   0.0  192.168.1.1  192.168.1.2       UDP      64       NaN
1     2   0.2  192.168.1.1  192.168.1.3       UDP      64       NaN
2     3   0.8  192.168.1.1  192.168.1.4       UDP      64       NaN
3     4  1.01  192.168.1.1  192.168.1.2       ARP      64       1.0
4     5  1.23  192.168.1.1  192.168.1.3       UDP      64       NaN
5     6  1.44  192.168.1.1  192.168.1.4       UDP      64       NaN
6     7   1.9  192.168.1.1  192.168.1.2       ARP      64       2.0
7     8  2.05  192.168.1.1  192.168.1.3       UDP      64       NaN
8     9   2.3  192.168.1.1  192.168.1.4       UDP      64       NaN
9    10   2.5  192.168.1.1  192.168.1.2       UDP      64       NaN
10   11  2.67  192.168.1.1  192.168.1.3       ARP      64       3.0
If you want to replace the NaN values with '' and move the Sequence column to the front:
# Blank out the rows that have no sequence number
df.loc[df.Sequence.isnull(), 'Sequence'] = ''
# Move the last column (Sequence) to the front
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
print(df[cols])
    Sequence  No.  Time       Source  Destination  Protocol  Length
0                1   0.0  192.168.1.1  192.168.1.2       UDP      64
1                2   0.2  192.168.1.1  192.168.1.3       UDP      64
2                3   0.8  192.168.1.1  192.168.1.4       UDP      64
3          1    4  1.01  192.168.1.1  192.168.1.2       ARP      64
4                5  1.23  192.168.1.1  192.168.1.3       UDP      64
5                6  1.44  192.168.1.1  192.168.1.4       UDP      64
6          2    7   1.9  192.168.1.1  192.168.1.2       ARP      64
7                8  2.05  192.168.1.1  192.168.1.3       UDP      64
8                9   2.3  192.168.1.1  192.168.1.4       UDP      64
9               10   2.5  192.168.1.1  192.168.1.2       UDP      64
10         3   11  2.67  192.168.1.1  192.168.1.3       ARP      64
Posted on 2020-01-13 13:49:57
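If you don't want the NaN values, you can also use df.dropna(subset=['Sequence']) or df[pd.notnull(df['Sequence'])]. Either one keeps only the rows of the DataFrame whose Sequence value is not NaN. Apply it before replacing NaN with '', because an empty string is not treated as missing.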
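For anyone who wants to try the filtering on something small, here is a rough sketch. The data is made up (only the No. and Protocol columns are kept), and the numbering uses cumsum() on a boolean mask rather than the groupby/cumcount from the answer above, so treat it as one possible variant and not as the accepted method. The Group column is a hypothetical extra that is not part of the question; it labels every row with the number of ARP rows seen so far.

```python
import pandas as pd

# Made-up sample with the same kind of Protocol pattern as the question.
# Building from a dict keeps "No." numeric (np.array would turn everything into strings).
df = pd.DataFrame({
    "No.": [1, 2, 3, 4, 5, 6],
    "Protocol": ["UDP", "UDP", "ARP", "UDP", "ARP", "UDP"],
})

# Boolean mask: True on the ARP rows.
is_arp = df["Protocol"].eq("ARP")

# Running count of ARP rows; non-ARP rows are masked to missing.
# The nullable "Int64" dtype keeps the counter as 1, 2, ... instead of 1.0, 2.0.
df["Sequence"] = is_arp.cumsum().where(is_arp).astype("Int64")

# Keep only the rows that actually carry a sequence number.
arp_only = df.dropna(subset=["Sequence"])

# Hypothetical extra: a group id for every row
# (0 means the row comes before the first ARP).
df["Group"] = is_arp.cumsum()

print(df)
print(arp_only)
```

If real grouping is the goal, df.groupby("Group") can then be used to iterate over the stretches of packets between ARP rows.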
https://stackoverflow.com/questions/59716862