我是python的新手,我想改进我用VBA制作的几个excel程序。就像下面这个。我有一个机器日志,它由2列和平均50,000行组成,每组由空格分隔。示例:
我想将它转换为每个组的专栏。
我不需要第一列,我只需要转换第二列。我已经在excel中通过了VBA,但它花了2-5分钟来转换50,000行。
我已经自学python有一段时间了,我希望它能通过pandas或numpy加速这个过程。
非常感谢。
发布于 2021-05-27 17:37:25
输入数据:
df = pd.read_excel("sample.xlsx", header=None, names=["Operation", "Data"])>>> df
Operation Data
0 <Operation> NaN # begin 1st group (idx1)
1 NaN <Timestamp>value</Timestamp>
2 NaN <Type>value</Type>
3 NaN <Name>value</Name>
4 NaN <Action>value</Action
5 NaN <Data>value</Data>
6 </Operation> NaN # end 1st group (idx2)
7 <Operation> NaN # begin 2nd group (idx1)
8 NaN <Timestamp>value</Timestamp>
9 NaN <Type>value</Type>
10 NaN <Name>value</Name>
11 NaN <Action>value</Action
12 NaN <Data>value</Data>
13 </Operation> NaN # end 2nd group (idx2)
14 <Operation> NaN # begin 3rd group (idx1)
15 NaN <Timestamp>value</Timestamp>
16 NaN <Type>value</Type>
17 NaN <Name>value</Name>
18 NaN <Action>value</Action
19 </Operation> NaN # end 3rd group (idx2)代码段内的注释。下面是这段代码的一行版本:
data = []
idx1 = df[df["Operation"].eq("<Operation>")].index # [0, 6, 13]
idx2 = df[df["Operation"].eq("</Operation>")].index # [7, 14, 19]
for i1, i2 in zip(idx1, idx2): # [(0, 7), (6, 14), (13, 19)]
# Get values inside the group [(1, 6), (7, 13), (14, 18)]
df1 = df["Data"].loc[i1+1:i2-1].reset_index(drop=True)
data.append(df1)
# Concatenate all operations, swap columns and rows (.Transpose)
out = pd.concat(data, axis="columns").T.reset_index(drop=True)
# One line
# out = pd.concat([df["Data"].loc[i1+1:i2-1].reset_index(drop=True)
# for i1, i2 in zip(df[df["Operation"].eq("<Operation>")].index,
# df[df["Operation"].eq("</Operation>")].index)],
# axis="columns").T.reset_index(drop=True)输出结果:
>>> out
0 1 2 3 4
0 <Timestamp>value</Timestamp> <Type>value</Type> <Name>value</Name> <Action>value</Action <Data>value</Data>
1 <Timestamp>value</Timestamp> <Type>value</Type> <Name>value</Name> <Action>value</Action <Data>value</Data>
2 <Timestamp>value</Timestamp> <Type>value</Type> <Name>value</Name> <Action>value</Action NaNhttps://stackoverflow.com/questions/67717140
复制相似问题