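I have a text file in which the useful data sits between several other lines, and I need to convert that data into a dataframe.
I iterate over the text file line by line and capture the useful data with a regular expression.
Like this: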
import re

pattern = r'^(\s)(\d+)(\s+)(\d)(\s+)(\w+)(\s+)(\w+)(.*)'
capture_data = []
with open(file, 'r') as file_obj:
    lineList = file_obj.readlines()
for line in lineList:
    info_list = re.search(pattern, line)
    if info_list is not None:
        capture_data.append(line)

The captured data looks like this:
' 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0 \n'
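' 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0 \n'
I want to iterate over each captured line and split it on whitespace, but the problem is that there is whitespace between a value and its unit, e.g.
-300.0000 mV, -100.0000 uA, etc.
Another problem is the trailing newline, which also ends up as an extra element in .split(" ").
Can someone help me find a smarter way to do this?
All I want is to get these values as separate column values.
For example, in the first string,
100 becomes the 1st value, 0 the 2nd, PASS the 3rd, Continuity_PPMU_mV the 4th, and so on.
Thanks.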
Edit:
The raw data looks something like this:
Site Number:
0, 1, 2, 3
Device#: 1-4
*********************************************************************
FT45434HAP PQF64 Test @ RHC
*********************************************************************
---------------------------Continuity Test---------------------------
Number Site Result Test Name Pin Channel Low Measured High Force Loc
100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0
100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0
100 2 PASS Continuity_PPMU_mV XSCI 76 -1.0000 V -430.7492 mV -300.0000 mV -100.0000 uA 0
100 3 PASS Continuity_PPMU_mV XSCI 204 -1.0000 V -431.0482 mV -300.0000 mV -100.0000 uA 0
101 0 PASS Continuity_PPMU_mV XSCO 139 -1.0000 V -456.0359 mV -300.0000 mV -100.0000 uA 0
101 1 PASS Continuity_PPMU_mV XSCO 11 -1.0000 V -458.0605 mV -300.0000 mV -100.0000 uA 0
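101 2 PASS Continuity_PPMU_mV XSCO 75 -1.0000 V -457.8564 mV -300.0000 mV -100.0000 uA 0
Edit:
The lines at the top are not fixed; they are generated dynamically. Also, other text can appear in between the relevant data, for example between two useful rows. So I don't think skipping rows will work here.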
Posted on 2021-02-03 06:21:26
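Iterate over the rows until the one beginning with 'Number' is found, then append every row after it to data. Splitting on whitespace works here because, within a data row, only the units are separated from their values by a space, so each unit ends up in a column of its own.
import pandas as pd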
import seaborn as sns

# read the file in
data = list()
with open('test.txt', 'r') as f:
    rows = f.readlines()

flag = False  # set to True once the header row starting with 'Number' is found
for row in rows:
    row = row.strip()
    if row.startswith('Number'):
        flag = True
        continue  # after the header row is found, skip it
    if flag:
        data.append(row.split())  # append rows after the header to data

# create a custom header where the units have been added as separate columns
header = ['Number', 'Site', 'Result', 'Test_Name', 'Pin', 'Channel', 'Low', 'U1', 'Measured', 'U2', 'High', 'U3', 'Force', 'U4', 'Loc']
# create the dataframe
df = pd.DataFrame(data, columns=header)
# save to csv
df.to_csv('file.csv', index=False)
# convert the numeric columns to numeric dtypes; the others stay as strings
# (errors='ignore' is deprecated in recent pandas, so the columns are named explicitly)
num_cols = ['Number', 'Site', 'Channel', 'Low', 'Measured', 'High', 'Force', 'Loc']
df[num_cols] = df[num_cols].apply(pd.to_numeric)
# scale the columns as per their units
df.Measured = df.Measured.div(1000)
df.High = df.High.div(1000)
df.Force = df.Force.div(100000)
# display(df)
Number Site Result Test_Name Pin Channel Low U1 Measured U2 High U3 Force U4 Loc
0 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0 V -0.427951 mV -0.3 mV -0.001 uA 0
1 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0 V -0.430309 mV -0.3 mV -0.001 uA 0
2 100 2 PASS Continuity_PPMU_mV XSCI 76 -1.0 V -0.430749 mV -0.3 mV -0.001 uA 0
3 100 3 PASS Continuity_PPMU_mV XSCI 204 -1.0 V -0.431048 mV -0.3 mV -0.001 uA 0
4 101 0 PASS Continuity_PPMU_mV XSCO 139 -1.0 V -0.456036 mV -0.3 mV -0.001 uA 0
5 101 1 PASS Continuity_PPMU_mV XSCO 11 -1.0 V -0.458060 mV -0.3 mV -0.001 uA 0
6 101 2 PASS Continuity_PPMU_mV XSCO 75 -1.0 V -0.457856 mV -0.3 mV -0.001 uA 0
# plot
ax = sns.lineplot(data=df.iloc[:, 6:-2])
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
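The question also mentions that unrelated text can appear between two data rows, and that each value should stay together with its unit. The following is only a sketch of one way to handle both, not part of the original answer: it keeps lines that match a simplified version of the question's regex and glues each unit token onto the value before it. The names `ROW_RE`, `UNITS`, `parse_row` and `parse_lines` are hypothetical, and the set of units is an assumption taken from the sample data.

```python
import re

# Simplified form of the question's pattern: number, site, result, test name.
ROW_RE = re.compile(r'^\s*\d+\s+\d+\s+\w+\s+\w+')
# Assumption: these are the only unit tokens that occur in the data.
UNITS = {'V', 'mV', 'uA'}


def parse_row(line):
    """Split a data row on whitespace, attaching each unit to the value before it."""
    fields = []
    for token in line.split():  # split() with no argument also drops the trailing '\n'
        if token in UNITS and fields:
            fields[-1] += ' ' + token
        else:
            fields.append(token)
    return fields


def parse_lines(lines):
    """Keep only the lines that look like data rows and parse each of them."""
    return [parse_row(line) for line in lines if ROW_RE.match(line)]


sample = [
    'Number Site Result Test Name Pin Channel Low Measured High Force Loc\n',
    ' 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0 \n',
    'some unrelated text between rows\n',
    ' 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0 \n',
]
rows = parse_lines(sample)
```

Each parsed row has 11 fields, one per column of the original header, so the list can be passed to pd.DataFrame together with an 11-name column list. If a pin or test name could ever be identical to a unit token, the merge rule would need to be stricter.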

Posted on 2021-02-03 06:25:11
You can skip the first rows and specify the separator as \s\s+ (two or more whitespace characters).
pd.read_csv('file.txt', skiprows=10, sep=r'\s\s+', engine='python')
Output:
Number Site Result Test Name Pin Channel Low Measured High Force Loc
0 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0
1 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0
2 100 2 PASS Continuity_PPMU_mV XSCI 76 -1.0000 V -430.7492 mV -300.0000 mV -100.0000 uA 0
3 100 3 PASS Continuity_PPMU_mV XSCI 204 -1.0000 V -431.0482 mV -300.0000 mV -100.0000 uA 0
4 101 0 PASS Continuity_PPMU_mV XSCO 139 -1.0000 V -456.0359 mV -300.0000 mV -100.0000 uA 0
5 101 1 PASS Continuity_PPMU_mV XSCO 11 -1.0000 V -458.0605 mV -300.0000 mV -100.0000 uA 0
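6 101 2 PASS Continuity_PPMU_mV XSCO 75 -1.0000 V -457.8564 mV -300.0000 mV -100.0000 uA 0
In addition, if you are not sure how many leading rows to ignore, you can look for a pattern that marks where they end. For example, if the data layout is consistent, you can read lines one at a time until the first column matches (in this case, "Number"):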
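import pandas as pd

# Identify how many rows we need to skip (avoiding reading the whole file)
skiplines = 0
with open('file.txt') as file:
    line = file.readline()
    # also stop at end of file, so a missing header cannot loop forever
    while line and not line.lstrip().startswith('Number'):
        skiplines += 1
        line = file.readline()
# Then read it with pandas
pd.read_csv('file.txt', skiprows=skiplines, sep=r'\s\s+', engine='python')

Either way, the code block above is easy to adapt to a different file layout by reusing its logic. For example, does the output always contain the "Continuity Test" line? If the data always appears after that line, that is the pattern to look for.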
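As a rough sketch of that last suggestion, the skip count can be computed from any marker line rather than only from "Number". The helper name `count_lines_before` and the demo file contents are assumptions made for illustration; the demo writes a shortened copy of the question's data to a temporary file so the snippet runs on its own.

```python
import os
import tempfile


def count_lines_before(path, is_marker):
    """Return how many lines precede the first line for which is_marker(line) is true."""
    with open(path) as fh:
        for skipped, line in enumerate(fh):
            if is_marker(line):
                return skipped
    raise ValueError('no marker line found in ' + path)


# Demo input: a shortened copy of the raw data from the question.
text = (
    'Site Number:\n'
    '0, 1, 2, 3\n'
    '---------------------------Continuity Test---------------------------\n'
    'Number Site Result Test Name Pin Channel Low Measured High Force Loc\n'
    '100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(text)

# Skip everything up to and including the marker line, so the header row comes first.
skiplines = count_lines_before(tmp.name, lambda line: 'Continuity Test' in line) + 1
os.remove(tmp.name)
```

The resulting `skiplines` can then be passed as skiprows to pd.read_csv as in the code above. This only handles the variable-length preamble; stray text between data rows would still need to be filtered out before parsing.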
https://stackoverflow.com/questions/66021971