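We are trying to find a way to use Python to parse a tricky text file produced by a PEST analysis. It contains measurements of 63 different variables for more than 30,000 observations. Below is an excerpt of the output (the first few of more than 30,000 observations):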
cmfa cmfb cmfc cmfd cmla cmlb cmlc cmld
cmle cgfa cgfb cgfc cgfd cgfe dgfa dgfb
dgfc dgfd icfa icfb icfc icfd vawa vawb
vawc vawd vawe vawf vswa vswb vswc vswd
vswe chfa chfb chfc chfd chfe cgwa cgwb
cgwc cgwd cgwe crta crtb crtc crtd crte
icha ichb ichc ichd iche csea cseb csec
csed csee csef caqa caqb crsa crsb
0 -1.900000E-03 1.080000E-02 3.150000E-02 0.00000 0.00000 0.00000 0.00000 -3.020000E-02
0.00000 -1.870000E-02 0.00000 4.600000E-03 0.00000 0.00000 0.00000 4.510000E-02
0.00000 0.00000 3.650000E-02 -7.000000E-03 -2.100000E-03 -2.000000E-04 3.200000E-03 8.000000E-03
-7.000000E-04 -1.500000E-02 0.00000 4.800000E-03 1.900000E-03 4.000000E-04 2.500000E-03 2.500000E-03
-1.400000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.200000E-03 -8.060000E-02
-0.126500 0.298400 0.00000 0.00000 0.00000 0.00000 0.00000 8.000000E-04
-1.900000E-03 1.400000E-03 0.00000 0.00000 -3.200000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -1.200000E-02 1.930000E-02
1 -1.800000E-03 1.140000E-02 1.850000E-02 0.00000 0.00000 0.00000 0.00000 -2.600000E-02
0.00000 -8.200000E-03 0.00000 1.200000E-03 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 2.560000E-02 -6.100000E-03 -1.100000E-03 0.00000 3.000000E-03 7.400000E-03
-7.000000E-04 -1.410000E-02 0.00000 5.000000E-03 1.900000E-03 3.000000E-04 2.300000E-03 2.300000E-03
-1.330000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.400000E-03 -8.410000E-02
-0.123500 0.301900 0.00000 0.00000 0.00000 0.00000 0.00000 1.200000E-03
-2.000000E-03 1.400000E-03 0.00000 0.00000 -3.200000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -1.280000E-02 2.050000E-02
2 -3.300000E-03 6.500000E-03 4.040000E-02 0.00000 0.00000 0.00000 0.00000 -7.060000E-02
4.840000E-02 -0.112500 0.110300 0.00000 0.00000 0.00000 1.10330 0.00000
0.00000 0.00000 3.940000E-02 -8.500000E-03 -1.120000E-02 6.600000E-03 5.700000E-03 1.430000E-02
-1.300000E-03 -2.470000E-02 0.00000 3.700000E-03 2.200000E-03 5.000000E-04 4.300000E-03 4.500000E-03
-2.250000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -2.000000E-03 -5.840000E-02
-0.157300 0.292400 0.00000 0.00000 0.00000 0.00000 0.00000 -3.600000E-03
-1.700000E-03 1.200000E-03 0.00000 0.00000 -3.400000E-03 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 -7.400000E-03 1.180000E-02
3 -2.200000E-03 1.040000E-02 3.500000E-02 0.00000 0.00000 0.00000 0.00000 -4.390000E-02
0.00000 -3.170000E-02 2.590000E-02 0.00000 0.00000 0.00000 0.259400 0.00000
0.00000 0.00000 3.920000E-02 -1.030000E-02 -3.500000E-03 1.500000E-03 3.600000E-03 9.000000E-03
-9.000000E-04 -1.680000E-02 0.00000 4.700000E-03 2.000000E-03 3.000000E-04 2.700000E-03 2.800000E-03
-1.560000E-02 0.00000 0.00000 0.00000 0.00000 0.00000 -3.200000E-03 -7.920000E-02
-0.131600 0.302200 0.00000 0.00000 0.00000 0.00000 0.00000 3.000000E-04
-2.000000E-03 1.300000E-03 0.00000 0.00000 -3.300000E-03 0.00000 0.00000 0.00000
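0.00000 0.00000 0.00000 0.00000 0.00000 -1.180000E-02 1.880000E-02

The letter codes (cmfa, cmfb, etc.) are the names of the 63 variables. Each letter code corresponds to the number at the same position in every block of numbers that follows.
The first block of numbers is for observation 0, the next block is for observation 1, and so on for all 30,000+ observations.
We would like to convert this into a text file (preferably .csv). For the excerpt above, it would have 63 columns and one row per observation (plus one for the identifiers). Each column should be labelled with the appropriate letter code (cmfa, etc.).
If possible, we would like it to work on files with any number of columns and any number of observations.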
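To make the layout described above concrete, here is a minimal sketch of how the token stream could map onto rows. It is not a definitive solution. It assumes that the header is the leading run of tokens that do not parse as numbers, and that every record is one observation index followed by exactly one value per name. The helper names `_is_number` and `blocks_to_rows`, the `obs` column label, and the tiny sample are hypothetical and only for illustration.

```python
import csv
import io


def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def blocks_to_rows(text):
    """Split whitespace-separated block output into a header and records.

    Assumption: variable names never parse as numbers, and each record is
    an observation index followed by one value per variable name.
    """
    tokens = text.split()
    # The header is the leading run of non-numeric tokens.
    n_names = 0
    while n_names < len(tokens) and not _is_number(tokens[n_names]):
        n_names += 1
    names = tokens[:n_names]
    values = tokens[n_names:]
    width = n_names + 1  # index + one value per variable
    if n_names == 0 or len(values) % width != 0:
        raise ValueError("token count does not match the header width")
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    return ["obs"] + names, rows


# Hypothetical miniature file: 4 variables wrapped over 2 lines per block.
sample = """
a b c
d
0 1.0E-03 0.00000 2.5
-1.0
1 0.5 0.25 0.125
0.0
"""
header, rows = blocks_to_rows(sample)

# Write the result as CSV (to an in-memory buffer here).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
```

Because the width is inferred from the header, the same function would apply to any number of variables and observations, as long as the two assumptions hold. It reads the whole file into memory, which should be manageable for roughly 30,000 observations of 63 values.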
Answered 2016-08-02 21:35:45
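Here is one way to parse the file you provided with plain Python. It does not depend on the number of lines in the file. It could be done more neatly with regular expressions, but I will leave that for you to try: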
# Import required libraries
import numpy as np
import csv

# Open the input file and split it into lines
with open('input.txt', 'r') as f:
    line = f.read().splitlines()

# Split each line into tokens and drop the observation index
line2 = []
for l in line:
    l2 = l.split()  # split on whitespace, ignoring empty strings
    if len(l2) == 9:
        # First line of a data block: index followed by 8 values
        line2.append(l2[1:9])
    elif len(l2) == 7 or len(l2) == 8:
        line2.append(l2)

# Join every 8 lines into one row and convert the data rows to float
pl = np.arange(0, len(line2) + 1, 8)
line3 = []
for i in np.arange(0, len(pl) - 1):
    z = line2[pl[i]:pl[i + 1]]
    z2 = [item for sublist in z for item in sublist]
    if i == 0:
        line3.append(z2)  # header row: variable names
    else:
        line3.append([float(v) for v in z2])

# Write the output file (newline='' is what the csv module expects)
with open('output.csv', 'w', newline='') as f:
    wr = csv.writer(f)
    for row in line3:
        wr.writerow(row)

If you want to keep the index, do the following:
# Import required libraries
import numpy as np
import csv

# Open the input file and split it into lines
with open('input.txt', 'r') as f:
    line = f.read().splitlines()

# Split each line into tokens, skipping blank lines
line2 = []
for l in line:
    l2 = l.split()  # split on whitespace, ignoring empty strings
    if l2:
        line2.append(l2)

# Join every 8 lines into one row and convert the data rows to float
pl = np.arange(0, len(line2) + 1, 8)
line3 = []
for i in np.arange(0, len(pl) - 1):
    z = line2[pl[i]:pl[i + 1]]
    z2 = [item for sublist in z for item in sublist]
    if i == 0:
        # Header row: empty cell above the index column
        line3.append([''] + z2)
    else:
        line3.append([float(v) for v in z2])

# Write the output file (newline='' is what the csv module expects)
with open('output.csv', 'w', newline='') as f:
    wr = csv.writer(f)
    for row in line3:
        wr.writerow(row)

Answered 2016-08-04 01:03:42
You can parse the file with mmap and a regex, without reading the whole file into memory.
Something like:
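import re
import mmap
import os

# Placeholder file names; adjust to your own paths
fn_in = "input.txt"
fn_out = "output.csv"

size = os.stat(fn_in).st_size
with open(fn_in, "rb") as fin, open(fn_out, "w") as fout:
    data = mmap.mmap(fin.fileno(), size, access=mmap.ACCESS_READ)
    # Blocks are assumed to be separated by blank lines.
    # The pattern must be bytes because mmap exposes the file as bytes.
    for idx, m in enumerate(re.finditer(rb"(.*?)(?:(?:^\s*$)|\Z)", data, re.M | re.S)):
        block = m.group(0).strip().decode()
        if not block:
            continue
        if idx == 0:
            # Header block: prepend a label for the observation-number column
            fout.write("O_N," + ",".join(block.split()) + "\n")
        else:
            fout.write(",".join(block.split()) + "\n")

https://stackoverflow.com/questions/38728942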