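The data file shown here is a measurement record exported from an instrument.
I have uploaded it here, so anyone interested can download it.
Background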
Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925 284 1452 315 143 16653 He -28500
-924 281 1462 322 136 16641 He -28628
-920 281 1455 311 139 16649 He -28756
-923 279 1454 312 139 16636 He -28884
......
Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925 284 1452 315 143 16653 He -28500
......
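......
Usually, a file contains several records for different samples, in the order of the test routine. The data records for all samples share the same format.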
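To make the format concrete, here is a minimal sketch that pairs one header line with one data row, both copied from the excerpt above. It assumes the header names are comma-separated and the values are whitespace-separated, which is how the excerpt appears; the variable names are only illustrative.

```python
# Header line and first data row, copied from the excerpt above.
header = "FID1, FID2, front_temperature, laser, laserlow, pressure, mode"
row = "-925 284 1452 315 143 16653 He -28500"

# Column names are separated by commas; strip the surrounding spaces.
names = [name.strip() for name in header.split(",")]

# Data values are separated by whitespace.
values = row.split()

# zip() stops at the shorter sequence, so any value beyond the last
# named column is left out of the record.
record = dict(zip(names, values))
print(record)
```

Note that the excerpted rows carry one more value than the header has names, so the trailing value has no column name here and is dropped by `zip`.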
My attempt
If the data file (*.txt format) contains only one sample, I can load it into a pandas.DataFrame, so that I can carry out further data analysis in Python.
My code is shown below:
import pandas as pd

# Whole data file with several sample records inside
with open("record.txt") as f:
    mylist = f.read().splitlines()

## The record for each sample is 803 lines long
lines = mylist[0:803]
### The sample name is extracted from the third line
sample_name = lines[2]
### For each sample, the measurements are saved in several aspects,
### which are treated as columns here
columns = [name.rstrip(",") for name in lines[22].split()]
### Generate empty columns for saving the data records later
df = {columns[0]: [], columns[1]: [], columns[2]: [], columns[3]: [], columns[4]: [],
      columns[5]: [], columns[6]: []}  #### I could only think of this dumb method for now
## Data extraction
### The valid data records of sample 1 start at line 23
for i in range(0, len(lines[23:]), 1):
    for j in range(0, len(columns), 1):
        df[columns[j]].append(lines[23 + i].split()[j])
pd.DataFrame(df)

The result is shown below:

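My goal
With the code above, I can handle a data file that contains one sample. But when the record text contains several samples, I cannot find a way to handle it effectively.
Here is an illustration of my goal: generate a dictionary of DataFrames that holds the records of all samples.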
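As a rough illustration of that target structure, the sketch below builds such a dictionary by hand from a few rows of the excerpt. Using the record labels (`RECORD-1`, `RECORD-2`) as keys is an assumption made for this example; the real keys would be whatever the sample-name line of each record holds.

```python
import pandas as pd

# Column names taken from the header line in the excerpt.
columns = ["FID1", "FID2", "front_temperature", "laser",
           "laserlow", "pressure", "mode"]

# One DataFrame per sample, keyed by a sample label (assumed keys).
samples = {
    "RECORD-1": pd.DataFrame(
        [[-925, 284, 1452, 315, 143, 16653, "He"],
         [-924, 281, 1462, 322, 136, 16641, "He"]],
        columns=columns,
    ),
    "RECORD-2": pd.DataFrame(
        [[-925, 284, 1452, 315, 143, 16653, "He"]],
        columns=columns,
    ),
}

# Each sample can then be analysed on its own.
first = samples["RECORD-1"]
print(first["FID1"].mean())
```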

Any suggestions would be appreciated!
Posted on 2016-07-28 22:53:12
I think what you are looking for is something like this:
import pandas as pd

# Whole data file with several sample records inside
with open("record.txt", "r") as f:
    mylist = f.read().splitlines()

dataset = []
while True:
    try:
        ## The record for each sample is 803 lines long
        lines, mylist = mylist[0:803], mylist[803:]  # this splits your list!
        ### The sample name is extracted from the third line
        sample_name = lines[2]
        ### For each sample, the measurements are saved in several columns
        columns = [name.rstrip(",") for name in lines[22].split()]
        ### Generate empty columns for saving the data records later
        df = {name: [] for name in columns}
        ## Data extraction
        ### The valid data records start at line 23
        for line in lines[23:]:
            values = line.split()
            for j, name in enumerate(columns):
                df[name].append(values[j])
    except IndexError:
        # Raised once the remaining lines are too short to form a record,
        # or when a data line has fewer fields than the header
        break
    df = pd.DataFrame(df)
    dataset.append(df)

Now dataset[0] should contain the DataFrame for sample 1.
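Since the question asks for a dictionary of DataFrames rather than a list, the same fixed-length splitting could also be sketched as a function keyed by sample name. This is only a sketch under the assumptions stated in the question (803 lines per record, sample name on the third line, header at index 22, data from index 23 onward); `split_records` and its parameters are hypothetical names introduced here, and the demo uses a tiny made-up layout so it runs without the real file.

```python
import pandas as pd

RECORD_LEN = 803   # lines per sample record, as stated in the question
NAME_LINE = 2      # index of the line holding the sample name
HEADER_LINE = 22   # index of the header line; data starts on the next line


def split_records(all_lines, record_len=RECORD_LEN,
                  name_line=NAME_LINE, header_line=HEADER_LINE):
    """Return {sample_name: DataFrame} for consecutive fixed-length records."""
    frames = {}
    for start in range(0, len(all_lines), record_len):
        block = all_lines[start:start + record_len]
        if len(block) <= header_line:
            break  # trailing fragment too short to contain a header line
        # Header names are separated by ", "; drop the trailing commas.
        names = [name.rstrip(",") for name in block[header_line].split()]
        # Keep only as many values per row as there are column names,
        # and skip blank lines.
        rows = [line.split()[:len(names)]
                for line in block[header_line + 1:] if line.strip()]
        frames[block[name_line].strip()] = pd.DataFrame(rows, columns=names)
    return frames


# Demo with a shortened, made-up layout: 5 lines per record,
# name at index 1, header at index 2.
demo = [
    "Sample", "RECORD-1", "FID1, FID2, mode", "-925 284 He", "-924 281 He",
    "Sample", "RECORD-2", "FID1, FID2, mode", "-925 284 He", "-920 281 He",
]
frames = split_records(demo, record_len=5, name_line=1, header_line=2)
print(frames["RECORD-2"])
```

The values come out as strings, so numeric columns would still need converting, for example with `pd.to_numeric`. If two records share the same sample name, the later one overwrites the earlier one in this sketch.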
https://stackoverflow.com/questions/38633246