
Efficiently organizing multiple similar data records

Stack Overflow user
Asked 2016-07-28 18:05:46
1 answer, 38 views, 0 followers, 0 votes

The data file shown here is a measurement record exported from an instrument.

I uploaded it here so that anyone interested can download it.

Background

```text
Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
-924    281 1462    322 136 16641   He  -28628
-920    281 1455    311 139 16649   He  -28756
-923    279 1454    312 139 16636   He  -28884
......

Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
......
......
```

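Typically the file contains several records, one per sample, in the order of the test routine. All of the sample records share the same format.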
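Because every record in the excerpt starts with a line reading `Sample`, one way to separate the records without relying on a fixed record length is to split the file at those marker lines. A minimal sketch, assuming the word `Sample` appears alone on a line only at the start of a record; the function name `split_records` is hypothetical:

```python
def split_records(lines):
    """Split a list of text lines into records, each starting at a 'Sample' line."""
    records = []
    current = None
    for line in lines:
        if line.strip() == "Sample":
            # A new record begins; store the previous one, if any
            if current is not None:
                records.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first 'Sample' marker are ignored
    if current is not None:
        records.append(current)
    return records


text = """Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
"""
records = split_records(text.splitlines())
print(len(records))   # 2
print(records[1][1])  # RECORD-2
```

Each element of `records` is then a list of lines for one sample, which can be parsed with the same single-record code.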

My attempt

If the data file (*.txt format) contains only one sample, I can arrange it into a pandas.DataFrame so that I can carry out further data analysis in Python.

My code is as follows:

```python
import pandas as pd

# Whole data file, containing several sample records
with open("record.txt") as f:
    mylist = f.read().splitlines()

# Each sample record is 803 lines long
lines = mylist[0:803]

# The sample name is on the third line (index 2)
sample_name = lines[2]

# The header line (index 22) names the measured quantities; each one
# becomes a column. Strip trailing commas ("FID1," -> "FID1") instead of
# cutting off the last character, which would turn "mode" into "mod".
columns = [c.rstrip(",") for c in lines[22].split()]

# One empty list per column, filled with the data rows below
df = {c: [] for c in columns}

# The data rows of sample 1 start at line index 23
for row in lines[23:]:
    for name, value in zip(columns, row.split()):
        df[name].append(value)

df = pd.DataFrame(df)
```
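As an alternative to the nested loop, `pandas.read_csv` can parse the whitespace-separated data rows of one record directly, and it infers numeric dtypes (the loop above stores every value as a string). A minimal sketch, assuming one record is available as a list of lines with the header on line index 22 and data from index 23, as in the question's code; `record_to_frame` is a hypothetical helper name. Since the excerpt's data rows carry one more field than there are column names, only the first `len(columns)` fields are kept, matching what the loop does:

```python
import io

import pandas as pd


def record_to_frame(lines, header_idx=22, data_start=23):
    """Parse one record (a list of text lines) into a DataFrame."""
    # "FID1, FID2, ..., mode" -> ["FID1", "FID2", ..., "mode"]
    columns = [c.rstrip(",") for c in lines[header_idx].split()]
    data = "\n".join(lines[data_start:])
    df = pd.read_csv(
        io.StringIO(data),
        sep=r"\s+",                          # any run of whitespace separates fields
        header=None,                         # data rows carry no header of their own
        usecols=list(range(len(columns))),   # keep as many fields as there are names
    )
    df.columns = columns
    return df


# Small example using a compressed layout (header on index 0, data from index 1)
lines = [
    "FID1, FID2, front_temperature, laser, laserlow, pressure, mode",
    "-925    284 1452    315 143 16653   He  -28500",
    "-924    281 1462    322 136 16641   He  -28628",
]
df = record_to_frame(lines, header_idx=0, data_start=1)
print(df)
```

Numeric columns such as `FID1` come back as integers, while `mode` stays text.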

The result looks like this:

My goal

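With the code above I can process a data file that contains one sample. But when the record file contains several samples, I can't find a way to handle it efficiently.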

Here is an illustration of my goal: generating a dictionary of DataFrames that holds the records of all samples.

Any suggestions would be greatly appreciated!
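The goal described in the question, a dictionary of DataFrames keyed by sample name, can be sketched by cutting the line list into fixed-length chunks and building one DataFrame per chunk. This assumes, as the question's code does, that every record is exactly 803 lines long with the sample name on line index 2, the header on index 22 and data from index 23; the function `records_to_dict` and its parameters are hypothetical and only exposed so the layout can be adjusted:

```python
import pandas as pd


def records_to_dict(all_lines, record_len=803, name_idx=2,
                    header_idx=22, data_start=23):
    """Return {sample_name: DataFrame} for a file made of fixed-length records."""
    frames = {}
    for start in range(0, len(all_lines), record_len):
        lines = all_lines[start:start + record_len]
        if len(lines) <= header_idx:
            break  # trailing fragment without a header line
        name = lines[name_idx].strip()
        columns = [c.rstrip(",") for c in lines[header_idx].split()]
        # Keep as many fields per row as there are column names; skip blank lines
        rows = [line.split()[:len(columns)]
                for line in lines[data_start:] if line.strip()]
        frames[name] = pd.DataFrame(rows, columns=columns)
    return frames


# Tiny two-record example with a 4-line layout instead of 803 lines
demo = [
    "Sample", "RECORD-1",
    "FID1, FID2, mode",
    "-925 284 He",
    "Sample", "RECORD-2",
    "FID1, FID2, mode",
    "-924 281 He",
]
frames = records_to_dict(demo, record_len=4, name_idx=1,
                         header_idx=2, data_start=3)
print(list(frames))  # ['RECORD-1', 'RECORD-2']
```

As in the question's loop, the values stay strings; `pd.to_numeric` can convert selected columns afterwards. If two records share a sample name, the later one overwrites the earlier one in the dict.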


1 Answer

Stack Overflow user

Answered 2016-07-28 22:53:12

I think what you're looking for is something like this:

```python
import pandas as pd

# Whole data file, containing several sample records
with open("record.txt", "r") as f:
    mylist = f.read().splitlines()

RECORD_LEN = 803  # each sample record is 803 lines long

dataset = []
# Step through the file one 803-line record at a time (this splits your list)
for start in range(0, len(mylist), RECORD_LEN):
    lines = mylist[start:start + RECORD_LEN]
    if len(lines) < 23:
        break  # trailing fragment too short to contain the header line

    # The sample name is on the third line (index 2)
    sample_name = lines[2]

    # The header line (index 22) names the columns; strip trailing commas
    # ("FID1," -> "FID1") instead of cutting off the last character,
    # which would turn "mode" into "mod"
    columns = [c.rstrip(",") for c in lines[22].split()]

    # One list per column, filled from the data rows (index 23 onward)
    df = {c: [] for c in columns}
    for row in lines[23:]:
        for name, value in zip(columns, row.split()):
            df[name].append(value)

    dataset.append(pd.DataFrame(df))
```

Now dataset[0] should contain the DataFrame for sample 1.
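If one table is more convenient than a list or dict of per-sample frames, `pandas.concat` accepts a dict of DataFrames and uses its keys as the outer level of a MultiIndex. A minimal sketch, with two hand-made frames standing in for the parsed records; the level names `sample` and `row` are arbitrary choices:

```python
import pandas as pd

# Stand-ins for per-sample DataFrames, e.g. built from the parsed records
frames = {
    "RECORD-1": pd.DataFrame({"FID1": [-925, -924], "mode": ["He", "He"]}),
    "RECORD-2": pd.DataFrame({"FID1": [-920], "mode": ["He"]}),
}

# Outer index level = sample name, inner level = row number within that sample
combined = pd.concat(frames, names=["sample", "row"])
print(combined.loc["RECORD-2"])
```

For a plain list such as `dataset`, `pd.concat(dataset, keys=...)` does the same once a list of sample names is supplied as the keys.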

Votes: 1
Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/38633246
