
Reading a text file with variable spacing in Python

Stack Overflow user
Asked 2017-05-05 18:06:43
Answers: 2 · Views: 524 · Followers: 0 · Votes: 3

I have data in the following text-file format that I want to load into Python:

      pclass  survived                                               name  
0          1         1                      Allen, Miss. Elisabeth Walton   
1          1         1                     Allison, Master. Hudson Trevor   
2          1         0                       Allison, Miss. Helen Loraine   
3          1         0               Allison, Mr. Hudson Joshua Creighton   
4          1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)   
5          1         1                                Anderson, Mr. Harry   
6          1         1                  Andrews, Miss. Kornelia Theodosia   
7          1         0                             Andrews, Mr. Thomas Jr   
8          1         1      Appleton, Mrs. Edward Dale (Charlotte Lamson)   
9          1         0                            Artagaveytia, Mr. Ramon   
10         1         0                             Astor, Col. John Jacob   

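Because the amount of whitespace between columns is not constant, and because the last field (name) itself contains spaces, I'm having trouble parsing it. I tried the following: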

pd.read_csv("test.csv", sep=r"\s+", header=0, index_col=0)

But it raises an error:

CParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 8
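Splitting two of the rows by hand with Python's standard re module illustrates why this fails. The pattern '\s+' also splits on the single spaces inside each name, so the number of fields depends on how many words the name has. This is only a sketch of the tokenization, not pandas' own parser. The first data row yields 7 fields and the row for "Allison, Mr. Hudson Joshua Creighton" (line 5 of the file, counting the header as line 1) yields 8, which is consistent with the error message above.

```python
import re

# Two data rows copied from the question (trailing spaces removed).
row_0 = "0          1         1                      Allen, Miss. Elisabeth Walton"
row_3 = "3          1         0               Allison, Mr. Hudson Joshua Creighton"

# r"\s+" treats every run of whitespace as a separator, so each word of
# the name becomes its own field and the field count varies per row.
fields_0 = re.split(r"\s+", row_0.strip())
fields_3 = re.split(r"\s+", row_3.strip())

print(len(fields_0), fields_0)  # index, pclass, survived + 4 name words
print(len(fields_3), fields_3)  # index, pclass, survived + 5 name words
```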

2 Answers

Stack Overflow user

Accepted answer

Answered 2017-05-05 19:18:12

You can do this with pandas.read_fwf (fwf stands for "fixed-width format"):

Code:

df = pd.read_fwf(StringIO(data), header=1, index_col=0)

Test code:

from io import StringIO
import pandas as pd

data = u"""
      pclass  survived                                               name
0          1         1                      Allen, Miss. Elisabeth Walton
1          1         1                     Allison, Master. Hudson Trevor
2          1         0                       Allison, Miss. Helen Loraine
3          1         0               Allison, Mr. Hudson Joshua Creighton
4          1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5          1         1                                Anderson, Mr. Harry
6          1         1                  Andrews, Miss. Kornelia Theodosia
7          1         0                             Andrews, Mr. Thomas Jr
8          1         1      Appleton, Mrs. Edward Dale (Charlotte Lamson)
9          1         0                            Artagaveytia, Mr. Ramon
10         1         0                             Astor, Col. John Jacob"""

df = pd.read_fwf(StringIO(data), header=1, index_col=0)
print(df)

Result:

    pclass  survived                                             name
0        1         1                    Allen, Miss. Elisabeth Walton
1        1         1                   Allison, Master. Hudson Trevor
2        1         0                     Allison, Miss. Helen Loraine
3        1         0             Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1                              Anderson, Mr. Harry
6        1         1                Andrews, Miss. Kornelia Theodosia
7        1         0                           Andrews, Mr. Thomas Jr
8        1         1    Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0                          Artagaveytia, Mr. Ramon
10       1         0                           Astor, Col. John Jacob
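By default read_fwf infers the column boundaries (colspecs='infer', based on the first rows of the data). If that guess is ever wrong, read_fwf also accepts explicit half-open [start, end) character ranges through its colspecs parameter. A minimal sketch, assuming the boundaries below: they were picked by hand so that each one falls in a blank gap between the columns of this sample, and they would need adjusting for a file with different alignment.

```python
from io import StringIO

import pandas as pd

# Header plus four rows copied from the question's data.
data = """\
      pclass  survived                                               name
0          1         1                      Allen, Miss. Elisabeth Walton
3          1         0               Allison, Mr. Hudson Joshua Creighton
4          1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
10         1         0                             Astor, Col. John Jacob
"""

# Hand-picked [start, end) extents for: index, pclass, survived, name.
# read_fwf strips the surrounding whitespace from each sliced field.
colspecs = [(0, 4), (4, 13), (13, 23), (23, 100)]

df = pd.read_fwf(StringIO(data), colspecs=colspecs, header=0, index_col=0)
print(df)
```

Unlike a whitespace-based separator, this does not care how many spaces appear inside a name; it only assumes every row keeps the same column alignment.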
Votes: 2

Stack Overflow user

Answered 2017-05-05 18:08:25

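'\s+' matches one or more whitespace characters, so it also splits on the single spaces inside your last column. Use a regular expression that requires two or more whitespace characters instead: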

pd.read_csv("test.csv", sep=r"\s{2,}", header=0, index_col=0, engine="python")

Complete working example:

from io import StringIO
import pandas as pd

txt = """     pclass  survived                                               name  
0          1         1                      Allen, Miss. Elisabeth Walton   
1          1         1                     Allison, Master. Hudson Trevor   
2          1         0                       Allison, Miss. Helen Loraine   
3          1         0               Allison, Mr. Hudson Joshua Creighton   
4          1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)   
5          1         1                                Anderson, Mr. Harry   
6          1         1                  Andrews, Miss. Kornelia Theodosia   
7          1         0                             Andrews, Mr. Thomas Jr   
8          1         1      Appleton, Mrs. Edward Dale (Charlotte Lamson)   
9          1         0                            Artagaveytia, Mr. Ramon   
10         1         0                             Astor, Col. John Jacob   
"""

print(pd.read_csv(StringIO(txt), sep=r"\s{2,}", header=0, index_col=0, engine="python"))

Result:

    pclass  survived                                             name
0        1         1                    Allen, Miss. Elisabeth Walton
1        1         1                   Allison, Master. Hudson Trevor
2        1         0                     Allison, Miss. Helen Loraine
3        1         0             Allison, Mr. Hudson Joshua Creighton
4        1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
5        1         1                              Anderson, Mr. Harry
6        1         1                Andrews, Miss. Kornelia Theodosia
7        1         0                           Andrews, Mr. Thomas Jr
8        1         1    Appleton, Mrs. Edward Dale (Charlotte Lamson)
9        1         0                          Artagaveytia, Mr. Ramon
10       1         0                           Astor, Col. John Jacob
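Comparing the two patterns on a single row with Python's re module shows the difference: '\s{2,}' only matches the wide gaps between columns, so the single spaces inside a name survive. This is an illustration of the regex, not pandas' internal code. Passing engine='python' mainly avoids a warning: pandas handles regex separators other than '\s+' only in its Python engine and would otherwise fall back to it on its own.

```python
import re

row_4 = "4          1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)"

# One or more whitespace characters: the name is broken into its words.
loose = re.split(r"\s+", row_4.strip())

# Two or more whitespace characters: only the wide column gaps match, so the
# single spaces inside the name are kept and the row has exactly 4 fields.
strict = re.split(r"\s{2,}", row_4.strip())

print(len(loose), loose)
print(len(strict), strict)
```

The trade-off is that this only works while every gap between columns is at least two characters wide and no name contains two consecutive spaces.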
Votes: 3
Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/43811290
