My data file looks like this:
data.txt
user,activity,timestamp,x-axis,y-axis,z-axis
0,33,Jogging,49105962326000,-0.6946376999999999,12.680544,0.50395286;
1,33,Jogging,49106062271000,5.012288,11.264028,0.95342433;
2,33,Jogging,49106112167000,4.903325,10.882658000000001,-0.08172209;
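3,33,Jogging,49106222305000,-0.61291564,18.496431,3.0237172;
As you can see, the last column ends with a semicolon, so when I read the file into pandas, that column is inferred as type object (because of the trailing semicolon).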
import pandas as pd
df = pd.read_csv('data.txt')
df
user activity timestamp x-axis y-axis z-axis
0 33 Jogging 49105962326000 -0.694638 12.680544 0.50395286;
1 33 Jogging 49106062271000 5.012288 11.264028 0.95342433;
2 33 Jogging 49106112167000 4.903325 10.882658 -0.08172209;
3 33 Jogging 49106222305000 -0.612916 18.496431 3.0237172;
How can I make pandas ignore this semicolon?
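Posted on 2020-10-30 21:59:11
The problem with the txt file is that its content is mixed: as far as I can see, the header line does not have a semicolon as its terminating character.
If you change the first line to add a semicolon, it becomes very simple:
pd.read_csv("data.txt", lineterminator=";")
Posted on 2020-10-30 22:02:10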
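This may not hold in general, but for the example given, it works.
In the documentation you can find the comment parameter:
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.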
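The example from the documentation excerpt can be checked directly. The following is a minimal sketch, assuming the text is supplied through io.StringIO instead of a file on disk (the variable name raw is only for illustration):

```python
import io

import pandas as pd

# The string from the documentation excerpt: a fully commented line,
# then the real header, then one data row.
raw = "#empty\na,b,c\n1,2,3"

# comment="#" drops the fully commented first line, so header=0
# refers to the "a,b,c" line rather than to "#empty".
df = pd.read_csv(io.StringIO(raw), comment="#", header=0)

print(list(df.columns))
print(df)
```

The same mechanism is what removes the trailing semicolon: everything from the comment character to the end of the line is discarded before the fields are parsed.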
So, if ; can only be found at the end of the last column:
>>> df = pd.read_csv("data.txt", comment=";")
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 user 4 non-null int64
1 activity 4 non-null object
2 timestamp 4 non-null int64
3 x-axis 4 non-null float64
4 y-axis 4 non-null float64
5 z-axis 4 non-null float64
dtypes: float64(3), int64(2), object(1)
memory usage: 224.0+ bytes
>>> df
user activity timestamp x-axis y-axis z-axis
0 33 Jogging 49105962326000 -0.694638 12.680544 0.503953
1 33 Jogging 49106062271000 5.012288 11.264028 0.953424
2 33 Jogging 49106112167000 4.903325 10.882658 -0.081722
3 33 Jogging 49106222305000 -0.612916 18.496431 3.023717
Posted on 2020-10-30 21:55:07
You can use the converters parameter:
df = pd.read_csv('data.txt', sep=",", converters={"z-axis": lambda x: float(x.replace(";",""))})
print(df)
data txtuser activity timestamp x-axis y-axis z-axis
0 0 33 Jogging 49105962326000 -0.694638 12.680544 0.503953
1 1 33 Jogging 49106062271000 5.012288 11.264028 0.953424
2 2 33 Jogging 49106112167000 4.903325 10.882658 -0.081722
3 3 33 Jogging 49106222305000 -0.612916 18.496431 3.023717
https://stackoverflow.com/questions/64616163
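As a variation on the converters idea, the semicolon can also be stripped after the file has been read. This is a minimal sketch, not taken from any of the answers, assuming the first two sample rows from the question are held in an in-memory string (raw and io.StringIO stand in for data.txt):

```python
import io

import pandas as pd

# Two sample rows copied from the question; io.StringIO replaces data.txt.
raw = """user,activity,timestamp,x-axis,y-axis,z-axis
0,33,Jogging,49105962326000,-0.6946376999999999,12.680544,0.50395286;
1,33,Jogging,49106062271000,5.012288,11.264028,0.95342433;
"""

# The data rows have one more field than the header, so pandas uses
# the first field of each row as the index.
df = pd.read_csv(io.StringIO(raw))

# z-axis is read as text because of the trailing ';'.
# Strip it and convert the column to float.
df["z-axis"] = df["z-axis"].str.rstrip(";").astype(float)

print(df.dtypes)
```

This keeps the read_csv call plain, at the cost of a second pass over the column.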