Data:
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''
df = pd.read_csv(StringIO(s))

Error received:
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 7 fields in line 3, saw 9

The cause of this error is obvious: the data contains free text with embedded commas, such as How often? (at home, at work, other) and Do you prefer a, b, or c?.
How can this kind of data be read into a pandas DataFrame?
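The field counts in the error message can be reproduced without pandas. The following is a minimal sketch that splits each line of the sample data on every comma, roughly what a plain sep=',' does when no field is quoted. The variable name field_counts is only illustrative.

```python
# The sample data from the question, repeated so the snippet is self-contained.
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''

# Split every line on every comma, ignoring the fact that some commas
# are part of the question text rather than field separators.
field_counts = [len(line.split(',')) for line in s.splitlines()]

for line_no, count in enumerate(field_counts, start=1):
    print(line_no, count)
# Prints:
# 1 7
# 2 7
# 3 9
# 4 9
```

Line 3 is the first line with 9 pieces instead of the 7 announced by the header, which matches "Expected 7 fields in line 3, saw 9".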
Posted on 2017-06-27 17:28:16
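Of course, I figured it out while I was writing the question. Rather than delete it, I'm sharing the answer for my future self, for when I forget how to do this.
It turns out that sep in pandas, which defaults to ',', can also be a regular expression.
The solution is to add sep=r',(?!\s)' to read_csv, like so:
df = pd.read_csv(StringIO(s), sep=r',(?!\s)')

The (?!\s) part is a negative lookahead, so the pattern only matches commas that are not followed by whitespace.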
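To see what the lookahead does in isolation, here is a small sketch that applies the same pattern with Python's re module to the line that triggered the error. The names SEP, line and fields are made up for illustration. Only the pattern comes from the answer.

```python
import re

# Same pattern as the sep argument above: a comma NOT followed by whitespace.
SEP = re.compile(r',(?!\s)')

line = ('375280046,S,D3M,How often? (at home, at work, other),'
        'D3M0,Work,2010-03-31 00:00:00')

# Commas inside the question text are followed by a space, so they are skipped.
fields = SEP.split(line)

print(len(fields))   # 7
print(fields[3])     # How often? (at home, at work, other)
```

This only works because, in this data, a real separator is never followed by a space and a comma inside the text always is. A value that begins with a space, or text containing a comma with no space after it, would be split incorrectly. Also note that a regex separator makes pandas fall back to its Python parsing engine. Passing engine='python' together with sep should avoid the ParserWarning that pandas otherwise emits.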
Result:
          ID  Level  QID                                  Text  ResponseID  \
0  375280046      S  D3M               Which is your favorite?        D5M0
1  375280046      S  D3M  How often? (at home, at work, other)        D3M0
2  375280046      M  A78             Do you prefer a, b, or c?        A78C

   responseText             date_key
0      option 1  2012-08-08 00:00:00
1          Work  2010-03-31 00:00:00
2             a  2010-03-31 00:00:00

https://stackoverflow.com/questions/44786415