import numpy as np
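import pandas as pd
Trying to read a CSV file with pandas. This is the data I scraped. Note that there are opening and closing brackets here; maybe it's a list. What should I write so that all of the data comes out as a table? I don't know how to separate the brackets from the data.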
[]
['Auburn University (Online Master of Business Administration with concentration in Business Analytics)', ' Masters ', ' US', ' AL', ' /Campus ', ' Raymond J. Harbert College of Business ']
['Auburn University (Data Science)', ' Bachelors ', ' US', ' AL', ' /Campus ', ' Business ']
['The University of Alabama (Master of Science in Marketing, Specialization in Marketing Analytics)', ' Masters ', ' US', ' AL', ' Online/ ', ' Manderson Graduate School of Business ']
['The University of Alabama (MS in Operations Management - Decision Analytics Track)', ' Masters ', ' US', ' AL', ' /Campus ', ' Manderson Graduate School of Business ']
['The University of Alabama (M.S. degree in Applied Statistics, Data Mining Track)', ' Masters ', ' US', ' AL', ' /Campus ', ' Manderson Graduate School of Business ']
['The University of Alabama (MBA with concentration in Business Analytics)', ' Masters ', ' US', ' AL', ' Online/ ', ' Culverhouse College of Commerce ']
['Arkansas Tech University (Business Data Analytics)', ' Bachelors ', ' US', ' AR', ' /Campus ', ' Business ']
['University of Arkansas (Graduate Certificate in Business Analytics)', ' Certificate ', ' US', ' AR', ' Online/ ', ' Sam M. Walton College of Business ']
['University of Arkansas (Master of Information Systems with Business Analytics Concentration)', ' Masters ', ' US', ' AR', ' /Campus ', ' Sam M. Walton College of Business ']
['University of Arkansas (Professional Master of Information Systems)', ' Masters ', ' US', ' AR', ' /Campus ', ' Sam M. Walton College of
How do I read this CSV file? I want all of the data to appear as a table. Please help.
Posted on 2019-05-05 08:52:38
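Your problem is exactly what the error message is telling you. There's an error parsing this line:
['The University of Alabama (Master of Science in Marketing, Specialization in Marketing Analytics)', ' Masters ', ' US', ' AL', ' Online/ ', ' Manderson Graduate School of Business ']
The parser doesn't treat your single quotes as quote characters (the default quote character in pandas is the double quote, "), so it breaks the line into fields wherever it finds the separator ",". You want this to be a single field:
The University of Alabama (Master of Science in Marketing, Specialization in Marketing Analytics)
But there's an instance of the separator "," inside this "field", and the CSV parser honors it because it doesn't recognize that the value is wrapped in quotes. So this piece of data is split into two fields:
['The University of Alabama (Master of Science in Marketing
and
 Specialization in Marketing Analytics)'
That splits the line into 7 fields, while your code expects only 6.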
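To see the split concretely, here is a small sketch using Python's built-in csv module rather than pandas. It uses the same default quote character (") as pandas.read_csv, so it reproduces the problem on the Alabama line, stored here as a string:

```python
import csv

# The problem line from the scraped data, stored as a single string.
line = ("['The University of Alabama (Master of Science in Marketing, "
        "Specialization in Marketing Analytics)', ' Masters ', ' US', "
        "' AL', ' Online/ ', ' Manderson Graduate School of Business ']")

# csv.reader's default quotechar is '"', so the single quotes are plain text
# and every comma, including the one inside the program name, ends a field.
fields = next(csv.reader([line]))
print(len(fields))  # 7
print(fields[0])    # ['The University of Alabama (Master of Science in Marketing
```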
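Note also that your items would include the quote characters, which is probably not what you expect either, and the square brackets don't belong there. In short, this is not a well-formed CSV file.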
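For comparison, here is a minimal sketch of turning these lines into well-formed CSV. It is an alternative to the regex approach, not what this answer does: it assumes each usable line is a complete Python list literal, parses it with ast.literal_eval, strips the padding spaces, and lets csv.writer quote any field that contains a comma. The raw sample and the to_csv_text helper are hypothetical names for illustration; lines that don't parse (such as the truncated last line) are skipped.

```python
import ast
import csv
import io

# Hypothetical sample in the same bracketed format as the scraped data.
raw = """[]
['Auburn University (Data Science)', ' Bachelors ', ' US', ' AL', ' /Campus ', ' Business ']
['The University of Alabama (Master of Science in Marketing, Specialization in Marketing Analytics)', ' Masters ', ' US', ' AL', ' Online/ ', ' Manderson Graduate School of Business ']
['University of Arkansas (Professional Master of Information Systems)', ' Masters ', ' US', ' AR', ' /Campus ', ' Sam M. Walton College of
"""


def to_csv_text(text):
    """Convert lines of Python list literals into well-formed CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)  # default QUOTE_MINIMAL quotes fields containing commas
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a complete literal, e.g. the truncated last line
        if not row:
            continue  # skip the empty [] line
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


print(to_csv_text(raw))
```

Because the field containing a comma is now wrapped in double quotes, text like this can be read back with pd.read_csv(io.StringIO(text), header=None) without the field-count error.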
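Update: I'm a regex junkie. I use regular expressions for everything and can't ignore a challenge like this. Below is a regex-based solution that reads exactly what you want from this data. If you want it to pick up the last line of the data, you need to add "']" to the end of that line.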
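import regex  # third-party module (pip install regex), not the built-in re
from pprint import pprint

def parse_file(file):
    # Matches "[ 'f1' , 'f2' , ... ]": group 2 is the first field, and
    # group 4 is captured once per following field (read via captures(4)).
    linepat = regex.compile(r"\[\s*('([^']*)')?(\s*,\s*'([^']*)')*\s*\]")
    with open(file) as f:
        r = []
        while True:
            line = f.readline()
            if not line:
                break
            line = line.strip()
            if len(line) == 0:
                continue
            m = linepat.match(line)
            # Skip non-matching lines and lines with fewer than two fields (e.g. "[]").
            if m and m.captures(4):
                fields = [m.group(2)] + [s.strip() for s in m.captures(4)]
                r.append(fields)
    return r

def main():
    r = parse_file("/tmp/blah.csv")
    pprint(r)

main()
Result: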
[['Auburn University (Online Master of Business Administration with '
  'concentration in Business Analytics)',
  'Masters',
  'US',
  'AL',
  '/Campus',
  'Raymond J. Harbert College of Business'],
 ...
 ['University of Arkansas (Professional Master of Information Systems)',
  'Masters',
  'US',
  'AR',
  '/Campus',
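  'Sam M. Walton College of']]
Note that this doesn't use the built-in 're' module. That module doesn't handle repeated groups (a capture group inside a * repetition only keeps its last match), and that is a must for this kind of problem. Also note that this doesn't involve pandas at all. I don't know anything about that module, but I'd guess it is trivial to feed the clean, parsed data from this code into pandas if you really want it.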
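To illustrate the repeated-group limitation: with the standard re module, a group inside a repetition only remembers the text from its final repetition, whereas the regex module's captures() (used above) returns every repetition. The pattern below is a simplified, hypothetical one, not the answer's pattern; re.findall is shown as a common stdlib workaround when you only need the field values and not whole-line validation.

```python
import re

text = "['a', 'b', 'c']"

# A repeated capture group: (...)* matches three times here,
# but re keeps only the text from the last repetition.
m = re.match(r"\[(?:\s*'([^']*)',?)*\s*\]", text)
print(m.group(1))  # c

# Stdlib workaround: findall returns every match of the single capture group.
fields = re.findall(r"'([^']*)'", text)
print(fields)  # ['a', 'b', 'c']
```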
Posted on 2019-05-05 09:19:01
A basic way to read file.csv:
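def process(line):
    # Placeholder: replace with whatever per-line handling you need
    print("Processing:", line)

data = []
with open("file.csv") as f:
    for line in f:
        line = line.replace("\n", "")  # drop the trailing newline first
        process(line)
        data.append(line)  # collect the cleaned lines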
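Building on this line-by-line approach, here is a sketch of what process could do so that the result ends up in a pandas DataFrame, which is what the question asks for. It assumes each data line is a complete Python list literal and parses it with ast.literal_eval (a swapped-in technique, not part of this answer). It skips lines that are empty, truncated, or have the wrong number of fields. The column names and the load_programs helper are hypothetical, since the scraped data has no header row.

```python
import ast
import io

import pandas as pd

# Hypothetical column names; the scraped data has no header row.
COLUMNS = ["program", "degree", "country", "state", "delivery", "school"]


def process(line):
    """Parse one bracketed line into a list of stripped fields, or None to skip it."""
    line = line.strip()
    if not line:
        return None
    try:
        row = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None  # e.g. the truncated last line without its closing "']"
    if not isinstance(row, list) or len(row) != len(COLUMNS):
        return None  # skips "[]" and rows with an unexpected field count
    return [str(field).strip() for field in row]


def load_programs(lines):
    """Build a DataFrame from an iterable of lines (e.g. an open file)."""
    rows = [r for r in (process(line) for line in lines) if r is not None]
    return pd.DataFrame(rows, columns=COLUMNS)


sample = io.StringIO(
    "[]\n"
    "['Auburn University (Data Science)', ' Bachelors ', ' US', ' AL', ' /Campus ', ' Business ']\n"
    "['The University of Alabama (MBA with concentration in Business Analytics)', ' Masters ', ' US', ' AL', ' Online/ ', ' Culverhouse College of Commerce ']\n"
)
df = load_programs(sample)
print(df)
```

With the real file, the call would be along the lines of: with open("file.csv") as f: df = load_programs(f)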
https://stackoverflow.com/questions/55986684