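Using some random CSV data from data.gov, for example "Gravesite locations of Veterans and beneficiaries in Hawaii, as of January 2011" (http://www.data.gov/raw/4608), I am trying to parse the CSV with Python and process each row:
import csv

randomData = csv.DictReader(open('/downloads/ngl_hawaii.csv', 'rb'), delimiter=",")
for row in randomData:
    print row
The sample CSV data looks like this: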
d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war
"Joe","E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","111444","","SXXXXX","Veteran (Self)","Joe","E","JoJo","","US ARMY","SGT","WORLD WAR II"
The result is not very pretty (one row printed):
{'v_last_name': None, 'cem_addr_two': None, 'rank': None, 'd_suffix': None, 'city': None, 'row_num': None, 'zip': None, 'cem_phone': None, 'd_last_name': None, 'd_first_name': 'Joe,"E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","11144 "SXXXXX","","US ARMY","SGT","WORLD WAR II"', 'war': None, 'v_mid_name': None, 'cem_url': None, 'cem_name': None, 'relationship': None, 'v_first_name': None, 'se one, 'cem_addr_one': None, 'd_birth_date': None, 'd_death_date': None}
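As you can see, the header fields (the first line of the CSV) are not correctly associated with the rows that follow: the whole data row ends up under a single key and every other key is None.
Am I doing something wrong, or is the CSV of poor quality?
Thanks to Casey for asking whether I had opened the file in another program. Excel had messed up the file.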
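The symptom above (one key holding the whole row, every other key None) is what csv.DictReader produces when an entire data row has become a single quoted field. The following is a minimal Python 3 sketch of that failure mode, not a reconstruction of the actual re-saved file: the three-column header and the damaged row are hypothetical, and the assumption is that the row was wrapped in one pair of quotes with its inner quotes doubled.

```python
import csv
import io

# Hypothetical two-line file: a normal header, then a data row that has been
# wrapped in a single pair of quotes with the inner quotes doubled. This is
# an assumption about what a re-saved file might look like.
damaged = 'first,middle,last\n"Joe,""E"",""JoJo"""\n'

rows = list(csv.DictReader(io.StringIO(damaged)))

# The whole row is parsed as ONE field, so it lands under the first header;
# the remaining headers receive the default restval, which is None.
print(rows[0])
```

Downloading a fresh copy of the file and parsing it without opening it in a spreadsheet first avoids this.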
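Posted on 2011-05-03 13:00:29
Looking at the original file, which I downloaded, it is valid CSV. I had misread your script's output.
Because you are using csv.DictReader, each row is converted into a dictionary with the header values as keys and that row's data as values. I ran it on the same file and everything appeared to match up correctly, although I did not go through the whole file.
According to the Python documentation: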
class csv.DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])
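Create an object which operates like a regular reader but maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames. If the row read has more fields than the fieldnames sequence, the remaining data is added as a sequence keyed by the value of restkey. If the row read has fewer fields than the fieldnames sequence, the remaining keys take the value of the optional restval parameter. Any other optional or keyword arguments are passed to the underlying reader instance.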
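The restkey and restval behaviour described in that passage can be seen in a short Python 3 sketch. The column names a, b, c, the sample rows, and the values 'extra' and 'missing' are made up for illustration.

```python
import csv
import io

# Hypothetical data: a 3-column header, one row that is too long and one
# row that is too short.
data = 'a,b,c\n1,2,3,4,5\n6\n'

reader = csv.DictReader(io.StringIO(data), restkey='extra', restval='missing')
long_row, short_row = list(reader)

# Surplus fields are collected in a list under the restkey.
print(long_row['extra'])
# Headers with no matching field receive the restval.
print(short_row['b'], short_row['c'])
```

When restkey and restval are left at their defaults, both are None, so surplus fields appear under the key None and missing fields get the value None.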
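If this is not the format you want, you may want to try csv.reader, which simply returns each row as a list instead of associating it with the headers.
To use DictReader as above, you probably want something like this:
import csv

reader = csv.DictReader(open('ngl_hawaii.csv', 'rb'), delimiter=',')
for row in reader:
    # Look up individual fields by their header name.
    print row['d_first_name']
    print row['d_last_name']
Posted on 2011-05-03 13:32:34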
Strange, I get different output from yours.
data.csv:
d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war
"Emil","E","Seibel","","10/02/1920","03/12/2010","139-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","96744","","808-233-3630","Veteran (Self)","Emil","E","Seibel","","US ARMY","SGT","WORLD WAR II",
Script:
import csv

for line in csv.DictReader(open('data.csv', 'rb'), delimiter=","):
    print line
Output:
{'v_last_name': 'Seibel', None: [''], 'cem_addr_two': '', 'rank': 'SGT', 'd_suffix': '', 'city': 'KANEOHE', 'row_num': '', 'zip': '96744', 'cem_phone': '808-233-3630', 'd_last_name': 'Seibel', 'd_mid_name': 'E', 'state': 'HI', 'branch': 'US ARMY', 'd_first_name': 'Emil', 'war': 'WORLD WAR II', 'v_mid_name': 'E', 'cem_url': '', 'cem_name': 'HAWAII STATE VETERANS CEMETERY', 'relationship': 'Veteran (Self)', 'v_first_name': 'Emil', 'section_id': '139-E', 'v_suffix': '', 'site_num': '3', 'cem_addr_one': 'KAMEHAMEHA HIGHWAY', 'd_birth_date': '10/02/1920', 'd_death_date': '03/12/2010'}
csv.DictReader should automatically take the field names from the first row of the file when the fieldnames parameter is omitted, as stated in the documentation.
The None: [''] in the output is caused by the trailing comma on each data row: it creates one extra, empty field that has no matching header.
Working code example:
http://codepad.org/HdBhr4La
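The extra None key caused by a trailing comma can be reproduced on a much smaller input. This Python 3 sketch uses a made-up two-column file; dropping the surplus field with pop is just one possible way to tidy the row, not something the answer above prescribes.

```python
import csv
import io

# Hypothetical file: two headers, but the data row ends with a trailing
# comma, which produces a third, empty field.
data = 'name,rank\n"Emil","SGT",\n'

row = next(csv.DictReader(io.StringIO(data)))

# The surplus field is stored in a list under the default restkey (None).
# Removing that key leaves only the fields that have a real header.
extra = row.pop(None, None)
print(extra)
print(dict(row))
```

Passing a descriptive restkey to DictReader instead would keep the surplus fields under a named key rather than None.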
Posted on 2011-05-03 13:10:44
Just tried it, and it works fine with your file (renamed to foo.csv):
import csv

ifile = open('foo.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()
Output:
d_first_name: Emil
d_mid_name: E
d_last_name: Seibel
d_suffix:
d_birth_date: 10/02/1920
d_death_date: 03/12/2010
section_id: 139-E
row_num :
site_num: 3
cem_name: HAWAII STATE VETERANS CEMETERY
cem_addr_one: KAMEHAMEHA HIGHWAY
cem_addr_two:
city    : KANEOHE
state   : HI
zip     : 96744
cem_url :
cem_phone: 808-233-3630
relationship: Veteran (Self)
v_first_name: Emil
v_mid_name: E
v_last_name: Seibel
v_suffix:
branch  : US ARMY
rank    : SGT
war     : WORLD WAR II
https://stackoverflow.com/questions/5869776