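My program needs a function that reads data from a CSV file ("all.csv"), extracts all the data related to 'Virginia' (every row that contains 'Virginia'), and writes the extracted data to another CSV file named "Virginia.csv". The program runs without errors; however, when I open "Virginia.csv", it is empty.
Here is the data in all.csv:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
Here is my code: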
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()

    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',') #split elements
        for row in range(len(contents)):
            for word in range(len(contents[row])):
                if contents[row][2] == state:
                    writer.writerow(row)

extract_records_for_state(input_file,output_file,state)

Posted on 2021-11-11 17:05:58
I ran your code and it gave me an error:

Traceback (most recent call last):
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 27, in <module>
    extract_records_for_state(input_file,output_file,state)
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 24, in extract_records_for_state
    writer.writerow(row)
_csv.Error: iterable expected, not int

I fixed this error by passing the contents of the row, contents[row], to the writerow() function and ran it again, and the data showed up in Virginia.csv. It gave me duplicate rows, so I also removed the inner for-loop over word.
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()

    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  # split elements
        print(contents)  # debug output; very large for the full dataset
        for row in range(len(contents)):
            if contents[row][2] == state:
                writer.writerow(contents[row])  # this is what I changed

extract_records_for_state(input_file, output_file, state)

Posted on 2021-11-11 17:14:14
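You have two bugs. The first is that you write the row index at writer.writerow(row); the row itself is contents[row]. The second is that you leave the newline in the last column when reading but do not strip it when writing. Instead, you can make fuller use of the csv module: let the reader parse the rows. And rather than reading the whole file into a list, which uses a lot of memory, filter and write row by row.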
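The second bug is easy to miss, so here is a minimal sketch of it. The sample line is taken from the dataset's format; the variable names are only for illustration and are not part of the original code.

```python
import csv
import io

# One raw line as returned by readlines(), including its trailing newline.
line = "2020-03-09,Fairfax,Virginia,51059,4,0\n"

# Manual splitting keeps the newline inside the last field.
fields = line.split(",")
print(repr(fields[-1]))  # '0\n'

# csv.writer has to quote a field that contains a newline,
# and then appends its own line terminator after it.
buf = io.StringIO()
csv.writer(buf).writerow(fields)
print(repr(buf.getvalue()))

# csv.reader drops the line terminator while parsing.
parsed = next(csv.reader(io.StringIO(line, newline="")))
print(repr(parsed[-1]))  # '0'
```

Letting csv.reader parse the rows, as the code that follows does, avoids the stray newline.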
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r', newline='') as infile, \
            open(output_file, 'w', newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        # add header
        writer.writerow(next(reader))
        # filter for state
        writer.writerows(row for row in reader if row[2] == state)

extract_records_for_state(input_file,output_file,state)

Posted on 2021-11-11 20:38:47
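Looking at your code, two things jump out at me:
contents[row] = contents[row].split(',')).
I recommend two things:
1. Break your logic into distinct chunks: all the nesting can be hard to interpret and debug. Do one thing and prove it works; do another thing and prove it works.
2. Use the CSV API to its fullest: use it to both read and write your CSV.
Rather than trying to copy or fix your code, I'll offer this general approach that achieves both goals: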
import csv

# Read in
all_rows = []
with open('all.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header (I didn't see you keep it)
    for row in reader:
        all_rows.append(row)

# Process
filtered_rows = []
for row in all_rows:
    if row[2] == 'Virginia':
        filtered_rows.append(row)

# Write out
with open('filtered.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(filtered_rows)

Once you understand the logic and the API of those discrete steps, you can move on to writing something a little more complex, like the following, which reads a row, decides whether it should be written, and writes it if so:
import csv

with open('filtered.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    with open('all.csv', 'r', newline='') as f_in:
        reader = csv.reader(f_in)
        next(reader)  # discard header
        for row in reader:
            if row[2] == 'Virginia':
                writer.writerow(row)

Using either of these two pieces of code on this (really scaled-down) sample of all.csv:
date,county,state,fips,cases,deaths
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
2020-03-09,Chelan,Washington,53007,1,1
2020-03-09,Clark,Washington,53011,1,0

gives me a filtered.csv that looks like this:
2020-03-09,Fairfax,Virginia,51059,4,0
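2020-03-09,Virginia Beach city,Virginia,51810,1,0

Given the size of this dataset, the second approach, which writes on demand inside the read loop, is both faster (about 5x faster on my machine) and uses significantly less memory (about 40x less on my machine), because there is no intermediate storage in all_rows.
But please take the time to run both, read them closely, and see how they work.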
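As one more way to lean on the CSV API, here is a hedged sketch of the same row-by-row filter that selects the column by name instead of by index and keeps the header. The function name filter_by_state is made up for this illustration, and the code assumes the input has a header row containing a column called state, as in the sample above.

```python
import csv

def filter_by_state(in_path, out_path, state):
    """Copy rows whose 'state' column equals `state`; return how many were written.

    Hypothetical helper for illustration; assumes the input file starts
    with a header row that includes a 'state' column.
    """
    with open(in_path, "r", newline="") as f_in, \
            open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)  # uses the header row for keys
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()  # keep the header in the output
        count = 0
        for row in reader:
            if row["state"] == state:
                writer.writerow(row)
                count += 1
    return count
```

Calling filter_by_state('all.csv', 'Virginia.csv', 'Virginia') would then play the same role as the loops above. Looking the column up by name means the filter keeps working if the column order ever changes, at the cost of building a dict per row.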
https://stackoverflow.com/questions/69932065