I have a CSV file with the following data:
"field1"|"field2"|"field3"
"12ed"|"ksdk"|"sjdhs"
"1323"|"jdjsk
sjfsk"|"sk"k"sd"
My expected output:
field1|field2|field3
12ed|ksdk|sjdhs
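1323|jsjsk sjfsk|sk"k"sd
Both of my problems are in row 3. First, the data contains double quotes inside a double-quoted field, and those inner quotes should be kept in the final output. Second, a column value contains a newline/line break.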
Because I read the data with QUOTE_NONE, I can strip the outer quotes with a [1:-1] slice, but I cannot replace the newline with an empty value.
import csv

with open(fileIn, "rb") as input:
    with open(fileOut, 'wb') as output:
        w = csv.writer(output, delimiter='|', quoting=csv.QUOTE_NONE, quotechar='')
        for record in csv.reader(input, delimiter='|', quoting=csv.QUOTE_NONE):
            #r = map(lambda x: x.replace("\n",""), record) --> This is not working
            print([s[1:-1] for s in record])
            w.writerow([s[1:-1] for s in record])
With this code I can strip the first and last quotes and keep the quotes inside the data, but I cannot handle the newline.
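To illustrate why the newline is the hard part, here is a minimal sketch that feeds row 3 to `csv.reader` with `QUOTE_NONE`. It assumes Python 3 rather than the 2.6.6 used in this post, and the `sample` string and `read_raw` helper are made up for the illustration. Because quotes get no special treatment in this mode, the reader sees two separate lines and returns two short records instead of one.

```python
import csv

# Hypothetical sample: row 3 of the file, where one field contains a line break.
sample = '"1323"|"jdjsk\nsjfsk"|"sk"k"sd"\n'


def read_raw(text):
    """Parse text line by line with QUOTE_NONE, so quotes are ordinary characters."""
    lines = text.splitlines()
    return list(csv.reader(lines, delimiter='|', quoting=csv.QUOTE_NONE))


rows = read_raw(sample)
for row in rows:
    print(row)
# ['"1323"', '"jdjsk']
# ['sjfsk"', '"sk"k"sd"']
```

The first record's last field does not end with a double quote, which is a usable hint that the record continues on the next line.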
Update:
CSV file contents:
"id"|"comments"|"Date"
"B-7"|"Hi How .
Are You."|"2017-03-15 13:53:23.727"
"8-C"|"How was "your day" today"|"2017-02-06 11:45:26.783"
Error:
['"id"', '"comments"', '"Date"']
['"B-7"', '"Hi How . ']
[]
Traceback (most recent call last):
  File "try.py", line 23, in <module>
    appendRecords(record, oldRecord)
  File "try.py", line 8, in appendRecords
    oldRecord[-1] = oldRecord[-1] + ' ' + record[0]
IndexError: list index out of range
FYI: I am using Python 2.6.6.
Posted on 2017-05-09 04:16:41
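One option is to add a check: if the last column of a row does not end with ", do not write that row to the output file yet. Instead, merge the next row into it and write the merged row.
The merge is a list.extend, except that the last element of the first list and the first element of the second list are also joined together.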
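The merge step on its own can be sketched as follows. This is only an illustration, assuming Python 3 and a space as the joining character; `merge_records` is a hypothetical helper name, and unlike a plain `list.extend` it returns a new list instead of modifying its arguments.

```python
def merge_records(first, second):
    """Return first + second, with the boundary elements joined by a space.

    An empty `second` (for example, a blank line in the input) leaves
    `first` unchanged.
    """
    if not second:
        return list(first)
    joined = first[-1] + ' ' + second[0]
    return first[:-1] + [joined] + second[1:]


# Two partial records, as a QUOTE_NONE reader would return them for a
# field that contains a line break.
part1 = ['"B-7"', '"Hi How .']
part2 = ['Are You."', '"2017-03-15 13:53:23.727"']
merged = merge_records(part1, part2)
print(merged)
# ['"B-7"', '"Hi How . Are You."', '"2017-03-15 13:53:23.727"']
```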
This code should work for you:
import csv

def appendRecords(record, oldRecord):
    # Check to guard against empty lines in the input csv file
    if len(record):
        # Join the unfinished last field with the first field of the next line
        oldRecord[-1] = oldRecord[-1] + ' ' + record[0]
        record.pop(0)
        oldRecord.extend(record)

with open(fileIn, "rb") as input:
    with open(fileOut, 'wb') as output:
        w = csv.writer(output, delimiter='|', quoting=csv.QUOTE_NONE, quotechar='')
        oldRecord = None
        for record in csv.reader(input, delimiter='|', quoting=csv.QUOTE_NONE):
            if oldRecord is not None:
                appendRecords(record, oldRecord)
                record = oldRecord
            elif not record:
                # Skip blank lines that are not part of a multi-line field
                continue
            if record[-1].endswith('"'):
                # The record is complete: strip the outer quotes and write it
                print([s[1:-1] for s in record])
                w.writerow([s[1:-1] for s in record])
                oldRecord = None
            else:
                # The last field is still open: keep it and merge the next line
                oldRecord = record
https://stackoverflow.com/questions/43860482