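I am trying to get it to write 2 scraped items per row to a second CSV, but I can't seem to format this line correctly. output_urls holds many scraped URLs, and for each of them it should produce one output row together with the other variable, Urls.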
self.file.writelines(["%s,%s\n" % (i, j) for (i, j) in zip([item['Urls']], ["\n".join(item['output_urls'])])])
item['Urls'] contains:
websiteinput
item['output_urls'] contains:
website1
website2
website3
website4
website5
I am trying to get:
websiteinput, website1
websiteinput, website2
websiteinput, website3
websiteinput, website4
websiteinput, website5
What it outputs is:
websiteinput, website1
website2
website3
website4
website5
Any suggestions?
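Answered 2017-07-14 01:05:34
zip takes two or more iterables and pairs up their elements position by position (it stops at the end of the shortest one), like this: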
>>> a = [1, 2, 3, 4, 5]
>>> b = [2, 2, 9, 0, 9]
>>> zip(a, b)
[
(1, 2),
(2, 2),
(3, 9),
(4, 0),
(5, 9)
]
If that is what you are trying to achieve, then
self.file.writelines(["%s, %s\n" % (i, j) for (i, j) in zip(item['Urls'],
item['output_urls'])])
might give you what you want:
https://www.inputurl1.com, https://www.outputurl1.com
https://www.inputurl1.com, https://www.outputurl1.com
Remove the join.
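To illustrate why the join matters, here is a minimal sketch of what the line from the question does. The shape of `item` is an assumption (a one-element list for `Urls` and a list of strings for `output_urls`), and the code zips `item["Urls"]` directly rather than wrapping it in another list as the question does. Joining the output URLs produces a single string, so zip only ever sees one pair.

```python
# Assumed shape of the scraped item; the real item may differ.
item = {
    "Urls": ["websiteinput"],
    "output_urls": ["website1", "website2", "website3"],
}

# "\n".join(...) collapses the list into ONE string with embedded newlines.
joined = ["\n".join(item["output_urls"])]

# zip pairs a one-element list with a one-element list: a single pair.
pairs = list(zip(item["Urls"], joined))
lines = ["%s,%s\n" % (i, j) for (i, j) in pairs]

print(len(pairs))
print("".join(lines), end="")
```

Only the first row carries the input URL; the remaining URLs sit on their own lines because they are part of the second field of that one pair.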
This is how I tested your problem. I assumed that Urls and output_urls are both lists, so:
t = ["https://www.inputurl1.com", "https://www.outputurl2.com"]
k = ["https://www.inputurl3.com", "https://www.outputurl4.com"]
print(["%s, %s\n" % (i, j) for (i, j) in zip(t, k)])
The output I get is:
['https://www.inputurl1.com, https://www.inputurl3.com\n',
'https://www.outputurl2.com, https://www.outputurl4.com\n']
Answered 2017-07-20 15:56:19
If I understand correctly, item looks like this:
>>> from pprint import pprint
>>> pprint(item)
{'Urls': ['websiteinput'],
'output_urls': ['website1', 'website2', 'website3', 'website4', 'website5']}
You can simply use a list comprehension, like this:
>>> [[u, o] for u in item['Urls'] for o in item['output_urls']]
[['websiteinput', 'website1'], ['websiteinput', 'website2'], ['websiteinput', 'website3'], ['websiteinput', 'website4'], ['websiteinput', 'website5']]
>>> pprint(_)
[['websiteinput', 'website1'],
['websiteinput', 'website2'],
['websiteinput', 'website3'],
['websiteinput', 'website4'],
['websiteinput', 'website5']]
With .writelines(), you can do the following:
self.file.writelines("%s,%s\n" % (u, o)
for u in item['Urls']
for o in item['output_urls'])
https://stackoverflow.com/questions/45086252
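As an alternative to formatting the rows by hand, the same pairing can be sketched with the standard library's csv module and itertools.product. This is an assumption-laden sketch, not the method from either answer: `item` is assumed to have the shape shown above, and an in-memory `io.StringIO` buffer stands in for `self.file`.

```python
import csv
import io
from itertools import product

# Assumed shape of the scraped item; the real item may differ.
item = {
    "Urls": ["websiteinput"],
    "output_urls": ["website1", "website2", "website3", "website4", "website5"],
}

buffer = io.StringIO()  # hypothetical stand-in for self.file
writer = csv.writer(buffer, lineterminator="\n")

# product pairs every input URL with every output URL, one row per pair.
writer.writerows(product(item["Urls"], item["output_urls"]))

print(buffer.getvalue(), end="")
```

csv.writer also quotes fields that contain commas, which hand-built "%s,%s\n" strings do not. If `item["Urls"]` turns out to be a plain string rather than a list, wrap it in a list first, since iterating a string yields single characters.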