So I recently found this code online. It's written in Python and uses enumerate over the tables that pandas reads from a page.
import pandas as pd
url = 'http://myurl.com/mypage/'
for i, df in enumerate(pd.read_html(url)):
    df.to_csv('myfile_%s.csv' % i)
Is there a way to rewrite this so that it loops over a list of web pages instead of a single URL, and puts all of the information from each page's tables into one .csv file? My main guess is something like a for loop.
url_base = 'http://myurl.com/mypage/'
for page in range(1, 5):
    url = '%s%s' % (url_base, page)
    for i, df in enumerate(pd.read_html(url)):
        df.to_csv('myfile_%s.csv' % i)
Posted on 2017-09-20 08:16:53
If all your CSVs have the same columns, you can do something like this:
pd.concat([t for url in urls for t in pd.read_html(url)], ignore_index=True)
(Note that pd.read_html returns a list of DataFrames per page, so the list has to be flattened before concatenating.)
If your URLs share the same base, as in your example, you would do:
url_base = 'http://myurl.com/mypage/{}'
df = pd.concat([t for i in range(num) for t in pd.read_html(url_base.format(i))],
               ignore_index=True)
df.to_csv('alldata.csv')
Posted on 2017-09-20 09:19:38
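A minimal, self-contained sketch of the flattening step above, using in-memory DataFrames as a stand-in for pd.read_html (which returns a list of tables per page):

```python
import pandas as pd

# Stand-in for pd.read_html: each "page" yields a LIST of tables,
# all sharing the same columns.
pages = [
    [pd.DataFrame({'a': [1], 'b': [2]}), pd.DataFrame({'a': [3], 'b': [4]})],
    [pd.DataFrame({'a': [5], 'b': [6]})],
]

# Flatten the list-of-lists before concatenating, then one CSV can be written.
combined = pd.concat([t for page in pages for t in page], ignore_index=True)
print(combined.shape)  # (3, 2)
```

Passing the unflattened list of lists straight to pd.concat raises a TypeError, which is why the nested comprehension is needed.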
How about this?
import pandas as pd
from concurrent import futures
urls = [your list of urls]
def read_html(url):
    # pd.read_html returns a list of DataFrames; combine them into one
    return pd.concat(pd.read_html(url), ignore_index=True)

with futures.ThreadPoolExecutor(max_workers=6) as executor:
    fetched_urls = {executor.submit(read_html, url): url
                    for url in urls}

for num, future in enumerate(futures.as_completed(fetched_urls), start=1):
    try:
        future.result().to_csv('myfile_{}.csv'.format(num), index=False)
    except Exception:
        # result() re-raises any exception from the worker thread
        print('{} yielded no results'.format(fetched_urls[future]))
https://stackoverflow.com/questions/46311383
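The submit/as_completed pattern in this answer can be exercised without any network access; here is a sketch with a stand-in worker (fetch_table and its inputs are made up for illustration):

```python
from concurrent import futures

def fetch_table(n):
    # Stand-in for read_html: fail on one input to show error handling.
    if n == 3:
        raise ValueError('no tables found')
    return n * n

inputs = [1, 2, 3, 4]
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Map each Future back to the input that produced it.
    submitted = {executor.submit(fetch_table, n): n for n in inputs}

results, failed = {}, []
for future in futures.as_completed(submitted):
    n = submitted[future]
    try:
        results[n] = future.result()   # re-raises the worker's exception
    except ValueError:
        failed.append(n)

print(sorted(results.items()), failed)  # [(1, 1), (2, 4), (4, 16)] [3]
```

Catching the exception at result() time, rather than branching on future.exception(), is what keeps one bad URL from aborting the whole run.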