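I want to scrape the dates and the related news headlines/articles from the last 6 days. For example, when the Python script runs today (August 10), it should fetch the headlines/articles from today back to August 4. I can already scrape the dates and headlines/URLs for all dates from the page used in the code below. Here is that code: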
    import time

    # Assumes `browser` is an already-created Selenium WebDriver instance.
    websites = ['https://www.thespiritsbusiness.com/tag/rum/']
    for spirits in websites:
        browser.get(spirits)
        time.sleep(1)
        news_links = browser.find_elements_by_xpath('//*[@id="archivewrapper"]/div/div[2]/h3')
        n_links = [ele.find_element_by_tag_name('a').get_attribute('href') for ele in news_links]
        dates = browser.find_elements_by_xpath('//*[@id="archivewrapper"]/div/div[2]/small')
        n_dates = [ele.text for ele in dates]
        print(n_links)
        print(n_dates)

But how do I limit this to the last 6 days, counting from today? Any ideas?
Posted on 2021-08-10 07:15:28
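Look at the URL of page 2:
https://www.thespiritsbusiness.com/tag/rum/page/2/
This means that for each further page you need to append page/<n>/ to the base URL, for example page/2/ for the second page.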
You can achieve this by listing the pages explicitly in websites:
    websites = ['https://www.thespiritsbusiness.com/tag/rum/', 'https://www.thespiritsbusiness.com/tag/rum/page/2/', 'https://www.thespiritsbusiness.com/tag/rum/page/3/']
and so on for as many pages as you need.
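Alternatively, you can build the page URLs programmatically:
    websites = ['https://www.thespiritsbusiness.com/tag/rum/']
    max_pages = 3  # illustrative limit; raise it to go further back
    for spirits in websites:
        for page_number in range(1, max_pages + 1):
            # Page 1 is the base URL itself; later pages append page/<n>/
            url = spirits if page_number == 1 else spirits + f"page/{page_number}/"
            browser.get(url)
https://stackoverflow.com/questions/68722400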
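Paging through the archive does not by itself restrict the results to the last 6 days, so the scraped date strings still need to be parsed and compared with today's date. Below is a minimal sketch of that step. The date format ("10 August 2021"), the helper names `within_last_days` and `filter_recent`, and the sample data are all assumptions made for illustration; check the text the site actually renders and adjust `ASSUMED_DATE_FORMAT` to match.

```python
from datetime import date, datetime, timedelta

# Assumed date format, e.g. "10 August 2021"; adjust to match the site's text.
ASSUMED_DATE_FORMAT = "%d %B %Y"


def within_last_days(date_text, today, days=6, fmt=ASSUMED_DATE_FORMAT):
    """Return True if date_text lies between today - days and today, inclusive."""
    published = datetime.strptime(date_text.strip(), fmt).date()
    return today - timedelta(days=days) <= published <= today


def filter_recent(links, date_texts, today, days=6):
    """Keep only the (date_text, link) pairs that fall inside the window."""
    return [
        (text, link)
        for link, text in zip(links, date_texts)
        if within_last_days(text, today, days)
    ]


# Hypothetical sample data standing in for n_links / n_dates.
sample_links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
sample_dates = ["10 August 2021", "4 August 2021", "3 August 2021"]

# In a real run, pass today=date.today() instead of a fixed date.
recent = filter_recent(sample_links, sample_dates, today=date(2021, 8, 10))
print(recent)
```

If the archive happens to be ordered newest first, the paging loop could stop as soon as a page yields no recent entries, rather than relying on a fixed page limit.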