I'm trying to learn web scraping with bs4 and I'm having a little trouble getting the days of the week.
This is what I have so far:
import requests
from bs4 import BeautifulSoup
import pandas as pd
page = requests.get('https://weather.gc.ca/city/pages/bc-74_metric_e.html')
html = BeautifulSoup(page.content, 'html.parser')
forecast = html.find(class_="visible-xs mrgn-tp-md")
print(forecast.find_all("strong"))
I would like to get the following output:
Tonight
Wed
Thu
Fri
Sat
Sun
Mon
Posted on 2021-10-06 07:06:29
Here you go (tested in Python 3):
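counter = 0
for i in forecast.find_all("strong"):
    # counter is 0 only for the first <strong>, which holds "Tonight"
    if counter == 0:
        print(i.text)
    # later entries are day labels only if they contain an <abbr>
    elif i.find("abbr"):
        print(i.text.split(',')[0])
    counter = 1
Or, if you want to get rid of the branching: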
strong_contents = forecast.find_all("strong")
values = []
# first element is Tonight without "abbr"
values.append(strong_contents[0].text)
# Use list slicing to get the rest of the elements and filter by "abbr"
for i in strong_contents[1:]:
    if i.find("abbr"):
        # i.text gives "Wed, 6 Oct", so we split by `,`
        # and keep the first element
        values.append(i.text.split(',')[0])
print('\n'.join(values))
Posted on 2021-10-06 06:44:48
You can try this approach:
strong_tag = forecast.find_all("strong")
days = [strong_tag[0].contents[0]]
for i in strong_tag:
    if i.find("abbr"):
        days.append(i.getText())
If we print the days list, the output is as follows:
print("\n".join(days))
Output:
Tonight
Wed, 6 Oct
Thu, 7 Oct
Fri, 8 Oct
Sat, 9 Oct
Sun, 10 Oct
Mon, 11 Oct
If you only need the name of the day, you can use days.append(i.getText().split(",")[0]) instead of days.append(i.getText()).
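To try this filtering logic without hitting the live page (whose contents change every day), here is a minimal sketch that runs the same steps against an inline HTML fragment. The fragment is a made-up approximation of the forecast markup, not a copy of the real page, and the extract_days helper and its names_only parameter are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: an approximation for illustration only,
# not copied from weather.gc.ca.
SAMPLE_HTML = """
<div class="visible-xs mrgn-tp-md">
  <p><strong>Tonight</strong> <strong>Low 9</strong></p>
  <p><strong><abbr title="Wednesday">Wed</abbr>, 6 <abbr title="October">Oct</abbr></strong></p>
  <p><strong><abbr title="Thursday">Thu</abbr>, 7 <abbr title="October">Oct</abbr></strong></p>
</div>
"""


def extract_days(html, names_only=True):
    """Return the first <strong> text plus every <strong> that contains an <abbr>."""
    soup = BeautifulSoup(html, "html.parser")
    forecast = soup.find(class_="visible-xs mrgn-tp-md")
    if forecast is None:  # container not found, e.g. the page layout changed
        return []
    strong_tags = forecast.find_all("strong")
    if not strong_tags:
        return []
    # The first <strong> ("Tonight" in the sample) has no <abbr>, so take it as-is.
    days = [strong_tags[0].get_text()]
    for tag in strong_tags:
        if tag.find("abbr"):
            text = tag.get_text()  # e.g. "Wed, 6 Oct"
            days.append(text.split(",")[0] if names_only else text)
    return days


print("\n".join(extract_days(SAMPLE_HTML)))
```

Passing names_only=False keeps the full "Wed, 6 Oct" form. The None check is there because find returns None when no element matches, which would otherwise raise an AttributeError on the next call.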
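Posted on 2021-10-06 06:46:12
I'm a beginner and I tried to work out your expected result. If I've done something wrong, please correct me. You can try the code below to get the output. I hope this helps.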
import requests
from bs4 import BeautifulSoup
import pandas as pd
page = requests.get('https://weather.gc.ca/city/pages/bc-74_metric_e.html')
html = BeautifulSoup(page.content, 'html.parser')
forecast = html.find(class_="visible-xs mrgn-tp-md")
data = forecast.find_all("strong")
store = []
for i in data:
    store.append(i.get_text())
# print(store)
keyword = ['Tonight', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue']
for i in store:
    for j in keyword:
        # keep only entries whose text contains one of the day keywords
        if str(j) in str(i):
            print(i.split(',',1)[0])
Output:
Tonight
Wed
Thu
Fri
Sat
Sun
Mon
To get the following output:
Tonight
Wed, 6 Oct
Thu, 7 Oct
Fri, 8 Oct
Sat, 9 Oct
Sun, 10 Oct
Mon, 11 Oct
For this, change print(i.split(',',1)[0]) to print(i).
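One caveat with the substring test (j in i): it matches a keyword anywhere in the text, so a string such as "Sunny" would also pass because it contains "Sun". Below is a minimal sketch of a stricter variant that compares only the part before the first comma against a set of known labels. The pick_days name, the keep_date flag, and the sample list are made up for illustration; the labels are the ones from the keyword list above.

```python
# Same labels as the keyword list in the answer, stored as a set for exact lookups.
DAY_LABELS = {"Tonight", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}


def pick_days(texts, keep_date=False):
    """Keep entries whose text before the first comma is a known day label."""
    result = []
    for text in texts:
        head = text.split(",", 1)[0].strip()  # "Wed, 6 Oct" -> "Wed"
        if head in DAY_LABELS:  # exact match, so "Sunny" is rejected
            result.append(text.strip() if keep_date else head)
    return result


# Hypothetical stand-in for the texts collected in `store`.
sample = ["Tonight", "Sunny", "Wed, 6 Oct", "Low 9", "Thu, 7 Oct"]
print("\n".join(pick_days(sample)))
print("\n".join(pick_days(sample, keep_date=True)))
```

With keep_date=True the function returns the full "Wed, 6 Oct" form, which corresponds to the print(i) variant described above.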
https://stackoverflow.com/questions/69460424