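I'm new to web scraping. I want my scraper to return every paragraph that contains the keyword "neuro", but when I run the code it seems to return the same output on every iteration. Can you point out my mistake?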
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
from time import sleep
from random import randint
url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info =[]
page_number = np.arange(1,1219)
soup = BeautifulSoup(results.text, "html.parser")
for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string =re.compile('neuro'))
    sleep(randint(2,10))
masters = pd.DataFrame({
'info': div})
masters.to_csv('masters.csv')

But the only output I get is:
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
...

Posted on 2020-04-07 08:10:10
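This is your problem: BeautifulSoup parses results.text, and results comes from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900". That page is fetched and parsed only once, before the loop, so soup.find searches the same document on every iteration. The response stored in page inside the loop is never parsed.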
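To see the effect without hitting the site, here is a minimal offline sketch. The page contents and the names fake_get, buggy and fixed are hypothetical, and to keep it dependency-free it uses the standard library's html.parser as a stand-in for BeautifulSoup. Only the loop structure mirrors the code in the question.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of every <p> element (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept as part of the paragraph
        if self._in_p:
            self._buf.append(data)


def neuro_paragraphs(html):
    """Return the text of every <p> whose text contains 'neuro'."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if "neuro" in p]


# Hypothetical offline pages standing in for the PG=1, PG=2, ... responses
FAKE_PAGES = {
    1: "<p>Intro text</p><p>MSc Neuroscience and neuroimaging</p>",
    2: "<p>MSc Clinical neuropsychology</p>",
    3: "<p>MSc Data Science</p>",
}


def fake_get(page):
    return FAKE_PAGES[page]


def buggy(pages):
    html = fake_get(pages[0])  # fetched once, before the loop, as in the question
    results = []
    for page in pages:
        fake_get(page)  # the new page is fetched but never parsed
        results.append(neuro_paragraphs(html))
    return results


def fixed(pages):
    results = []
    for page in pages:
        html = fake_get(page)  # fetch AND parse the current page
        results.append(neuro_paragraphs(html))
    return results
```

buggy([1, 2, 3]) returns the page-1 match three times, which is exactly the symptom in the question, while fixed([1, 2, 3]) returns one result list per page.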
So change the code as follows:
import re
from random import randint
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
info = []  # every matching paragraph from every page ends up here

for page in range(1, 1219):
    # Request the current results page. requests.get needs the URL string,
    # not a Response object from an earlier call.
    results = requests.get(url + "&PG=" + str(page))
    # Parse THIS page's HTML instead of reusing the soup built before the loop
    soup = BeautifulSoup(results.text, "html.parser")
    # find() returns only the first match; find_all() returns every matching <p>
    for p in soup.find_all("p", string=re.compile("neuro")):
        info.append(p.get_text(strip=True))
    sleep(randint(2, 10))  # pause between requests to go easy on the server

# Build the DataFrame once, after all pages have been collected
masters = pd.DataFrame({"info": info})
masters.to_csv("masters.csv")

https://stackoverflow.com/questions/61070770
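Since the goal is every paragraph containing "neuro", one more detail is worth checking. When string=re.compile(...) is combined with a tag name, BeautifulSoup only matches tags whose .string is a single piece of text, so a paragraph with a nested tag, such as <p>Research in <b>neuro</b>science</p>, is skipped. The sketch below compares that filter with one based on get_text(), using a small hypothetical HTML snippet rather than the real site.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one results page
HTML = """
<p>MSc Neuroscience and neuroimaging</p>
<p>Research in <b>neuro</b>science and oncology</p>
<p>MSc Data Science</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

# string= only matches a <p> whose .string is one matching string,
# so the paragraph containing a nested <b> tag is not returned.
by_string = [p.get_text() for p in soup.find_all("p", string=re.compile("neuro"))]

# Filtering on get_text() checks all of the text inside each <p>.
by_text = [p.get_text() for p in soup.find_all("p") if "neuro" in p.get_text()]

print(by_string)  # → ['MSc Neuroscience and neuroimaging']
print(by_text)    # → ['MSc Neuroscience and neuroimaging', 'Research in neuroscience and oncology']
```

If the site's paragraphs contain links or bold text, the get_text() filter is the safer choice inside the loop.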