
BeautifulSoup returns the same output again and again
Stack Overflow user
Asked on 2020-04-07 08:02:15
1 answer · 118 views · 0 followers · 1 vote

I'm new to web scraping. I want the scraper to return all paragraphs containing the keyword "neuro", but when I run the code it seems to return the same output on every iteration. Can you point out my mistake?

import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re 

from time import sleep
from random import randint

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info =[]  
page_number = np.arange(1,1219)
soup = BeautifulSoup(results.text, "html.parser")

for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string =re.compile('neuro'))

sleep(randint(2,10))

masters = pd.DataFrame({
    'info': div})
masters.to_csv('masters.csv')

But the only output I get is:

<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
....

1 Answer

Stack Overflow user

Answered on 2020-04-07 08:10:10

That's your problem: BeautifulSoup parses results.text, and results always comes from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900". The pages you fetch inside the loop are never parsed.

So change the code as follows:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

from time import sleep
from random import randint

info = []
page_numbers = np.arange(1, 1219)

for page in page_numbers:
    # Fetch and parse each page inside the loop
    results = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    soup = BeautifulSoup(results.text, "html.parser")
    paragraph = soup.find("p", string=re.compile("neuro"))
    if paragraph is not None:
        # Collect the text instead of overwriting a single variable
        info.append(paragraph.get_text())
    # Pause between requests so the server isn't hammered
    sleep(randint(2, 10))

masters = pd.DataFrame({'info': info})
masters.to_csv('masters.csv')
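Note that soup.find only returns the first matching paragraph on each page; if you want every match, find_all with the same regex filter collects them all. A minimal offline sketch (the HTML snippet below is made up for illustration, standing in for one fetched results page):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for one results page
html = """
<p>Our neuroscience programmes span imaging and cognition.</p>
<p>Business analytics with a data focus.</p>
<p>Clinical neurology and neurodegeneration research.</p>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all returns every <p> whose text matches the regex, not just the first
matches = [p.get_text() for p in soup.find_all("p", string=re.compile("neuro"))]
print(matches)
```

In the loop above you would then extend info with these matches instead of appending a single paragraph.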
Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61070770
