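I want to collect some publicly available data from a technology/analyst research firm.
So far I can print the title and the position, but text.strip() does not really work; I am probably missing something obvious.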
import requests
from bs4 import BeautifulSoup
from requests.api import head
# get the data
data = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")
# load data into bs4
soup = BeautifulSoup(data.text, "html.parser")
analyst_data = soup.find("div", { "class": "col-md-9" })
#print(analyst_data)
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title,header_paragraph)
for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name , position)

Posted on 2021-07-11 23:23:11
You already found the tags when you did the following:
header_title = analyst_data.find("h1")
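header_paragraph = analyst_data.find("p")
So there is no need for this for loop, which iterates over two ResultSet objects that have no find_all method of their own: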
for data in header_title.find_all(), header_paragraph.find_all():
    name = data.find_all("h1")[0].text.strip()
    position = data.find_all("p")[1].text.strip()
    print(name , position)
Instead, call .text on header_title and header_paragraph. In your example:
import requests
from bs4 import BeautifulSoup
from requests.api import head
# get the data
data = requests.get("https://www.forrester.com/bio/michele-goetz?id=BIO5224")
# load data into bs4
soup = BeautifulSoup(data.text, "html.parser")
analyst_data = soup.find("div", { "class": "col-md-9" })
#print(analyst_data)
header_title = analyst_data.find("h1")
header_paragraph = analyst_data.find("p")
print(header_title.text.strip(), header_paragraph.text.strip())
Output:
Michele Goetz VP, Principal Analyst
https://stackoverflow.com/questions/68337237
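If you want something a bit more defensive, here is a minimal sketch of the same idea. It assumes the name sits in the first h1 and the position in the first p inside div.col-md-9, as in the question. SAMPLE_HTML and extract_name_and_position are made-up names for illustration, and the inline markup is only a stand-in for the real page, which may look different. The point is that find() returns None when nothing matches, so it is worth checking before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the analyst page (an assumption,
# not copied from the real site).
SAMPLE_HTML = """
<div class="col-md-9">
  <h1>  Michele Goetz  </h1>
  <p>
    VP, Principal Analyst
  </p>
</div>
"""


def extract_name_and_position(html):
    """Return (name, position) from the first h1 and p inside div.col-md-9."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "col-md-9"})
    if container is None:
        # find() returns None when nothing matches, so stop early.
        return None, None

    title = container.find("h1")
    paragraph = container.find("p")

    # get_text(strip=True) removes leading and trailing whitespace
    # from the text it collects.
    name = title.get_text(strip=True) if title is not None else None
    position = paragraph.get_text(strip=True) if paragraph is not None else None
    return name, position


name, position = extract_name_and_position(SAMPLE_HTML)
print(name, position)
```

With the live page you would pass requests.get(...).text in place of SAMPLE_HTML. If the site changes its layout, the function returns None values instead of raising an AttributeError.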