I'm trying to use Beautiful Soup and Python/Pandas to extract all the "Places of interest" from a Wikipedia page and put them into a DataFrame. For example:
from bs4 import BeautifulSoup
import requests

url_Paris_01 = requests.get('https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris').text
soup_Paris_01 = BeautifulSoup(url_Paris_01, "html.parser")
# Print the text of every section heading on the page
for headline in soup_Paris_01.find_all("span", {"class": "mw-headline"}):
    print(headline.text)

Output:
Geography
Demography
Historical population
Immigration
Quarters
Economy
Education
Map
Cityscape
**Places of interest**
Bridges
Streets and squares
See also
References
External links

This doesn't work:
soup_Paris_01.find_all('li', attrs={"id": "Places_of_interest"})

I can see that "Places of interest" has a headline tag:
Places of interest
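To see why the `find_all('li', ...)` call comes back empty, here is a minimal sketch on a made-up HTML fragment. It assumes the 2019 markup implied by the headline loop above, where the `id` sits on a `span.mw-headline` inside the heading rather than on any `<li>`; the sample list items are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up, trimmed-down HTML modelled on the markup the headline loop matches:
# the id is on the <span class="mw-headline">, not on an <li>.
html = """
<h2><span class="mw-headline" id="Places_of_interest">Places of interest</span></h2>
<ul>
  <li><a href="/wiki/Louvre" title="Louvre">Louvre</a></li>
  <li><a href="/wiki/Palais-Royal" title="Palais-Royal">Palais-Royal</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# No <li> carries that id, so this finds nothing.
by_li = soup.find_all("li", attrs={"id": "Places_of_interest"})
print(by_li)  # []

# Filtering on the id alone returns the heading span itself.
heading = soup.find(id="Places_of_interest")
print(heading.name, heading.text)  # span Places of interest
```

In other words, the id only marks where the section starts; the places themselves are in the list that follows the heading.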
Posted on 2019-11-10 20:41:26
First find the "Places of interest" span, take the next ul element after it, and then call find_all() on that ul to get all of its anchor tags.
from bs4 import BeautifulSoup
import requests

url_Paris_01 = requests.get('https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris').text
soup_Paris_01 = BeautifulSoup(url_Paris_01, "html.parser")
# The id is on the heading span; the list of places is the next <ul> after it
places_of_interest = soup_Paris_01.find("span", id="Places_of_interest").find_next('ul')
for place in places_of_interest.find_all('a'):
    print(place.get('title'))  # link title (None if the anchor has no title attribute)
    print(place.text)  # link text
https://stackoverflow.com/questions/58792516
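The question asks for the places in a pandas DataFrame, which the loop above stops short of. Below is a minimal sketch, assuming pandas is installed; `places_to_dataframe` and its column names are hypothetical choices for illustration. It takes an HTML string instead of fetching the page so it can be tried offline; for the real page you would pass in `requests.get(...).text` as in the answer. It also looks the section up by `id` alone rather than requiring a `span`, on the assumption that Wikipedia's heading markup may have changed since 2019 (newer pages appear to put the id on the `<h2>` itself).

```python
from bs4 import BeautifulSoup
import pandas as pd

COLUMNS = ["text", "title", "href"]


def places_to_dataframe(html: str, section_id: str = "Places_of_interest") -> pd.DataFrame:
    """Return one row per link in the first <ul> after the section heading.

    Hypothetical helper: assumes some element has id == section_id and that
    the list of places is the next <ul> in document order.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match on id only, so it works whether the id is on a <span> or an <h2>.
    heading = soup.find(id=section_id)
    if heading is None:
        return pd.DataFrame(columns=COLUMNS)
    places = heading.find_next("ul")
    if places is None:
        return pd.DataFrame(columns=COLUMNS)
    rows = [
        {
            "text": a.get_text(strip=True),
            "title": a.get("title"),  # None when the anchor has no title
            "href": a.get("href"),
        }
        for a in places.find_all("a")
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up sample using the assumed newer heading markup.
sample = """
<div class="mw-heading mw-heading2"><h2 id="Places_of_interest">Places of interest</h2></div>
<ul>
  <li><a href="/wiki/Louvre" title="Louvre">Louvre</a></li>
  <li><a href="/wiki/Palais-Royal" title="Palais-Royal">Palais-Royal</a></li>
</ul>
"""
df = places_to_dataframe(sample)
print(df)
```

Using `a.get("title")` instead of `a["title"]` avoids a `KeyError` on anchors without that attribute. If the real section contains nested lists or links that are not places, the rows would need further filtering.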