Why is the value "External_links" instead of the items scraped from the website?

Stack Overflow user

Asked 2018-07-22 03:06:22

Answers: 3 · Views: 44 · Followers: 0 · Votes: 0

My code is shown below. Why does brand hold External_links instead of the list of items I extracted?

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq


my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html,"html.parser")
headline = page_soup.findAll("span",{"class":"mw-headline"})

for item in headline:
    brand = item["id"] # Outputs "External_links"

python

web-scraping

beautifulsoup

urllib

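3 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-22 03:12:40

The for loop iterates over every heading on the page and assigns each heading's id to the variable brand. Each assignment overwrites the previous one, so when the loop finishes, brand holds only the last heading's id ("External_links").

If you change the code to print each heading's id, you will see that you are in fact getting the values you want.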

>>> for item in headline:
...    print(item["id"])
...
Plot
Early_years
Voldemort_returns
Supplementary_works
Harry_Potter_and_the_Cursed_Child
In-universe_books
Pottermore_website
Structure_and_genre
Themes
Origins
Publishing_history
Translations
Completion_of_the_series
Cover_art
Achievements
Cultural_impact
Commercial_success
Awards,_honours,_and_recognition
Reception
Literary_criticism
Social_impact
Controversies
Adaptations
Films
Spin-off_prequels
Games
Audiobooks
Stage_production
Attractions
The_Wizarding_World_of_Harry_Potter
The_Making_of_Harry_Potter
References
Further_reading
External_links
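
To see the overwrite in isolation, here is a minimal sketch that needs no network access. The list heading_ids is a hypothetical stand-in for the ids scraped from the page, and brands is a name introduced only for this illustration.

```python
# Hypothetical stand-in for the ids scraped from the page.
heading_ids = ["Plot", "Early_years", "External_links"]

# Reassigning inside the loop keeps only the most recent value.
for heading_id in heading_ids:
    brand = heading_id

# Appending to a list keeps every value seen by the loop.
brands = []
for heading_id in heading_ids:
    brands.append(heading_id)

print(brand)
print(brands)
```

After the first loop, brand refers to the last element only; brands holds all of them in order.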

Votes: 1

Stack Overflow user

Posted 2018-07-22 11:53:49

Your brand variable needs to be a list. For example, the code could look like this:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from pprint import pprint

my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
with uReq(my_url) as uClient:
    page_html = uClient.read()
    page_soup = soup(page_html, "html.parser")  # the page is HTML, so use an HTML parser rather than "xml"

brand = []
for item in page_soup.find_all('span', {'class': 'mw-headline'}):
    brand.append(item["id"])

pprint(brand)

This prints:

['Plot',
 'Early_years',
 'Voldemort_returns',
 'Supplementary_works',
 'Harry_Potter_and_the_Cursed_Child',
 'In-universe_books',
 'Pottermore_website',
 'Structure_and_genre',
 'Themes',
 'Origins',
 'Publishing_history',
 'Translations',
 'Completion_of_the_series',
 'Cover_art',
 'Achievements',
 'Cultural_impact',
 'Commercial_success',
 'Awards,_honours,_and_recognition',
 'Reception',
 'Literary_criticism',
 'Social_impact',
 'Controversies',
 'Adaptations',
 'Films',
 'Spin-off_prequels',
 'Games',
 'Audiobooks',
 'Stage_production',
 'Attractions',
 'The_Wizarding_World_of_Harry_Potter',
 'The_Making_of_Harry_Potter',
 'References',
 'Further_reading',
 'External_links']

Votes: 0

Stack Overflow user

Posted 2018-07-22 14:01:49

The same result can be achieved with a list comprehension:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://en.wikipedia.org/wiki/Harry_Potter'

soup = BeautifulSoup(requests.get(url).text, "lxml")
items = [item.get('id') for item in soup.find_all('span',class_='mw-headline')]
pprint(items)
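
For a version that can be run without fetching the live page, here is a rough sketch against an inline snippet. SAMPLE_HTML is hypothetical markup loosely modelled on the headings in the question, and it uses the built-in "html.parser" so that lxml is not required. It also shows one difference between the answers: item.get('id') returns None for a tag with no id, while item["id"] would raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, loosely modelled on the heading markup in the question.
SAMPLE_HTML = """
<h2><span class="mw-headline" id="Plot">Plot</span></h2>
<h2><span class="mw-headline" id="References">References</span></h2>
<h2><span class="mw-headline">No id here</span></h2>
<h2><span class="other" id="Ignored">Ignored</span></h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
spans = soup.find_all("span", class_="mw-headline")

# .get("id") yields None when the attribute is missing.
ids = [span.get("id") for span in spans]

# Filtering first makes it safe to index with ["id"].
ids_present = [span["id"] for span in spans if span.has_attr("id")]

print(ids)
print(ids_present)
```

Which form to prefer depends on whether a missing id should be skipped silently or treated as an error.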

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/51461701
