Why is the value "External_links" instead of what was scraped from the website?

Stack Overflow user
Asked 2018-07-22 03:06:22
Answers: 3, Views: 44, Followers: 0, Votes: 0

My code is shown below. Why does brand end up as "External_links" instead of the list of items I scraped?

Code language: python
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq


my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html,"html.parser")
headline = page_soup.findAll("span",{"class":"mw-headline"})

for item in headline:
    brand = item["id"] # Outputs "External_links"

3 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-22 03:12:40

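In the for loop, you iterate over every heading on the page and assign each heading's id to the variable brand, overwriting the previous value each time. When the loop finishes, brand holds only the last heading's id ("External_links").

If you change the code to print the value for each heading inside the loop, you will see that you are in fact getting every value you wanted.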

Code language: python
>>> for item in headline:
...    print(item["id"])
...
Plot
Early_years
Voldemort_returns
Supplementary_works
Harry_Potter_and_the_Cursed_Child
In-universe_books
Pottermore_website
Structure_and_genre
Themes
Origins
Publishing_history
Translations
Completion_of_the_series
Cover_art
Achievements
Cultural_impact
Commercial_success
Awards,_honours,_and_recognition
Reception
Literary_criticism
Social_impact
Controversies
Adaptations
Films
Spin-off_prequels
Games
Audiobooks
Stage_production
Attractions
The_Wizarding_World_of_Harry_Potter
The_Making_of_Harry_Potter
References
Further_reading
External_links
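
The overwriting behaviour described above can be reproduced without any scraping. The following is a minimal sketch, assuming a short hard-coded list (`headline_ids`, a hypothetical stand-in for the ids found on the page) in place of the real BeautifulSoup results. It contrasts rebinding one name on every pass with appending to a list.

```python
# Hypothetical stand-in for the ids scraped from the page (shortened list).
headline_ids = ["Plot", "Themes", "References", "External_links"]

# Rebinding the same name on every pass keeps only the final value.
for heading_id in headline_ids:
    brand = heading_id

# Appending to a list keeps every value that was seen.
all_ids = []
for heading_id in headline_ids:
    all_ids.append(heading_id)

print(brand)    # External_links
print(all_ids)  # ['Plot', 'Themes', 'References', 'External_links']
```

The same reasoning applies to the original loop: the values are all retrieved, but only a container such as a list retains them after the loop ends.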
Votes: 1

Stack Overflow user

Posted 2018-07-22 11:53:49

Your brand variable needs to be a list. For example, the code could look like this:

Code language: python
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from pprint import pprint

my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
with uReq(my_url) as uClient:
    page_html = uClient.read()
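    # Parse as HTML with the built-in parser; the "xml" feature requires lxml
    # and is intended for XML documents rather than HTML pages.
    page_soup = soup(page_html, "html.parser")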

brand = []
for item in page_soup.find_all('span', {'class': 'mw-headline'}):
    brand.append(item["id"])

pprint(brand)

This prints:

Code language: python
['Plot',
 'Early_years',
 'Voldemort_returns',
 'Supplementary_works',
 'Harry_Potter_and_the_Cursed_Child',
 'In-universe_books',
 'Pottermore_website',
 'Structure_and_genre',
 'Themes',
 'Origins',
 'Publishing_history',
 'Translations',
 'Completion_of_the_series',
 'Cover_art',
 'Achievements',
 'Cultural_impact',
 'Commercial_success',
 'Awards,_honours,_and_recognition',
 'Reception',
 'Literary_criticism',
 'Social_impact',
 'Controversies',
 'Adaptations',
 'Films',
 'Spin-off_prequels',
 'Games',
 'Audiobooks',
 'Stage_production',
 'Attractions',
 'The_Wizarding_World_of_Harry_Potter',
 'The_Making_of_Harry_Potter',
 'References',
 'Further_reading',
 'External_links']
Votes: 0

Stack Overflow user

Posted 2018-07-22 14:01:49

The same result can be achieved with a list comprehension:

Code language: python
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://en.wikipedia.org/wiki/Harry_Potter'

soup = BeautifulSoup(requests.get(url).text, "lxml")
items = [item.get('id') for item in soup.find_all('span',class_='mw-headline')]
pprint(items)
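
One detail worth noting is that this answer uses `item.get('id')` where the earlier code uses `item["id"]`. A BeautifulSoup tag's attributes behave like a dictionary: indexing raises `KeyError` when the attribute is missing, while `.get` returns `None`. The sketch below illustrates that difference using plain dicts as stand-ins for each tag's attributes, so it runs without a network connection or bs4. The sample data and variable names are hypothetical.

```python
# Hypothetical attribute dicts, standing in for the attributes of each tag.
tags = [
    {"id": "Plot"},
    {},               # a tag with no id attribute
    {"id": "Themes"},
]

# .get returns None for a missing key, so the comprehension never fails.
ids_with_get = [attrs.get("id") for attrs in tags]

# Indexing raises KeyError as soon as one element lacks the key.
try:
    ids_with_index = [attrs["id"] for attrs in tags]
except KeyError:
    ids_with_index = None

# Filtering first keeps only the elements that really have an id.
ids_present = [attrs["id"] for attrs in tags if "id" in attrs]

print(ids_with_get)    # ['Plot', None, 'Themes']
print(ids_with_index)  # None
print(ids_present)     # ['Plot', 'Themes']
```

Which form to prefer depends on whether a missing id should be skipped, recorded as `None`, or treated as an error.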
Votes: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/51461701
