
Python bs4: scraping returns only empty values

Stack Overflow user
Asked on 2020-08-01 01:35:42
Answers: 2 · Views: 39 · Followers: 0 · Votes: 0

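I'm trying to scrape this website, which contains information about candidates in an upcoming election.

I'm trying to get the candidate statement and the profile picture, both of which are contained in the `votewa-candidate-page` tag, but whenever I try to scrape the data I only get empty values.

Here is a short snippet of my code: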

```python
import requests
from bs4 import BeautifulSoup

url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

statement = soup.find('votewa-candidate-page')
```

I'd appreciate your help. Thank you.

2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-08-01 01:43:43

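If you analyze the website, you'll see that it loads its data through an AJAX call, so the HTML that `requests` receives does not contain the candidate content.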
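One way to see part of the problem without touching the network: everything after `#` in the question's URL is a fragment, which is interpreted by the page's JavaScript in the browser and is not sent to the server in the HTTP request. The sketch below only splits the URL with the standard library; the variable names are illustrative.

```python
from urllib.parse import urlsplit

url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'

parts = urlsplit(url)

# Path and query string: this is what the server actually receives.
sent_to_server = parts.path + '?' + parts.query

# Fragment: handled client-side only, so the server never sees the candidate IDs.
client_side_only = parts.fragment

print(sent_to_server)    # /GenericVoterGuide.aspx?e=865&c=17
print(client_side_only)  # /candidates/57369/45923
```

The server therefore returns the same generic page regardless of which candidate the fragment names, and the candidate data arrives later through a separate request.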

The following script prints the information you need:

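```python
import requests

# JSON endpoint that the page loads its data from
res = requests.get("https://voter.votewa.gov/elections/candidate.ashx?e=865&r=57369&b=45923&la=&c=17")

# The response is a JSON list; the first entry holds the candidate record
data = res.json()

photo = data[0]['statement']['Photo']          # base64-encoded image
statement = data[0]['statement']['Statement']  # HTML string

print(statement)
```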

I only print the statement, because the photo is a base64-encoded image.
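If the photo is needed as well, it can be decoded with the standard `base64` module. This is a minimal sketch, not something from the answer: `decode_photo` is a hypothetical helper, and it assumes the `Photo` field is either a bare base64 string or a data URI (`data:image/...;base64,...`). The actual image format is not stated in the answer, so the file extension in the usage comment is a guess.

```python
import base64


def decode_photo(photo_b64: str) -> bytes:
    """Decode a base64 image string, tolerating an optional data-URI prefix."""
    # Assumption: if the value is a data URI, the payload follows the first comma.
    if photo_b64.startswith('data:') and ',' in photo_b64:
        photo_b64 = photo_b64.split(',', 1)[1]
    return base64.b64decode(photo_b64)


# Self-contained check with made-up bytes instead of a real photo.
sample = base64.b64encode(b'fake image bytes').decode('ascii')
decoded = decode_photo(sample)

# Usage with the `photo` variable from the script above (extension is a guess):
# with open('candidate_photo.jpg', 'wb') as f:
#     f.write(decode_photo(photo))
```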

Output:

```
"<p><strong>Elected Experience\n</strong><br />\nUnited States Representative, 2012-Present. Ways and Means Committee and Select Committee to Modernize Congress.\n</p>\n<p><strong>Other Professional Experience</strong><br />\nSuccessful career as a businesswoman and entrepreneur. Former Microsoft executive, led local high-tech startups. Former Director of Washington&rsquo;s Department of Revenue, where I led efforts to simplify the tax system and help small businesses.\n</p>\n<p><strong>Education\n</strong><br />\nB.A., Biology, Reed College; M.B.A., University of Washington.\n</p>\n<p><strong>Community Service\n</strong><br />\nI&rsquo;ve mentored students at UW Business School; been active in my church, serving as a board member. Volunteered with the PTA, Girl Scouts and YWCA, supporting transitional housing, job training and services to help families get back on their feet.\n</p>\n<p><strong>Statement\n</strong><br />\nDuring this pandemic, families across the 1st Congressional District are struggling and concerned about the future. Now more than ever, we need strong leadership that&rsquo;s focused on helping those in need, protecting health and safety and restoring our economy. As your Congresswoman, I am determined that we come through this difficult stretch stronger than ever. My focus is on putting partisanship aside and delivering results.\n</p>\n<p>The first known COVID-19 case struck in Washington State before anywhere else. President Trump denied the threat and wasted precious time. By contrast, I moved quickly, securing funds to backfill state and local public health accounts. Washington State immediately received over $11 million, with continued ongoing support. I also advocated for employee retention tax credits to keep an estimated 60 million people employed with benefits. They were incorporated into the pandemic relief Heroes Act.\n</p>\n<p>As the economy reopens we face an uncertain recovery. We need smart, decisive action to restore our economy. My background as a successful businesswoman and entrepreneur means I understand how to bring businesses back and create jobs.\n</p>\n<p>My proposal to expand child tax credits, which could reduce child poverty 38 percent, has been incorporated into pandemic relief legislation. I also developed provisions, adopted last year, to make the process of applying for financial aid easier for students. I&rsquo;ve pushed to expand farmers&rsquo; access to markets, improve access to broadband, and to increase the supply of affordable housing.\n</p>\n<p>My core values remain unchanged. As I have done from the day I took office, I'll protect Social Security, Medicare and a woman&rsquo;s right to choose. I have endorsements from Democratic groups, labor, local leaders and many others.\n</p>\n<p>The fallout from this pandemic is challenging, but I'm committed to putting people back to work and preserving the middle-class. I ask for your support.</p>"
```
Votes: 0

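Stack Overflow user

Posted on 2020-08-01 02:18:01

I'm guessing you'd rather not look up the JSON URL by hand first, so here is a snippet that takes the URL from your example, derives the JSON endpoint from it automatically, and prints the statement.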

```python
import requests
from bs4 import BeautifulSoup

url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'

# Build the JSON endpoint from the e and c query parameters and the
# last two path segments of the URL fragment (used as r and b)
converted = f'https://voter.votewa.gov/elections/candidate.ashx?e=' \
            f'{url.split("?e=")[1].split("&")[0]}&r={url.split("/")[-2]}&b={url.split("/")[-1]}&la=&c={url.split("&c=")[1].split("#")[0]}'

page = requests.get(converted)

data = page.json()

statement = data[0]['statement']['Statement']

# The statement is an HTML string, so parse it and print the text of each paragraph
soup = BeautifulSoup(statement, 'html.parser')

print(*[p.text for p in soup.select('p')])
```

Prints:

```
Elected Experience

United States Representative, 2012-Present. Ways and Means Committee and Select Committee to Modernize Congress.
 Other Professional Experience
Successful career as a businesswoman and entrepreneur. Former Microsoft executive, led local high-tech startups. Former Director of Washington’s Department of Revenue, where I led efforts to simplify the tax system and help small businesses.
 Education

B.A., Biology, Reed College; M.B.A., University of Washington.
 Community Service
```

And so on.
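The string splitting in this answer depends on the exact order of the query parameters. As a possible alternative, the same conversion can be sketched with `urllib.parse`. The helper name `to_candidate_endpoint` is hypothetical, and the mapping (query parameters `e` and `c`, plus the last two fragment segments as `r` and `b`) is inferred only from the two URLs shown in this thread, so treat it as an assumption about the site rather than documented behavior.

```python
from urllib.parse import urlsplit, parse_qs, urlencode


def to_candidate_endpoint(page_url: str) -> str:
    """Derive the JSON endpoint URL from a voter-guide page URL (sketch)."""
    parts = urlsplit(page_url)
    query = parse_qs(parts.query)

    # Assumption: the fragment looks like "/candidates/<r>/<b>".
    r_value, b_value = parts.fragment.rstrip('/').split('/')[-2:]

    params = {
        'e': query['e'][0],
        'r': r_value,
        'b': b_value,
        'la': '',  # sent empty, as in the endpoint URL from the accepted answer
        'c': query['c'][0],
    }
    return f'{parts.scheme}://{parts.netloc}/elections/candidate.ashx?{urlencode(params)}'


url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'
endpoint = to_candidate_endpoint(url)
print(endpoint)
```

For the example URL this produces the same endpoint that the accepted answer requests, and it keeps working if the `e` and `c` parameters appear in a different order.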

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/63196553
