首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python爬虫找不到元素

Python爬虫找不到元素
EN

Stack Overflow用户
提问于 2017-03-27 15:51:33
回答 1查看 170关注 0票数 1

我在用Python练习爬虫。

我的目标是在GRE网站上找到测试日期。

这就是我现在所做的。

代码语言:javascript
复制
import urllib2
from bs4 import BeautifulSoup
from urllib2 import urlopen, Request

gre_url = 'https://ereg.ets.org/ereg/public/testcenter/availability/seats?testId=30&testName=GRE+General+Test&location=Taipei+City%2C+Taiwan&latitude=25.0329636&longitude=121.56542680000007&testStartDate=April-01-2017&testEndDate=May-31-2017&currentTestCenterCount=0&sourceTestCenterCount=0&adminCode=&rescheduleFlow=false&isWorkflow=true&oldTestId=30&oldTestTime=&oldTestCenterId=&isUserLoggedIn=true&oldTestTitle=&oldTestCenter=&oldTestType=&oldTestDate=&oldTestTimeInfo=&peviewTestSummaryURL=%2Fresch%2Ftestpreview%2Fpreviewtestsummary&rescheduleURL='
data = urllib2.urlopen(gre_url).read()
soup = BeautifulSoup(data, "html.parser")
print soup.select('div.panel-heading.accordion-heading') # return []

但是,它似乎不能从div.panel-heading.accordion-heading中提取元素data。我该怎么解决呢?

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-03-27 16:20:55

在进行最终的get请求之前,您需要分几个步骤访问后续URL,以检查可用性。下面是一些对我使用requests.Session()有用的东西

代码语言:javascript
复制
import json

import requests
from bs4 import BeautifulSoup


start_url = "https://www.ets.org/gre/revised_general/register/centers_dates/"
workflow_url = "https://ereg.ets.org/ereg/public/workflowmanager/schlWorkflow?_p=GRI"
seats_url = "https://ereg.ets.org/ereg/public/testcenter/availability/seats"
with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}

    session.get(start_url)
    session.get(workflow_url)
    response = session.get("https://ereg.ets.org/ereg/public/testcenter/availability/seats?testId=30&testName=GRE+General+Test&location=New+York%2C+NY%2C+United+States&latitude=40.7127837&longitude=-74.00594130000002&testStartDate=March-27-2017&testEndDate=April-30-2017&currentTestCenterCount=0&sourceTestCenterCount=0&adminCode=&rescheduleFlow=false&isWorkflow=true&oldTestId=30&oldTestTime=&oldTestCenterId=&isUserLoggedIn=true&oldTestTitle=&oldTestCenter=&oldTestType=&oldTestDate=&oldTestTimeInfo=&peviewTestSummaryURL=%2Fresch%2Ftestpreview%2Fpreviewtestsummary&rescheduleURL=")#

    soup = BeautifulSoup(response.content, "html.parser")
    result = json.loads(soup.select_one('#findSeatResponse')['value'])
    for date in result['sortedDates']:
        print(date['displayDate'])

当然,将最后一个URL更改为所需的URL。

票数 4
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/43051000

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档