首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何从网页抓取(javascript?)网站?

如何从网页抓取(javascript?)网站?
EN

Stack Overflow用户
提问于 2017-07-24 21:31:34
回答 1查看 1.2K关注 0票数 0

我试着从一个叫:flightradar24的网站上抓取数据

在我的代码中,我正在查找机场的名称,并且我想在web上抓取“到达”表。web抓取名称是可行的,因为这只是一种javascript格式,但是如果我尝试用我的代码web抓取这个表,我得不到任何值,我只得到对象名称(可能是因为有一个h1?)

有没有什么解决方案可以让我用网络抓取这部分页面?(Python 2.7)

我试过这个:

代码语言:javascript
复制
import urllib2, sys
from BeautifulSoup import BeautifulSoup

site= "https://www.flightradar24.com/data/airports/bud/arrivals"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
name = soup.find('h1' , attrs={'class' : 'airport-name'})
print name

table = soup.find('div', { "class" : "row cnt-schedule-table" })
print table

当我想要打印表时,我得到了这样的结果:

代码语言:javascript
复制
/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释 
* <div class="row cnt-schedule-table"><label class="m-b-m">ARRIVALS</label><table class="table table-condensed table-hover data-table m-n-t-15"><thead><tr class="hidden-xs hidden-sm"><th class="w-80">TIME</th><th class="w-90">FLIGHT</th><th>FROM</th><th>AIRLINE</th><th class="w-120">AIRCRAFT</th><th class="w-10"></th><th class="w-160">STATUS</th></tr><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadEarlier === false}" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0)"> 0)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="-1" data-timestamp="{{currentUtcTimestampRender / 1000}}" ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading earlier flights...' class="btn btn-table-action btn-flights-load">Load earlier flights</button></td></tr></thead><tbody><tr ng-cloak="ng-cloak" class="loader" ng-show="(isFetching == true)"><td colspan="7" class="text-center"><i class="fa fa-spinner fa-pulse"></i> Loading...</td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length == 0)"><td colspan="7" class="text-center">Sorry, we don't have any information about flights for this airport</td></tr><tr ng-cloak="ng-cloak" class="hidden-md hidden-lg" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)"><td colspan="7" class="state-block-{{objFlight.flight.status.generic.status.color || 'gray'}}"><div class="row"><div class="col-xs-12 col-sm-12 p-xxs"><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-clock-o"></i> <span>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: timeFormat : timeZone}}</span></div><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-tag"></i> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | lowercase}}">{{objFlight.flight.identification.number.default}}</a></div><div class="col-xs-6 col-sm-6 p-xxs"><i class="fa fa-map-marker"></i> <span ng-bind-html="objFlight.flight.airport.origin.position.region.city || '-' | unsafe">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs" title="{{objFlight.flight.aircraft.model.text || ''}}"><i class="fa fa-plane"></i> {{objFlight.flight.aircraft.model.code || '-'}}</div><div class="col-xs-3 col-sm-3 p-xxs"><a ng-show="(objFlight.flight.aircraft.registration)" class="notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">{{objFlight.flight.aircraft.registration}}</a></div><div class="col-xs-6 col-sm-6 p-xxs">{{ objFlight.flight.airline.name || '-'}}</div></div></td></tr><tr ng-cloak="ng-cloak" class="hidden-xs hidden-sm" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)" data-date="{{(objFlight.flight.time.scheduled.arrival * 1000) | date: 'EEEE, MMM dd' : timeZone}}" tbl-render-directive="tbl-render-directive"><td>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: timeFormat : timeZone}}</td><td class="p-l-s cell-flight-number"><a class="chevron-toggle" ng-if="(objFlight.flight.identification.codeshare != null)" data-codeshare="{{objFlight.flight.identification.codeshare}}"></a> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | lowercase}}">{{objFlight.flight.identification.number.default}}</a></td><td><div ng-show="(objFlight.flight.airport.origin)"><span class="hide-mobile-only">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="fs-10 fbold notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div><div ng-show="!(objFlight.flight.airport.origin)">-</div></td><td ng-bind-html=" objFlight.flight.airline.name || '-' | unsafe" title="{{ objFlight.flight.airline.name || ''}}" class="cell-airline"></td><td><span class="notranslate" ng-show="(objFlight.flight.aircraft.model.code)">{{objFlight.flight.aircraft.model.code}} </span><a ng-show="(objFlight.flight.aircraft.registration)" class="fs-10 fbold notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">({{objFlight.flight.aircraft.registration}}) </a><span ng-if="(!objFlight.flight.aircraft.model.code &amp;&amp; !objFlight.flight.aircraft.registration)">-</span></td><td><div class="state-block {{objFlight.flight.status.generic.status.color || 'gray'}}"></div></td><td><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</td></tr></tbody><tfoot><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadLater === false }" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"> 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="2" data-timestamp="{{currentUtcTimestampRender / 1000 | int}}" ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading later flights...' class="btn btn-table-action btn-flights-load">Load later flights</button></td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false)"><td colspan="7">* All times are in {{(airportView.schedule.arrivals.data &amp;&amp; timeZone.toUpperCase() == 'UTC' ? 'UTC' : 'local')}} timezone</td></tr></tfoot></table></div>
*/

答案中的文章代码不起作用:

代码语言:javascript
复制
import urllib2
from bs4 import BeautifulSoup
import json

# new url      
url = 'https://www.flightradar24.com/data/airports/bud/arrivals'

# read all data
page = urllib2.urlopen(url).read()

# convert json text to python dictionary
data = json.loads(page)

print(data['row cnt-schedule-table'])

EN

回答 1

Stack Overflow用户

发布于 2017-07-24 21:57:25

Here是另一篇堆栈溢出文章,它有一个非常类似问题的解决方案。似乎您需要更改URL以匹配呈现的URL,而不是通常在浏览器中使用的URL。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/45282009

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档