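I have a Python script that scrapes data from a website (www.nowgoal.com). Because the page relies on JavaScript, I render it with PyQt4, convert the result to HTML, and then parse out the data I need. This worked fine until they recently added a JavaScript alert that prevents the page from rendering correctly. Looking at the page source, the script that triggers the alert is at the bottom: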
`<script type ="text/javascript" >
if (getCookie("enurl_bak") == null)
{
    writeCookie("enurl_bak", "1");
    if (confirm('Nowgoal.net is our spare link\n\n Please add to your favorites')) {
        try {
            window.external.addFavorite('http://www.nowgoal.net', 'LiveScore - NowGoal.com');
        } catch (e) {
            alert('Sorry, fail to add favorits. Your browser can\'t finish this operation. Please use Ctrl+D to add.');
        }
    }
}
</script>`
So setting a cookie (name "enurl_bak", with any non-null value) should be enough to skip the alert. The problem is that I don't know how to do this: I have searched everywhere but could not find a practical example of setting a cookie with PyQt4.
Here is what I use to render the page:

import sys

from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from PyQt4 import QtNetwork

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        # networkAccessManager is not defined anywhere in this snippet as posted
        self.mainFrame().page().setNetworkAccessManager(networkAccessManager)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://www.nowgoal.com'
r = Render(url)
html = r.frame.toHtml()
I also tried PyQt4's setHtml (with the HTML fetched via urllib2) instead of load(QUrl), after stripping out the JavaScript alert function, but that did not work either.
Answered 2014-04-11 16:26:46
Yes!! Done :)
from PyQt4.QtNetwork import QNetworkCookie, QNetworkCookieJar
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
import sys

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Cookie that the page's getCookie("enurl_bak") check looks for
        self.cookie = QNetworkCookie()
        self.cookie.setDomain('.nowgoal.com')
        self.cookie.setName('enurl_bak')
        self.cookiejar = QNetworkCookieJar()
        self.cookiejar.setAllCookies([self.cookie])
        self.networkAccessManager().setCookieJar(self.cookiejar)
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        print("loadfinished")
        self.app.quit()

url = 'http://www.nowgoal.com'
Render(url)

Thanks again for putting me on the right track!
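The cookie in the code above is given a name and a domain but no value. As a rough illustration of why that can still be enough, here is a plain-Python port of the page's `getCookie` helper (the JavaScript version is quoted in the other answer's test.html). It assumes, and this is an assumption rather than something verified here, that a valueless cookie shows up in `document.cookie` as `enurl_bak=`. The names `get_cookie` and `alert_would_show` are made up for this sketch.

```python
def get_cookie(document_cookie, name):
    """Plain-Python port of the page's getCookie(name) helper.

    document_cookie stands in for the browser's document.cookie string.
    Returns the cookie value, or None where the JavaScript returns null.
    """
    cname = name + "="
    begin = document_cookie.find(cname)
    if begin == -1:
        return None
    begin += len(cname)
    end = document_cookie.find(";", begin)
    if end == -1:
        end = len(document_cookie)
    return document_cookie[begin:end]


def alert_would_show(document_cookie):
    # The page only shows the confirm() dialog when the lookup yields null.
    return get_cookie(document_cookie, "enurl_bak") is None
```

Under that assumption, `alert_would_show("")` is true while `alert_would_show("enurl_bak=")` is false: an empty string is not null, so the check passes even without a value.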
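Answered 2014-04-09 19:22:00
The test script below successfully sets and reads the cookie, which prevents the alert from being shown. However, it only works for the test.html page: for some unknown reason (a WebKit bug?), it does not work on www.nowgoal.com.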
from PyQt4 import QtCore, QtGui, QtWebKit, QtNetwork

class WebPage(QtWebKit.QWebPage):
    def __init__(self):
        QtWebKit.QWebPage.__init__(self)
        self.cookies = QtNetwork.QNetworkCookieJar(self)
        self.cookies.setAllCookies(
            [QtNetwork.QNetworkCookie('enurl_bak', '1')])
        self.networkAccessManager().setCookieJar(self.cookies)
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def start(self, url):
        self.mainFrame().load(QtCore.QUrl(url))

    def handleLoadFinished(self):
        print('handleLoadFinished')
        QtGui.qApp.quit()

if __name__ == '__main__':

    import sys
    app = QtGui.QApplication(sys.argv)
    window = WebPage()
    window.start('test.html')
    sys.exit(app.exec_())

test.html:
<script type="text/javascript">
// from www.nowgoal.com (public.js)
function getCookie(name){
    var cname = name + "=";
    var dc = document.cookie;
    if (dc.length > 0){
        begin = dc.indexOf(cname);
        if (begin != -1){
            begin += cname.length;
            end = dc.indexOf(";", begin);
            if (end == -1) end = dc.length;
            return dc.substring(begin, end);
        }
    }
    return null;
}
if (getCookie('enurl_bak') == null) {
    alert('"enurl_bak" value is null');
}
</script>

Update:
It seems there is no WebKit bug after all: I just needed to set the cookie's domain, as in the answer by SkY3d.
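As a rough model of why the domain matters, the sketch below does a simplified suffix-style domain match in plain Python. It is not Qt's actual matching code; the function name `domain_matches` is hypothetical, and treating a cookie with no domain as matching nothing is an assumption chosen to mirror the behaviour described in this update.

```python
def domain_matches(host, cookie_domain):
    """Return True if a cookie scoped to cookie_domain would apply to host.

    Simplified model: the host must equal the cookie domain or be a
    subdomain of it. A leading dot on the cookie domain is ignored.
    """
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    if not domain:
        # Assumption: a cookie without a domain is not matched to any host.
        return False
    return host == domain or host.endswith("." + domain)
```

In this model, `domain_matches("www.nowgoal.com", ".nowgoal.com")` is true, while an empty cookie domain never matches, which is consistent with the cookie only being picked up on www.nowgoal.com once `setDomain('.nowgoal.com')` was added.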
https://stackoverflow.com/questions/22964637