
Python + mechanize not working with Delicious

Stack Overflow user
Asked 2010-12-18 10:57:09
Answers: 1, Views: 4.1K, Followers: 0, Votes: 4

I'm using mechanize and Beautiful Soup to scrape some data from Delicious:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()
url = "http://www.delicious.com/varunsrin"
page = mech.open(url)
html = page.read()

soup = BeautifulSoup(html)
print soup.prettify()

This works on most sites I use it with, but it fails on Delicious with the following output:

Traceback (most recent call last):
  File "C:\Users\Varun\Desktop\Python-3.py", line 7, in <module>
    page = mech.open(url)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 255, in _mech_open
    raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
C:\Program Files (x86)\ActiveState Komodo IDE 6\lib\support\dbgp\pythonlib\dbgp\client.py:1360: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  child = getattr(self.value, childStr)
C:\Program Files (x86)\ActiveState Komodo IDE 6\lib\support\dbgp\pythonlib\dbgp\client.py:456: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return apply(func, args)
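
The key line in the traceback is "request disallowed by robots.txt": by default mechanize consults the site's robots.txt and refuses disallowed URLs itself. As a rough illustration of that kind of check, here is a minimal Python 3 sketch using the standard library's urllib.robotparser. The robots.txt content and the is_allowed helper are hypothetical, made up for this example; they are not the file Delicious actually served, and this is not mechanize's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for illustration only;
# it is NOT the actual file served by delicious.com.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

URL = "http://www.delicious.com/varunsrin"


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blanket "Disallow: /" blocks every path for every crawler.
blocked = is_allowed(SAMPLE_ROBOTS_TXT, "Python-urllib", URL)

# An empty "Disallow:" rule permits everything.
permitted = is_allowed("User-agent: *\nDisallow:\n", "Python-urllib", URL)

print(blocked, permitted)
```

A client that honors robots.txt stops before requesting the page when a check like this returns False, which would explain why the same script works on other sites.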

1 Answer

Stack Overflow user

Accepted answer

Answered 2010-12-18 12:52:38

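Study the tips provided here on emulating a browser with python + mechanize. Setting addheaders and calling set_handle_robots(False) seem to be the minimum required. With the code below, I get output: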

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
# Do not fetch or obey robots.txt (this is what triggered the 403).
br.set_handle_robots(False)
# Send a browser-like User-agent instead of the library default.
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

url = "http://www.delicious.com/varunsrin"
page = br.open(url)
html = page.read()

soup = BeautifulSoup(html)
print soup.prettify()
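
For readers without mechanize installed, the header half of this fix can be sketched with Python 3's standard library, whose openers expose a similar addheaders list. This swaps in urllib.request in place of mechanize, and the helper name build_browser_like_opener is hypothetical. urllib does not consult robots.txt at all, so there is no counterpart to set_handle_robots here, and no request is actually sent.

```python
import urllib.request

# Same User-agent string as in the answer above.
FIREFOX_UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) "
              "Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1")


def build_browser_like_opener(user_agent=FIREFOX_UA):
    """Return a urllib opener whose default User-agent mimics a browser."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header list.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


opener = build_browser_like_opener()
print(opener.addheaders)
# Nothing is fetched here; calling opener.open(url) would perform the request.
```

Whether a given site accepts such a request depends on the site, and disabling robots.txt handling means the script takes on responsibility for respecting the site's crawling rules.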
Votes: 9
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/4476354
