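import re
import urllib.request

# Download the page and decode the bytes to text (assumes the page is UTF-8).
html_content = urllib.request.urlopen('http://www.domain.com').read().decode('utf-8', errors='replace')

# Replace the placeholder below with the pattern you are looking for.
matches = re.findall('regex of string to find', html_content)

if not matches:
    print('I did not find anything')
else:
    print('My string is in the html')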
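
If you only need a yes/no answer, `re.search` stops at the first match instead of collecting all of them. Here is a minimal sketch that works on an inline string so it runs without network access; `page_contains` and `sample_html` are hypothetical names used only for illustration, not part of the original answer.

```python
import re


def page_contains(html_text, pattern):
    """Return True if the regular expression `pattern` matches anywhere in `html_text`."""
    return re.search(pattern, html_text) is not None


# Hypothetical sample page standing in for the downloaded HTML.
sample_html = "<html><body><p>Hello, world</p></body></html>"

found = page_contains(sample_html, r"Hello,\s+world")
missing = page_contains(sample_html, r"Goodbye")
print(found, missing)
```

Note that this still searches the raw markup, so a pattern can also match inside tag names or attributes.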

Votes: 4

Stack Overflow user

Posted on 2011-02-08 04:16:38

lxml is excellent: http://lxml.de/parsing.html

I often use it with XPath to extract data from HTML.

Another option is http://www.crummy.com/software/BeautifulSoup/, which is also great.
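
To give a rough idea of what the XPath approach looks like, here is a minimal sketch that assumes lxml is installed (`pip install lxml`). The `sample_page` markup and the XPath expressions are made up for illustration; adapt them to the page you are actually parsing.

```python
from lxml import html  # third-party package, not in the standard library

# Hypothetical sample page; in practice this would be the downloaded HTML.
sample_page = """
<html><body>
  <h1>Example</h1>
  <ul>
    <li class="item">first</li>
    <li class="item">second</li>
  </ul>
</body></html>
""".strip()

# Parse the markup into an element tree.
tree = html.fromstring(sample_page)

# XPath queries return lists; text() selects the text nodes of the matched elements.
items = tree.xpath('//li[@class="item"]/text()')
heading = tree.xpath('//h1/text()')

print(heading, items)
```

Unlike a regular expression over the raw text, the XPath query targets specific elements, so it is not confused by similar strings in other parts of the markup.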

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/4925966

