下面是一段HTML代码(出自delicious):
<h4>
<a rel="nofollow" class="taggedlink " href="http://imfy.us/" >Generate Secure Links with Anonymous Referers & Anti-Bot Protection</a>
<span class="saverem">
<em class="bookmark-actions">
<strong><a class="inlinesave action" href="/save?url=http%3A%2F%2Fimfy.us%2F&title=Generate%20Secure%20Links%20with%20Anonymous%20Referers%20%26%20Anti-Bot%20Protection&jump=%2Fdux&key=fFS4QzJW2lBf4gAtcrbuekRQfTY-&original_user=dux&copyuser=dux&copytags=web+apps+url+security+generator+shortener+anonymous+links">SAVE</a></strong>
</em>
</span>
</h4>我正在尝试找到class="inlinesave action“的所有链接。代码如下:
sock = urllib2.urlopen('http://delicious.com/theuser')
html = sock.read()
soup = BeautifulSoup(html)
tags = soup.findAll('a', attrs={'class':'inlinesave action'})
print len(tags)但是它什么也找不到!
有什么想法吗?
谢谢
发布于 2009-11-25 21:09:59
如果你想找一个恰好包含这两个类的锚点,我认为必须使用regexp:
tags = soup.findAll('a', attrs={'class': re.compile(r'\binlinesave\b.*\baction\b')})请记住,如果类名的顺序颠倒(class="action inlinesave"),则此正则表达式将不起作用。
下面的语句应该适用于所有情况(即使它看起来很难看。):
soup.findAll('a',
attrs={'class':
re.compile(r'\baction\b.*\binlinesave\b|\binlinesave\b.*\baction\b')
})发布于 2009-11-25 21:46:11
Python字符串方法
html=open("file").read()
for item in html.split("<strong>"):
if "class" in item and "inlinesave action" in item:
url_with_junk = item.split('href="')[1]
m = url_with_junk.index('">')
print url_with_junk[:m]发布于 2009-11-25 22:05:43
也许这个问题在verion 3.1.0中已经解决了,我可以解析你的。
>>> html="""<h4>
... <a rel="nofollow" class="taggedlink " href="http://imfy.us/" >Generate Secure Links with Anony
... <span class="saverem">
... <em class="bookmark-actions">
... <strong><a class="inlinesave action" href="/save?url=http%3A%2F%2Fimfy.us%2F&title=Gen
... </em>
... </span>
... </h4>"""
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> tags = soup.findAll('a', attrs={'class':'inlinesave action'})
>>> print len(tags)
1
>>> tags
[<a class="inlinesave action" href="/save?url=http%3A%2F%2Fimfy.us%2F&title=Generate%20Secure%
>>>我也尝试过BeautifulSoup 2.1.1,它根本不能工作。
https://stackoverflow.com/questions/1796725
复制相似问题