
Q: How can I scrape specific content from this HTML with Beautiful Soup?

Stack Overflow user

Asked 2018-05-21 11:46:51

Answers: 1 · Views: 31 · Followers: 0 · Votes: 1

I have HTML like this:

<tr>
<td>
<b>
<a href=".././statistics/power" title="Exponent of the power-law degree distribution">Power law exponent (estimated) with d<sub>min</sub></a>
</b>
</td>
<td>2.1310 (d<sub>min</sub> = 49) 
</td>
</tr>

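I also have many other HTML pages that are almost identical to this one, but with different numbers on the third line from the bottom. I want to scrape numbers like 2.1310 from this HTML, but I don't know how.

Here is my code: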

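from urllib.request import urlopen
from bs4 import BeautifulSoup

def getLinks(Url):
    html = urlopen(Url)  # the page is fetched, but not parsed below
    # Hard-coded copy of a single table row
    s = '<tr><td><b><a href=".././statistics/power" title' \
    '="Exponent of the power-law degree distribution">Power law exponent (estimated) with ' \
    'd<sub>min</sub></a></b></td><td>2.1310(d<sub>min</sub> = 49) </td></tr>'
    soup = BeautifulSoup(s, 'html.parser')
    # First text node of the second <td> is "2.1310(d"; drop the trailing "(d"
    print(soup.find_all('td')[1].contents[0][:-2])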

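With this code I can get 2.1310.

However, when the number changes, I don't know how to define a uniform `s` for the other HTML pages. There are so many similar pages that I can't copy each one into the code by hand.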

python

beautifulsoup

web-crawler

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-05-21 11:55:33

You can use a regex to extract the float value.

Example:

from bs4 import BeautifulSoup
import re
s = '<tr><td><b><a href=".././statistics/power" title' \
    '="Exponent of the power-law degree distribution">Power law exponent (estimated) with ' \
    'd<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'
soup = BeautifulSoup(s, 'html.parser')
for tr in soup.find_all('tr'):
    # Raw string so the backslashes reach the regex engine unchanged
    m = re.search(r"\d+\.\d+", tr.text)
    if m:
        print(m.group())

Output:

2.1610
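To reuse the regex approach across many similar pages, one option is to wrap it in a function that first locates the row by its link and then searches only the second cell. The following is a minimal sketch, not part of the original answer: the names `extract_exponent`, `href_suffix`, and `samples` are hypothetical, and it assumes every page contains a row whose link ends in `statistics/power`, as in the question.

```python
import re
from bs4 import BeautifulSoup

# Same pattern as in the answer: digits, a dot, digits
FLOAT_RE = re.compile(r"\d+\.\d+")

def extract_exponent(html, href_suffix="statistics/power"):
    """Return the first decimal number (as a string) from the cell next to
    the link whose href ends with href_suffix, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    for tr in soup.find_all("tr"):
        # Keep only rows that contain the expected link
        link = tr.find("a", href=lambda h: h and h.endswith(href_suffix))
        if link is None:
            continue
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue
        # Search the value cell only, so digits in the label cannot match
        m = FLOAT_RE.search(cells[1].get_text())
        if m:
            return m.group()
    return None

# Two rows shaped like the ones in the question and the answer
samples = [
    '<tr><td><b><a href=".././statistics/power" title="Exponent of the '
    'power-law degree distribution">Power law exponent (estimated) with '
    'd<sub>min</sub></a></b></td><td>2.1310 (d<sub>min</sub> = 49) </td></tr>',
    '<tr><td><b><a href=".././statistics/power" title="Exponent of the '
    'power-law degree distribution">Power law exponent (estimated) with '
    'd<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>',
]

for html in samples:
    print(extract_exponent(html))
```

The function takes the HTML as an argument, so the text of each fetched page can be passed in directly instead of pasting it into a hard-coded string. It returns the matched text rather than a float so that trailing zeros such as the one in 2.1310 are preserved; call `float()` on the result if a number is needed.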

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50448198
