Q: BeautifulSoup: find HTML tags that contain a tag with a specific class

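Stack Overflow user

Asked 2022-10-18 20:38:50

Answers 4 · Views 48 · Followers 0 · Votes 2

How can I find all tags that contain a tag with a specific class? The data looks like this: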

<tr>
<td class="TDo1" width=17%>Tournament</td>
<td class="TDo2" width=8%>Date</td>
<td class="TDo2" width=6%>Pts.</td>
<td class="TDo2" width=34%>Pos. Player (team)</td>
<td class="TDo5" width=35%>Pos. Opponent (team)</td>
</tr>

<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDq2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDq3>34/75</td>
<td class=TDq5>39. John Deep</td>
<td class=TDq9>68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>

<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDp2><a href="p.pl?t=410&r=4">17.02.02</a></td>
<td class=TDp3>34/75</td>
<td class=TDp6>39. John Deep</td>
<td class=TDp8>7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>

I am trying:

for mtable in bs.find_all('tr', text=re.compile(r'class=TD?3')):
    print(mtable)

But this returns zero results.

python

beautifulsoup

findall

Answers 4

Stack Overflow user

Answered 2022-10-18 20:57:15

I think you want to find all <tr> tags that contain a tag with the class TD<any character>3.

import re

from bs4 import BeautifulSoup

# `html` contains your html from the question
soup = BeautifulSoup(html, "html.parser")
pat = re.compile(r"TD.3")

for tr in soup.find_all(
    lambda tag: tag.name == "tr"
    and tag.find(class_=lambda cl: cl and pat.match(cl))
):
    print(tr)

Prints:

<tr>
<td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDq2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>
<td class="TDq3">34/75</td>
<td class="TDq5">39. John Deep</td>
<td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>
<tr>
<td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>
<td class="TDp2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>
<td class="TDp3">34/75</td>
<td class="TDp6">39. John Deep</td>
<td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>
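
As a side note, the `text=` filter in the question is compared against a tag's string content, not its attributes, which is why it finds nothing. A CSS selector is another way to express "a row that contains a matching cell". The following is only a sketch under assumptions: the shortened `html` sample is made up from the question's data, and because CSS has no single-character wildcard, the selector approximates `TD<any character>3` as "class starts with TD and ends with 3".

```python
from bs4 import BeautifulSoup

# Shortened sample based on the question's data (assumption: same structure)
html = """
<tr>
<td class="TDo1" width=17%>Tournament</td>
<td class="TDo2" width=6%>Pts.</td>
</tr>
<tr>
<td class=TDq1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDq3>34/75</td>
</tr>
<tr>
<td class=TDp1><a href="p.pl?t=410">GpWl(op)&nbsp;4.01/02</a></td>
<td class=TDp3>34/75</td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# :has() keeps only <tr> elements with a descendant <td> whose class
# attribute starts with "TD" and ends with "3"
rows = soup.select('tr:has(td[class^="TD"][class$="3"])')

for tr in rows:
    print(tr)
```

The selector is looser than the regex `TD.3` (it would also accept a class such as `TDxy3`), so the lambda version above remains the closer match to the stated pattern.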

Votes 1

Stack Overflow user

Answered 2022-10-18 20:52:06

You need to match on the td elements instead. Like this:

In [1]: bs.find_all('td', {"class": re.compile(r'TD\w\d')})
Out[1]: 
[<td class="TDo1" width="17%">Tournament</td>,
 <td class="TDo2" width="8%">Date</td>,
 <td class="TDo2" width="6%">Pts.</td>,
 <td class="TDo2" width="34%">Pos. Player (team)</td>,
 <td class="TDo5" width="35%">Pos. Opponent (team)</td>,
 <td class="TDq1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
 <td class="TDq2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>,
 <td class="TDq3">34/75</td>,
 <td class="TDq5">39. John Deep</td>,
 <td class="TDq9">68. <a href="p.pl?ply=1229">Mark Deep</a></td>,
 <td class="TDp1"><a href="p.pl?t=410">GpWl(op) 4.01/02</a></td>,
 <td class="TDp2"><a href="p.pl?t=410&amp;r=4">17.02.02</a></td>,
 <td class="TDp3">34/75</td>,
 <td class="TDp6">39. John Deep</td>,
 <td class="TDp8">7. <a href="p.pl?ply=10">Darius Star</a></td>]
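
The output above lists every `td`, while the question asks for the enclosing rows. One possible way to bridge that gap is to narrow the pattern to `TD<any character>3` and walk up to each cell's parent `<tr>`. This is a sketch, not the answerer's code: the shortened `html` sample and the `rows` list are assumptions made for illustration.

```python
import re

from bs4 import BeautifulSoup

# Shortened sample based on the question's data (assumption: same structure)
html = """
<tr>
<td class="TDo1" width=17%>Tournament</td>
<td class="TDo5" width=35%>Pos. Opponent (team)</td>
</tr>
<tr>
<td class=TDq3>34/75</td>
<td class=TDq9>68. <a href="p.pl?ply=1229">Mark Deep</a></td>
</tr>
<tr>
<td class=TDp3>34/75</td>
<td class=TDp8>7. <a href="p.pl?ply=10">Darius Star</a></td>
</tr>
"""

bs = BeautifulSoup(html, "html.parser")

rows = []
# Anchored pattern: exactly "TD", one character, then "3"
for td in bs.find_all("td", class_=re.compile(r"^TD.3$")):
    tr = td.find_parent("tr")
    # Compare by identity so a row with several matching cells is kept once
    # (Tag == Tag compares content, which would merge identical rows)
    if tr is not None and all(tr is not seen for seen in rows):
        rows.append(tr)

for tr in rows:
    print(tr)
```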

Votes 0

Stack Overflow user

Answered 2022-10-18 21:23:56

This might help you:

from bs4 import BeautifulSoup
import re

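t = 'your page source'  # raw HTML text of the page
# Matches unquoted attributes such as class=TDq3 in the raw source
pat = re.compile(r'class=TD.3')
# Strip the leading "class=" and drop repeated class names, keeping order
classes = list(dict.fromkeys(j[6:] for j in re.findall(pat, t)))
soup = BeautifulSoup(t, 'html.parser')
result = list()
for i in classes:
    item = soup.find_all(attrs={"class": i})
    result.extend(item)
# Print the parent (the enclosing row) of every matching tag
for i in result:
    print(i.parent)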

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/74117210
