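I am trying to get the "data-val" values from my soup, but they all come out in one huge list instead of being separated into different lists/columns the way they are displayed on the website.
I know the headers are here: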
<th class="num record drop-3" data-tsorter="data-val">
<span class="long-points">
proj. pts.
</span>
<span class="short-points">
pts.
</span>
</th>
<th class="pct" data-tsorter="data-val">
<span class="full-relegated">
relegated
</span>
<span class="small-relegated">
rel.
</span>
</th>
<th class="pct" data-tsorter="data-val">
<span class="full-champ">
qualify for UCL
</span>
<span class="small-champ">
make UCL
</span>
</th>
<th class="pct sorted" data-tsorter="data-val">
<span class="drop-1">
win Premier League
</span>
<span class="small-league">
win league
</span>
</th>
Here is what I have so far:
import requests
from bs4 import BeautifulSoup

url = 'https://projects.fivethirtyeight.com/soccer-predictions/premier-league/'
r = requests.get(url=url)
soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table", {"class": "forecast-table"})
# print(table.prettify())
for i in table.find_all("td", {"class": "pct"}):
    print(i)

Ideally, I would like 4 lists, each containing the class name followed by the matching values.
Posted on 2018-10-08 16:58:10
I'm not entirely sure which specific columns you want, but this will get every column whose tag has a data-val attribute:
import requests
from bs4 import BeautifulSoup
url = 'https://projects.fivethirtyeight.com/soccer-predictions/premier-league/'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table", {"class": "forecast-table"})
team_rows = table.find_all("tr", {"class": "team-row"})
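for team in team_rows:
    print("Team name: {}".format(team['data-str']))
    team_data = team.find_all("td")
    for data in team_data:
        # Only print cells that carry a data-val attribute
        if hasattr(data, 'attrs') and 'data-val' in data.attrs:
            print("\t{}".format(data.attrs['data-val']))
    print("\n")

If I understood your question correctly, you are looking for the last few values, which are largely unlabeled in the HTML source. In that case you can try simply looking up tag[6], although that is of course not very robust. Then again, this is HTML parsing, so "not very robust" is pretty common, imho.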
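What I'm doing here is finding all the team rows (which is easy thanks to the class name) and then simply looping over all the td tags within each team row.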
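To get separate lists per column rather than one long printout, the same loop can append each data-val to a list keyed by the cell's position in its row. The following is only a sketch: SAMPLE_HTML and the function name collect_columns are made up for illustration, the markup is modeled on the snippets in this thread, and the live page may be structured differently.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not copied from the live page).
SAMPLE_HTML = """
<table class="forecast-table">
  <tr class="team-row" data-str="Team A">
    <td class="team">Team A</td>
    <td class="num record drop-3" data-val="90.1">90</td>
    <td class="pct" data-val="0.001">&lt;1%</td>
    <td class="pct" data-val="0.95">95%</td>
    <td class="pct" data-val="0.6">60%</td>
  </tr>
  <tr class="team-row" data-str="Team B">
    <td class="team">Team B</td>
    <td class="num record drop-3" data-val="70.5">71</td>
    <td class="pct" data-val="0.02">2%</td>
    <td class="pct" data-val="0.3">30%</td>
    <td class="pct" data-val="0.05">5%</td>
  </tr>
</table>
"""


def collect_columns(html):
    """Return (team names, {column position: [data-val strings]})."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "forecast-table"})
    teams = []
    columns = defaultdict(list)
    for row in table.find_all("tr", {"class": "team-row"}):
        teams.append(row["data-str"])
        # Keep only the cells that actually carry a data-val attribute.
        cells = [td for td in row.find_all("td") if td.has_attr("data-val")]
        for position, td in enumerate(cells):
            columns[position].append(td["data-val"])
    return teams, dict(columns)


teams, columns = collect_columns(SAMPLE_HTML)
for position, values in columns.items():
    print(position, values)
```

Each list in columns lines up with teams by index, so columns[3] holds the last data-val column for every team. The values are kept as strings; wrap them in float() if you need numbers. Keying by position is just as fragile as the index lookup mentioned above, so it will need adjusting if the page layout changes.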
https://stackoverflow.com/questions/52706259