I am trying to find a way to collect embedded YouTube links with BeautifulSoup. Here is a sample of the HTML.
<span data-s9e-mediaembed="youtube"><span><span data-s9e-mediaembed-iframe='["allowfullscreen","","scrolling","no","style","background:url(https://i.ytimg.com/vi/-OQ2mQRB9E4/hqdefault.jpg) 50% 50% / cover","src","https://www.youtube.com/embed/-OQ2mQRB9E4"]' style="background:url(https://i.ytimg.com/vi/-OQ2mQRB9E4/hqdefault.jpg) 50% 50% / cover"></span></span></span>
How can I isolate the span tags that carry a YouTube link, and then parse out the YouTube link?
I tried to isolate the spans with YouTube links as shown below, but it still prints all of the spans.
import requests
from bs4 import BeautifulSoup

r = requests.get(url)
r_html = r.text
soup = BeautifulSoup(r_html, 'html.parser')
vids = soup.find_all("span")
videolist = []
for i in range(0, len(vids)):
    if vids[i].find("www.youtube.com") != -1:
        videolist.append(vids[i])
for i in videolist:
    print(i)
Posted on 2021-06-25 05:07:12
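What about using a regular expression to find all the spans that contain YouTube? The spans in the sample have no href and no text, so the pattern has to be matched against the data-s9e-mediaembed-iframe attribute:
import re
import requests
from bs4 import BeautifulSoup

r = requests.get(url)
r_html = r.text
soup = BeautifulSoup(r_html, 'html.parser')
# string= only matches a tag's text, so filter on the embed attribute instead
vids = soup.find_all('span', attrs={'data-s9e-mediaembed-iframe': re.compile('youtube')})
for a in vids:
    # the link sits inside this attribute, not in an href
    print(a['data-s9e-mediaembed-iframe'])
https://stackoverflow.com/questions/68122697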
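For reference, the attempt in the question prints every span because a Tag's find() looks for a child tag with that name and returns None when there is none, so the comparison with -1 is always true. Below is a minimal sketch of one way to go from the sample markup to the link itself. It assumes the data-s9e-mediaembed-iframe attribute always holds a flat JSON list of alternating names and values, which is only inferred from the single sample in the question. The names extract_youtube_links and EMBED_ATTR are made up for illustration, and the sample string is a shortened copy of the question's HTML.

```python
import json
import re

from bs4 import BeautifulSoup

# Attribute name taken from the sample markup in the question.
EMBED_ATTR = "data-s9e-mediaembed-iframe"


def extract_youtube_links(html):
    """Return the YouTube embed URLs found in spans carrying EMBED_ATTR."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Passing a regex as the attribute value keeps only spans whose
    # attribute text mentions youtube.com.
    pattern = re.compile(r"youtube\.com")
    for span in soup.find_all("span", attrs={EMBED_ATTR: pattern}):
        try:
            items = json.loads(span[EMBED_ATTR])
        except json.JSONDecodeError:
            continue  # attribute is not valid JSON, skip this span
        if not isinstance(items, list):
            continue  # assumption violated: expected a flat list
        # Assumption: the list alternates name, value, name, value, ...
        pairs = dict(zip(items[0::2], items[1::2]))
        src = pairs.get("src")
        if src:
            links.append(src)
    return links


# Shortened copy of the question's sample, plus one unrelated span.
sample = """<span data-s9e-mediaembed="youtube"><span><span data-s9e-mediaembed-iframe='["allowfullscreen","","scrolling","no","src","https://www.youtube.com/embed/-OQ2mQRB9E4"]'></span></span></span><span>no video here</span>"""

print(extract_youtube_links(sample))
```

With a live page you would pass r.text from requests.get(url) instead of the sample string. If the site stores the embed data in a different shape, the JSON step is the part that would need to change.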