So, I am trying to write a program that fetches Spotify profile pictures. I can get the picture URLs, but the problem is that each URL shows up twice.
import requests
from bs4 import BeautifulSoup
list = ["https://open.spotify.com/user/0n7zzdkxmt0ldpo1kqugwca67",
        "https://open.spotify.com/user/1l23d3k5yq2v9ey191zp8uqxr",
        ]
for i in list:
    response = requests.get(i)
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")
    for i in soup.find_all("div",{"class":"bg lazy-image"}):
        print(i.get("data-src"))
This is the result:
https://i.scdn.co/image/ab6775700000ee85202880a205b627a7e6f25659
https://i.scdn.co/image/ab6775700000ee85202880a205b627a7e6f25659
https://i.scdn.co/image/ab6775700000ee85da40dde3363ed185d5e48a0a
https://i.scdn.co/image/ab6775700000ee85da40dde3363ed185d5e48a0a
Process finished with exit code 0
My question is: since they are identical, how can I print only one of them?
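Answered 2021-09-10 09:32:14
In this case, you can simply convert the iterable to a set.
for i in set(soup.find_all("div",{"class":"bg lazy-image"})):
    print(i.get("data-src"))
By doing this, all duplicates in the iterable are eliminated. Note that a set does not preserve the original order.
I strongly recommend reading about Python's data structures.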
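If the order of the URLs matters, a set is not ideal because it is unordered. The following is a minimal sketch of an order-preserving alternative, not something taken from the answers here. The helper name `unique_in_order` and the `scraped` list are illustrative assumptions; the sample values are copied from the output shown in the question. In the actual scraper, the list would instead be built from `d.get("data-src")` for each matching div.

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(items))


# Sample values copied from the output shown in the question.
scraped = [
    "https://i.scdn.co/image/ab6775700000ee85202880a205b627a7e6f25659",
    "https://i.scdn.co/image/ab6775700000ee85202880a205b627a7e6f25659",
    "https://i.scdn.co/image/ab6775700000ee85da40dde3363ed185d5e48a0a",
    "https://i.scdn.co/image/ab6775700000ee85da40dde3363ed185d5e48a0a",
]

for url in unique_in_order(scraped):
    print(url)
```

Unlike comparing each URL with the previous one, this also removes duplicates that are not adjacent.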
Answered 2021-09-10 09:32:17
I would convert them into a set to remove the duplicates:
divs = soup.find_all("div", {"class": "bg lazy-image"})
urls = set(d.get("data-src") for d in divs)
Answered 2021-09-10 09:35:03
A simple solution is to check whether the URL is equal to the previous one.
import requests
from bs4 import BeautifulSoup

# Renamed from `list` to avoid shadowing the built-in type
profile_urls = [
    "https://open.spotify.com/user/0n7zzdkxmt0ldpo1kqugwca67",
    "https://open.spotify.com/user/1l23d3k5yq2v9ey191zp8uqxr",
]
for profile_url in profile_urls:
    response = requests.get(profile_url)
    html_content = response.content
    url = None  # last URL printed for this profile
    soup = BeautifulSoup(html_content, "html.parser")
    for div in soup.find_all("div", {"class": "bg lazy-image"}):
        # Print only when the URL differs from the previous one
        if div.get("data-src") != url:
            url = div.get("data-src")
            print(url)
https://stackoverflow.com/questions/69130081