Hey, so I'm building a scraper with Beautiful Soup to extract the id of an app searched for on the Play Store. The code:
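import requests
from bs4 import BeautifulSoup

def linkgen(name):
    base = "https://play.google.com/store/search?q="
    req = requests.get(base + name)
    soup = BeautifulSoup(req.content, "html.parser")
    # First element carrying both CSS classes (the top search result link)
    soup2 = soup.find(class_="Si6A0c Gy4nib")
    print(soup2)

The output produced: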
<a class="Si6A0c Gy4nib" href="/store/apps/details?id=com.facebook.katana" jslog="38003; 1:575|CBSqARUKEwjwyfy+1fj6AhXGZI4KHfF0AA8=; track:click,impression"><div class="Shbxxd"><img alt="Screenshot image" aria-hidden="true" class="T75of jpDEN" loading="lazy" src="https://play-lh.googleusercontent.com/9s-9zONYk4NZvLlHVMIF5cGCzrx7PjZYQ3uow5P8Rj2Mt_XHWygV3gOt75_iI1YtTg=w416-h235" srcset="https://play-lh.googleusercontent.com/9s-9zONYk4NZvLlHVMIF5cGCzrx7PjZYQ3uow5P8Rj2Mt_XHWygV3gOt75_iI1YtTg=w832-h470 2x"/></div><div class="j2FCNc"><img alt="Thumbnail image" aria-hidden="true" class="T75of stzEZd" loading="lazy" src="https://play-lh.googleusercontent.com/ccWDU4A7fX1R24v-vvT480ySh26AYp97g1VrIB_FIdjRcuQB2JP2WdY7h_wVVAeSpg=s64" srcset="https://play-lh.googleusercontent.com/ccWDU4A7fX1R24v-vvT480ySh26AYp97g1VrIB_FIdjRcuQB2JP2WdY7h_wVVAeSpg=s128 2x"/><div class="cXFu1"><div class="ubGTjb"><span class="DdYX5">Facebook</span></div><div class="ubGTjb"><span class="wMUdtb">Meta Platforms, Inc.</span></div><div class="ubGTjb"><div aria-label="Rated 3.2 stars out of five stars" style="display: inline-flex; align-items: center;"><span class="w2kbF">3.2</span><span class="Q4fJQd"><i aria-hidden="true" class="google-material-icons Yvy3Fd">star</i></span></div></div></div></div></a>

From this output, I want to extract the id contained in the href link (in this case, "com.facebook.katana"). I tried searching for href in the tag and also tried using a regex, but got no output. Can anyone help?
Thanks
Answered 2022-10-24 11:30:51
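To get only the id from the href attribute, you can try the following regex in your Python code:
r"(?<=id=)(.*?)(\")"
The match includes the closing quote, so remove the last character of the resulting string. If you want to try the regex, just go here :)
Hope this helps! Have a nice day.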
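
For what it's worth, here is a rough sketch of how the regex could be applied, plus an alternative that parses the href query string instead. The `html` string is a shortened stand-in for the printed tag (an assumption for illustration; in the question it would be `str(soup2)`), and the variable names are made up. With Beautiful Soup the href value itself should be available as `soup2["href"]`.

```python
import re
from urllib.parse import urlparse, parse_qs

# Shortened stand-in for the printed tag; in the question this would be str(soup2).
html = (
    '<a class="Si6A0c Gy4nib" '
    'href="/store/apps/details?id=com.facebook.katana" '
    'jslog="38003">Facebook</a>'
)

# Approach 1: the regex from the answer. The full match ends with the
# closing quote, so take group(1) (equivalent to dropping the last character).
match = re.search(r"(?<=id=)(.*?)(\")", html)
app_id_regex = match.group(1) if match else None

# Approach 2: parse the href as a URL and read the "id" query parameter.
# With Beautiful Soup the href would come from soup2["href"]; it is
# written out by hand here so the sketch runs without extra packages.
href = "/store/apps/details?id=com.facebook.katana"
app_id_query = parse_qs(urlparse(href).query).get("id", [None])[0]

print(app_id_regex, app_id_query)
```

The second approach avoids depending on how the tag happens to be serialized, which may make it a bit more robust if the surrounding attributes change.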
https://stackoverflow.com/questions/74180093