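I know how to scrape emojis from WhatsApp, but only when 1) the message is a single emoji without any text, or 2) the message is text with emojis. However, when a message contains two emojis and no text, I cannot scrape it. This is the HTML of the message "?":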
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_638-64.png" alt="?" draggable="false"
class="_2UdhN _1xeoG i0jNr selectable-text copyable-text" data-plain-text="?"
style="visibility: visible;">
</span>
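</div>
I tried this code to get the emoji:
m = s.find_all('div', attrs={'class':'i0jNr'})
v = m.find('span', attrs={'class':'_3R6rC'})
for i in v.children:
    if isinstance(i, NavigableString):
        print(i)
    elif isinstance(i, Tag):
        print(i.attrs['alt'])
This code only works when there is a single emoji. When the message contains two emojis, it prints only one: for example, if the message is "??", the output is "?" (only the first emoji is printed). This is the HTML of that message: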
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1749-40.png" alt="?" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="?"
style="visibility: visible;">
</span>
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1845-40.png" alt="?" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="?"
style="visibility: visible;">
</span>
</div>
I tried this code to print both emojis, but it does not work:
msglist = []
m = s.find_all('div', attrs={'class':'i0jNr'})
for b in m:
    v = b.find_all('div', attrs={'class':'JwMbj'})
    for x in v:
        z = x.find_all('span', attrs={'class':'_3R6rC'})
        for i in z.children:
            if isinstance(i, NavigableString):
                print(i)
            elif isinstance(i, Tag):
                print(i.attrs['alt'])
It gives no output at all. Can someone help me?
Posted on 2021-08-23 17:21:00
You can replace each <img> tag with its plain-text value and then get the text normally with .get_text(). For example:
from bs4 import BeautifulSoup
html_doc = """
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1749-40.png" alt="?" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="?"
style="visibility: visible;">
</span>
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1845-40.png" alt="?" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="?"
style="visibility: visible;">
</span>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
# select the main text div
text_div = soup.select_one(".copyable-text")
# convert all <img> to plain-text:
for img in text_div.select("img[data-plain-text]"):
    img.replace_with(img["data-plain-text"])
# get text normally:
print(text_div.get_text(strip=True))
Prints:
??
https://stackoverflow.com/questions/68896558
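As a variation on the answer above, here is a minimal sketch that reads the emojis without modifying the parsed tree. It walks every descendant of the message div in document order, so it covers a single emoji, several emojis, and text mixed with emojis. The helper name `message_text` and the shortened sample HTML (including the stand-in emojis 😀 and 👍) are assumptions made for illustration. The class names come from the question and may differ in other WhatsApp Web builds.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def message_text(container):
    """Join text nodes and emoji <img> values in document order.

    Hypothetical helper: prefers the data-plain-text attribute and
    falls back to alt when it is missing.
    """
    parts = []
    for node in container.descendants:
        if isinstance(node, NavigableString):
            text = node.strip()  # drops the whitespace between tags
            if text:
                parts.append(text)
        elif isinstance(node, Tag) and node.name == "img":
            parts.append(node.get("data-plain-text") or node.get("alt", ""))
    return "".join(parts)


# Shortened sample markup (assumed); emojis are placeholders.
html_doc = """
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">hi<img alt="😀" data-plain-text="😀" class="i0jNr"></span>
<span class="_3R6rC"><img alt="👍" data-plain-text="👍" class="i0jNr"></span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
for div in soup.find_all("div", class_="JwMbj"):
    print(message_text(div))  # prints: hi😀👍
```

Note that `find_all` returns a list-like `ResultSet`, so `.find` and `.children` have to be called on each element inside a loop, not on the result itself. Because the sketch strips each text node, spaces at the edge of a text run next to an emoji are lost; drop the `strip()` call if that spacing matters.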