
How do I wrap each sentence in a mark tag with a custom color?

Stack Overflow user
Asked 2022-05-28 15:09:43
Answers: 2 · Views: 142 · Followers: 0 · Votes: 3

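I am using Beautiful Soup and requests to load the HTML of a website (for example https://en.wikipedia.org/wiki/Elephant). I want to mimic this page, but with the sentences inside the 'p' tags (paragraphs) colored.

To do that, I use spaCy to split the text into sentences. I pick a color for each sentence (for those interested, the color reflects the probability from a binary deep-learning classifier).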

def get_colorized_p(p):
    
    doc = nlp(p.text) # p is the beautiful soup p tag
    string = '<p>'
    for sentence in doc.sents:
        # The prediction value is anything from 0 to 1.
        prediction = classify(sentence.text, model=model, pred_values=True)[1][1].numpy()
        # I am using a custom function to map the prediction to a hex colour.
        color = get_hexcolor(prediction)
        string += f'<mark style="background: {color};">{sentence.text} </mark> '
    string += '</p>'
    return string # I create a new long string with the markup
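
The `get_hexcolor` helper is not shown in the question. As a minimal sketch of what such a mapping could look like, assuming simple linear interpolation between two endpoint colors (both endpoints are arbitrary choices, not the asker's palette):

```python
def get_hexcolor(prediction, low=(0xED, 0xF8, 0xFB), high=(0x00, 0x6D, 0x2C)):
    """Map a probability in [0, 1] to a hex color by linear interpolation.

    Hypothetical stand-in for the question's helper; the two endpoint
    colors are assumptions made for illustration.
    """
    # Clamp so that slightly out-of-range scores still give a valid color.
    t = min(max(float(prediction), 0.0), 1.0)
    channels = [round(lo + (hi - lo) * t) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


print(get_hexcolor(0.0))  # the `low` endpoint
print(get_hexcolor(1.0))  # the `high` endpoint
```

Converting with `float()` first also accepts a NumPy scalar such as the one returned by `.numpy()` in the function above.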

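This builds one long new string with the HTML markup inside a p tag. Now I want to replace the "old" element in the Beautiful Soup object. I do that with a simple loop: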

for element in tqdm_notebook(soup.findAll()):
    if element.name == 'p':
        if len(element.text.split()) > 2: 
            element = get_colorized_p(element)

This does not raise any error. However, when I render the HTML file, it is displayed without the mark tags.

I am using Jupyter to quickly display the HTML file:

from IPython.display import display, HTML

display(HTML(html_file))

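However, this does not work. I did verify the string returned by get_colorized_p: when I use it on a single p element and render the result, it works fine. But I want to insert the string into the Beautiful Soup object.

I hope someone can shed some light on this. Something goes wrong when replacing the element in the loop, but I have no idea how to fix it.

An example of a rendered string, just in case: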

<p><mark style="background: #edf8fb;">Elephants are the largest existing land animals.</mark><mark style="background: #f1fafc;">Three living species are currently recognised: the African bush elephant, the African forest elephant, and the Asian elephant.</mark><mark style="background: #f3fafc;">They are an informal grouping within the proboscidean family Elephantidae.</mark><mark style="background: #f3fafc;">Elephantidae is the only surviving family of proboscideans; extinct members include the mastodons.</mark><mark style="background: #eff9fb;">Elephantidae also contains several extinct groups, including the mammoths and straight-tusked elephants.</mark><mark style="background: #68c3a6;">African elephants have larger ears and concave backs, whereas Asian elephants have smaller ears, and convex or level backs.</mark><mark style="background: #56ba91;">The distinctive features of all elephants include a long proboscis called a trunk, tusks, large ear flaps, massive legs, and tough but sensitive skin.</mark><mark style="background: #d4efec;">The trunk is used for breathing, bringing food and water to the mouth, and grasping objects.</mark><mark style="background: #e7f6f9;">Tusks, which are derived from the incisor teeth, serve both as weapons and as tools for moving objects and digging.</mark><mark style="background: #d9f1f0;">The large ear flaps assist in maintaining a constant body temperature as well as in communication.</mark><mark style="background: #e5f5f9;">The pillar-like legs carry their great weight.</mark><mark style="background: #72c7ad;"> </mark></p>
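
One thing to watch for when assembling markup like the string above by hand: `p.text` returns plain text, so a literal `<` or `&` in a sentence ends up unescaped in the new HTML. A minimal sketch using the standard library's `html.escape`, where `colorize_sentences` and its two parallel input lists are hypothetical stand-ins for the spaCy sentences and classifier colors:

```python
from html import escape


def colorize_sentences(sentences, colors):
    """Wrap each sentence in a <mark> tag with its own background color.

    `sentences` and `colors` are parallel lists; both are assumed inputs
    used only for illustration.
    """
    parts = []
    for text, color in zip(sentences, colors):
        # escape() turns &, < and > into entities so the sentence text
        # cannot break the surrounding markup.
        parts.append(f'<mark style="background: {color};">{escape(text)}</mark>')
    return "<p>" + " ".join(parts) + "</p>"


result = colorize_sentences(["Tusks < trunks & ears."], ["#edf8fb"])
print(result)
```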

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-05-28 15:54:31

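I like the idea and the color scheme. In my opinion the main issue is that you are trying to replace a tag with a string. Assigning to the loop variable does not touch the soup; instead, replace the tag with a bs4 object via replace_with() so that the soup itself is modified: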

for element in tqdm_notebook(soup.find_all()):
    if element.name == 'p':
        if len(element.text.split()) > 2: 
            element.replace_with(BeautifulSoup(get_colorized_p(element), 'html.parser'))

Then convert the soup back to a string and display it:

display(HTML(str(soup)))

In newer code, avoid the old findAll() syntax and use find_all() instead. For more information, take a minute to check the docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names

Example

from bs4 import BeautifulSoup
from IPython.display import display, HTML

html = '''
    <p>Elephants are the largest ...</p>
'''
soup = BeautifulSoup(html, 'html.parser')

def get_colorized_p(element):
    ### processing and returning of result str
    return '<p><mark style="background: #edf8fb;">Elephants are the largest existing land animals.</mark><mark style="background: #f1fafc;">Three living species are currently recognised: the African bush elephant, the African forest elephant, and the Asian elephant.</mark><mark style="background: #f3fafc;">They are an informal grouping within the proboscidean family Elephantidae.</mark><mark style="background: #f3fafc;">Elephantidae is the only surviving family of proboscideans; extinct members include the mastodons.</mark><mark style="background: #eff9fb;">Elephantidae also contains several extinct groups, including the mammoths and straight-tusked elephants.</mark><mark style="background: #68c3a6;">African elephants have larger ears and concave backs, whereas Asian elephants have smaller ears, and convex or level backs.</mark><mark style="background: #56ba91;">The distinctive features of all elephants include a long proboscis called a trunk, tusks, large ear flaps, massive legs, and tough but sensitive skin.</mark><mark style="background: #d4efec;">The trunk is used for breathing, bringing food and water to the mouth, and grasping objects.</mark><mark style="background: #e7f6f9;">Tusks, which are derived from the incisor teeth, serve both as weapons and as tools for moving objects and digging.</mark><mark style="background: #d9f1f0;">The large ear flaps assist in maintaining a constant body temperature as well as in communication.</mark><mark style="background: #e5f5f9;">The pillar-like legs carry their great weight.</mark><mark style="background: #72c7ad;"> </mark></p>'

for element in soup.find_all():
    if element.name == 'p':
        if len(element.text.split()) > 2: 
            element.replace_with(BeautifulSoup(get_colorized_p(element), 'html.parser'))

display(HTML(str(soup)))

The result is not exactly the same, but very close to the behavior described in your question.
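
As a variation on the approach above, the paragraph can also be rebuilt in place instead of being reparsed from a string. This is only a sketch under assumptions: `split_sentences` is a naive placeholder for a real sentence splitter such as spaCy, and the color is a fixed placeholder rather than a classifier output.

```python
from bs4 import BeautifulSoup

html = "<p class='intro'>Elephants are large. They have trunks. They live in herds.</p>"
soup = BeautifulSoup(html, "html.parser")


def split_sentences(text):
    # Naive placeholder for a real sentence splitter such as spaCy.
    return [s.strip() + "." for s in text.split(".") if s.strip()]


for p in soup.find_all("p"):
    sentences = split_sentences(p.get_text())
    p.clear()  # drop the old children but keep the <p> tag and its attributes
    for sentence in sentences:
        # Placeholder color; a real run would compute one per sentence.
        mark = soup.new_tag("mark", style="background: #edf8fb;")
        mark.string = sentence  # special characters are escaped on output
        p.append(mark)
        p.append(" ")

print(soup)
```

Because the original `p` tag is reused, its attributes survive. Like the string-based version, this flattens any inline markup (links, emphasis) inside the paragraph to plain text.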

Votes: 1

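Stack Overflow user

Posted 2022-05-28 15:20:18

element = get_colorized_p(element) assigns to a local variable that is never used afterwards and is overwritten by the for loop on the next iteration. You need to store the processed elements, for example by concatenating them into a string.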

html = ''
for element in tqdm_notebook(soup.findAll()):
    if element.name == 'p' and len(element.text.split()) > 2: 
        html += get_colorized_p(element)
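    else:
        # Caveat: findAll() also yields nested tags, so a parent's text
        # repeats the text of its children in the output.
        html += element.text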

display(HTML(html))
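
The point about the loop variable can be seen without Beautiful Soup at all. A small sketch with a plain list (the names here are illustrative only):

```python
items = ["a", "b", "c"]

# Rebinding the loop variable only changes what the name `item` refers to;
# the list still holds the original objects.
for item in items:
    item = item.upper()
after_rebinding = list(items)
print(after_rebinding)

# To change the container, the result has to be written back into it.
for index, item in enumerate(items):
    items[index] = item.upper()
print(items)
```

In the question's loop, `element = get_colorized_p(element)` corresponds to the first form. Calling `element.replace_with(...)`, as in the accepted answer, is the write-back step for a soup.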
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72416799
