Q: How to extract "alt" with Beautiful Soup text extraction

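Stack Overflow user

Asked 2017-04-24 03:42:17

2 answers · 2.2K views · 0 followers · 2 votes

I just discovered Beautiful Soup, and it looks very powerful. I'd like to know whether there is a simple way to include the "alt" field in the extracted text. A simple example is: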

from bs4 import BeautifulSoup

html_doc ="""
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet"  /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.get_text())

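This results in

Among the different sections of the orchestra you will find:
A  in the strings
A  in the brass
A  in the woodwinds

However, I would like the alt field to be included in the text extraction, which would give

Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
A clarinet and saxophone in the woodwinds

Thanks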

python

beautifulsoup

alt

2 Answers

Stack Overflow user

Accepted answer

Answered 2017-04-24 14:41:00

Consider this approach.

from bs4 import BeautifulSoup

html_doc ="""
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet"  /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
ptag = soup.find_all('p')   # get all tags of type <p>

for tag in ptag:
    instrument = tag.find('img')    # search for <img>
    if instrument:  # if we found an <img> tag...
        # ...create a new string with the content of 'alt' inserted after the first two characters of 'tag.text'
        temp = tag.text[:2] + instrument['alt'] + tag.text[2:]
        print(temp) # print
    else:   # if we haven't found an <img> tag we just print 'tag.text'
        print(tag.text)

The output is

Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
A clarinet and saxophone in the woodwinds

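The strategy is:

Find all <p> tags
Search each of these <p> tags for an <img> tag
If we find an <img> tag, insert the content of its alt attribute into tag.text and print the result
If we don't find an <img> tag, just print tag.text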
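The steps above rely on the image sitting right after the first two characters of each paragraph ("A "). As a possible alternative, here is a minimal sketch that swaps every <img> tag for its alt text before extracting the text, so the position of the image does not matter. The helper name text_with_alt is hypothetical, and falling back to an empty string when alt is missing is an assumption, not something the answer specifies.

```python
from bs4 import BeautifulSoup

html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet"  /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""


def text_with_alt(html):
    """Return the document text with each <img> replaced by its alt text.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        # Assumption: an image without an alt attribute contributes no text.
        img.replace_with(img.get("alt", ""))
    return soup.get_text()


print(text_with_alt(html_doc))
```

Because the replacement happens in the parse tree, a single get_text() call then covers the whole document, including paragraphs that contain several images or none.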

Votes: 2

Stack Overflow user

Answered 2017-04-24 04:12:38

a = soup.findAll('img')

for every in a:
    print(every['alt'])

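This will do the job.

Line 1 finds all <img> tags (we used .findAll).

Or, for the text (note that <img> tags contain no text of their own, so this prints empty lines):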

# 'a' is a ResultSet (a list of tags), so a.text raises AttributeError; read .text from each element
for eachline in a:
    print(eachline.text)

A simple loop goes through each result, or you can index manually with soup.findAll('img')[0], then soup.findAll('img')[1], and so on.
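Indexing every['alt'] raises KeyError when an image has no alt attribute. A small sketch of two ways to guard against that is shown below. The third image (decorative.jpg) is a made-up example added only to show the missing-attribute case, and find_all is the current spelling of the findAll alias used above.

```python
from bs4 import BeautifulSoup

html_doc = """
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="decorative.jpg" /> with no alt attribute (hypothetical)</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Option 1: .get() returns a default instead of raising KeyError.
all_alts = [img.get("alt", "") for img in soup.find_all("img")]

# Option 2: alt=True keeps only the <img> tags that have an alt attribute.
present_alts = [img["alt"] for img in soup.find_all("img", alt=True)]

print(all_alts)
print(present_alts)
```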

Votes: 1

Original page content provided by Stack Overflow; the Chinese translation was provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43579438
