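I just discovered Beautiful Soup, and it looks very powerful. I'd like to know whether there is a simple way to include the "alt" field when extracting the text. A simple example is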
from bs4 import BeautifulSoup
html_doc ="""
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.get_text())
This results in
Among the different sections of the orchestra you will find:
A in the strings
A in the brass
A in the woodwinds
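However, I would like the alt field to be used in the text extraction, which would give
Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
A clarinet and saxophone in the woodwinds
Thanks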
Posted on 2017-04-24 14:41:00
Consider this approach.
from bs4 import BeautifulSoup
html_doc ="""
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
ptag = soup.find_all('p')  # get all tags of type <p>
for tag in ptag:
    instrument = tag.find('img')  # search for <img>
    if instrument:  # if we found an <img> tag...
        # ...create a new string with the content of 'alt' in the middle of 'tag.text'
        # (this assumes the paragraph starts with the two characters "A ")
        temp = tag.text[:2] + instrument['alt'] + tag.text[2:]
        print(temp)  # print
    else:  # if we haven't found an <img> tag we just print 'tag.text'
        print(tag.text)
The output is
Among the different sections of the orchestra you will find:
A violin in the strings
A trumpet in the brass
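A clarinet and saxophone in the woodwinds
The strategy is:
1. Find all <p> tags.
2. Search each <p> tag for an <img> tag.
3. If there is an <img> tag, insert the content of its alt attribute into tag.text and print the result.
4. If there is no <img> tag, just print the tag's text.
Posted on 2017-04-24 04:12:38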
a = soup.findAll('img')
for every in a:
    print(every['alt'])
That should do the job.
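Line 1 finds all IMG tags (we use .findAll, the older name for .find_all).
Or, for the text (a itself is a ResultSet with no .text of its own, so read .text from each tag; note that <img> has no content, so its .text is an empty string):
for eachline in a:
    print(eachline.text)
That is a simple loop over every result; alternatively, index manually with soup.findAll('img')[0], then soup.findAll('img')[1], and so on.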
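A possible way to combine both answers, sketched here under assumptions and not taken from either answer: replace every <img> in the parsed tree with its alt text using Beautiful Soup's Tag.replace_with, then call get_text() on each <p>. Unlike the first answer, this does not assume the image sits after a two-character "A " prefix. The helper name text_with_alt is hypothetical, and using .get("alt", "") to tolerate images without an alt attribute is an added choice.

```python
from bs4 import BeautifulSoup

html_doc = """
<body>
<p>Among the different sections of the orchestra you will find:</p>
<p>A <img src="07fg03-violin.jpg" alt="violin" /> in the strings</p>
<p>A <img src="07fg03-trumpet.jpg" alt="trumpet" /> in the brass</p>
<p>A <img src="07fg03-woodwinds.jpg" alt="clarinet and saxophone"/> in the woodwinds</p>
</body>
"""


def text_with_alt(html):
    """Hypothetical helper: text of each <p>, with every <img> swapped for its alt text."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so modifying the tree while looping over it is safe
    for img in soup.find_all("img"):
        # .get() yields "" instead of raising KeyError when alt is missing
        img.replace_with(img.get("alt", ""))
    # adjacent strings such as "A ", "violin", " in the strings" are joined by get_text()
    return [p.get_text() for p in soup.find_all("p")]


for line in text_with_alt(html_doc):
    print(line)
```

Because each call parses a fresh soup, the replacement does not affect any other BeautifulSoup object you may already hold for the same document.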
https://stackoverflow.com/questions/43579438