Q: Extracting text from an img alt attribute

Stack Overflow user

Asked 2017-08-02 17:50:58

Answers 1, Views 951, Followers 0, Votes 0

I can successfully extract data from a website, except for one field whose content is an img tag. Here is the code:

#import pandas as pd
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup

# gets a file-like object using urllib2.urlopen
url = 'http://ecal.forexpros.com/e_cal.php?duration=daily'
html = urlopen(url)

soup = BeautifulSoup(html)

# loops over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr'
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}):
    # finds desired data by looking up <td> elements with class names
    event = tr.find('td', {'class': 'ec_td_event'}).text
    currency = tr.find('td', {'class': 'ec_td_currency'}).text
    actual = tr.find('td', {'class': 'ec_td_actual'}).text
    forecast = tr.find('td', {'class': 'ec_td_forecast'}).text
    previous = tr.find('td', {'class': 'ec_td_previous'}).text
    time = tr.find('td', {'class': 'ec_td_time'}).text
    importance = tr.find('td', {'class': 'ec_td_importance'}).text

    # the returned strings are unicode, so to print them we need a unicode string
    print u'{:3}\t{}\t{:5}\t{:8}\t{:8}\t{:8}\t{}'.format(currency, importance, time, actual, forecast, previous, event)

The first few records of the output look like this:

JPY     01:00   43.8        43.6        43.3        Household Confidence 
CHF     01:45   -3          -3          -8          SECO Consumer Climate 
RON     02:00   2.50%                   3.30%       PPI (YoY) 
EUR     03:00   -26.9K      -66.5K      -98.3K      Spanish Unemployment Change 
CHF     03:15   1.5%        1.3%        -0.8%       Retail Sales (YoY) 
CHF     03:30   60.9        58.9        60.1        SVME PMI 
GBP     04:30   51.9        54.5        54.8        Construction PMI

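The importance field does not appear in the output above (presumably because the data is held in the img alt attribute rather than in the cell's text).

Does anyone know how to fix this?

Thanks!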

Edit:

The fix is to replace this line:

importance = tr.find('td', {'class': 'ec_td_importance'}).text

with this one:

importance = tr.find('td', {'class': 'ec_td_importance'}).img.get('alt')
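
To see why `.text` comes back empty while the attribute lookup works, here is a minimal sketch on a hand-written row. The HTML fragment and its `alt` value are assumptions made up for illustration (the real page's markup may differ), and the snippet uses Python 3 syntax with the built-in `html.parser` backend rather than the Python 2 setup above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the importance cell holds only an <img>, no text node.
sample = (
    '<table><tr class="ec_bg1_tr">'
    '<td class="ec_td_importance"><img src="high.png" alt="High"></td>'
    '</tr></table>'
)

soup = BeautifulSoup(sample, 'html.parser')
cell = soup.find('td', {'class': 'ec_td_importance'})

text_value = cell.text           # empty string: the cell contains no text
alt_value = cell.img.get('alt')  # read from the <img> tag's alt attribute

print(repr(text_value), repr(alt_value))
```

`.text` only concatenates text nodes, so a cell whose sole child is an image yields an empty string; the label has to be read from the tag's attributes instead.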

python

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-08-02 18:32:17

Replace the importance line with the following:

importance = tr.find('td', {'class': 'ec_td_importance'}).img['alt']
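
The two spellings, `.img['alt']` and `.img.get('alt')`, behave the same when the attribute is present but differ when it is missing: indexing raises `KeyError`, while `.get()` returns `None`. The sketch below illustrates this and adds a small guard for rows that have no image at all. The helper name `importance_of` and the sample rows are hypothetical, written in Python 3 syntax, and not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

def importance_of(tr):
    """Return the alt text of the importance image, or '' if it is absent."""
    cell = tr.find('td', {'class': 'ec_td_importance'})
    img = cell.img if cell is not None else None
    if img is None:
        return ''
    return img.get('alt', '')

# Hypothetical rows: one with alt, one image without alt, one with no image.
sample = """
<table>
<tr class="ec_bg1_tr"><td class="ec_td_importance"><img alt="Low"></td></tr>
<tr class="ec_bg2_tr"><td class="ec_td_importance"><img src="x.png"></td></tr>
<tr class="ec_bg1_tr"><td class="ec_td_importance"></td></tr>
</table>
"""

soup = BeautifulSoup(sample, 'html.parser')
rows = soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')})
results = [importance_of(tr) for tr in rows]

# Indexing raises KeyError when the attribute is missing; .get() returns None.
img_without_alt = rows[1].img
try:
    img_without_alt['alt']
    indexing_failed = False
except KeyError:
    indexing_failed = True

print(results, indexing_failed, img_without_alt.get('alt'))
```

If every row on the page is known to carry an image with an `alt` attribute, the one-line form from the answer is enough; the guard only matters when the markup is irregular.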

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/45467786
