I can successfully extract data from a website, except for one field whose value sits inside an img tag. Here is the code:
#import pandas as pd
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup

# gets a file-like object using urllib2.urlopen
url = 'http://ecal.forexpros.com/e_cal.php?duration=daily'
html = urlopen(url)
soup = BeautifulSoup(html)

# loops over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr'
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}):
    # finds desired data by looking up <td> elements with class names
    event = tr.find('td', {'class': 'ec_td_event'}).text
    currency = tr.find('td', {'class': 'ec_td_currency'}).text
    actual = tr.find('td', {'class': 'ec_td_actual'}).text
    forecast = tr.find('td', {'class': 'ec_td_forecast'}).text
    previous = tr.find('td', {'class': 'ec_td_previous'}).text
    time = tr.find('td', {'class': 'ec_td_time'}).text
    importance = tr.find('td', {'class': 'ec_td_importance'}).text
    # the returned strings are unicode, so to print them we need a unicode string
    print u'{:3}\t{}\t{:5}\t{:8}\t{:8}\t{:8}\t{}'.format(currency, importance, time, actual, forecast, previous, event)

The first few records of the output are as follows:
JPY 01:00 43.8 43.6 43.3 Household Confidence
CHF 01:45 -3 -3 -8 SECO Consumer Climate
RON 02:00 2.50% 3.30% PPI (YoY)
EUR 03:00 -26.9K -66.5K -98.3K Spanish Unemployment Change
CHF 03:15 1.5% 1.3% -0.8% Retail Sales (YoY)
CHF 03:30 60.9 58.9 60.1 SVME PMI
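GBP 04:30 51.9 54.5 54.8 Construction PMI

The importance field does not show up in the output above (presumably because the data is contained in the img alt attribute).
Does anyone know how to solve this?
Thanks!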
Edit:
The solution is to replace:
importance = tr.find('td', {'class': 'ec_td_importance'}).text
with:
importance = tr.find('td', {'class': 'ec_td_importance'}).img.get('alt')

Posted on 2017-08-02 18:32:17
Replace the importance line with the following:
importance = tr.find('td', {'class': 'ec_td_importance'}).img['alt']

https://stackoverflow.com/questions/45467786
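
For anyone who wants to try the fix without hitting the live page, here is a minimal sketch on an inline HTML snippet. The markup, the alt value "High", and the helper name importance_of are made up for illustration; only the class names come from the question. It shows that .text on the importance cell is empty while the img alt carries the value, and it guards against a cell with no img: in that case td.img is None, so both .img['alt'] and .img.get('alt') would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the class names used in the question.
SAMPLE_HTML = """
<table>
  <tr class="ec_bg1_tr">
    <td class="ec_td_currency">JPY</td>
    <td class="ec_td_importance"><img src="high.gif" alt="High"></td>
  </tr>
  <tr class="ec_bg2_tr">
    <td class="ec_td_currency">CHF</td>
    <td class="ec_td_importance"></td>
  </tr>
</table>
"""


def importance_of(tr):
    """Return the alt text of the <img> in the importance cell, or ''."""
    td = tr.find('td', {'class': 'ec_td_importance'})
    # td.img is None when the cell contains no <img> tag
    img = td.img if td is not None else None
    # .get returns the default instead of raising KeyError when alt is missing
    return img.get('alt', '') if img is not None else ''


# 'html.parser' is the parser bundled with Python, so no extra install is needed
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = soup.find_all('tr')

# What the original code read: the cell's text, which is empty
texts = [tr.find('td', {'class': 'ec_td_importance'}).text for tr in rows]
# What the fix reads: the alt attribute of the nested <img>
alts = [importance_of(tr) for tr in rows]

print(texts)
print(alts)
```

The difference between the two suggested forms only matters when the img has no alt attribute: img['alt'] raises a KeyError, while img.get('alt') returns None.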