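Just for fun, I'm trying to scrape some data from my Yahoo Fantasy Football league so I can follow player transactions. This is my first time using mechanize and BeautifulSoup, and I'm having trouble printing specific data. I want to extract the names of players who were sent 'To Waivers', along with the date. I can get the first part, but I'm not sure how to get the date. The first part below is a sample of the HTML, and the second part is my code: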
<table class="Table Table-mid Tst-transaction-table">
<tr>
<td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player"></span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player"></span></td>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/24963" target=sports onclick="pop(this)">Dwayne Harris</a>
<span class="F-position Fz-xxs">NYG - WR</span>
<a href="http://sports.yahoo.com/nfl/players/24963/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="24963" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24963.'"></a> <h6 class="F-shade Fz-xxs"> Waiver </h6></div>
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/6791" target=sports onclick="pop(this)">Benjamin Watson</a>
<span class="F-position Fz-xxs">NO - TE</span>
<a href="http://sports.yahoo.com/nfl/players/6791/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="6791" data-ys-playernote-view="notes" target="_blank" id="playernote-'.6791.'"></a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end">
<div class="Grid-h-top Nowrap Fz-xxs">
<span class="Grid-u">
<a class="Tst-team-name" href="/f1/313652/10">TeamName2</a>
<span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span>
</span>
<a class='Grid-u' href='/f1/313652/10'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/nfl/img/icon_01_100.png" alt="avatar"> </a>
</div>
</td>
</tr> <tr>
<td class="Grid-u-1-12 Ta-c"><span class="F-icon Block Fz-lg F-positive Cur-h" title="Added Player"></span><span class="F-icon Block Fz-lg F-negative Ptop-med Cur-h" title="Dropped Player"></span></td>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/7306" target=sports onclick="pop(this)">Darren Sproles</a>
<span class="F-position Fz-xxs">Phi - RB</span>
<a href="http://sports.yahoo.com/nfl/players/7306/news" class="yfa-icon playernote playernote-recent" data-ys-playerid="7306" data-ys-playernote-view="notes" target="_blank" id="playernote-'.7306.'"></a> <h6 class="F-shade Fz-xxs">Free Agent </h6></div>
<div class="Pbot-xs"> <a href="http://sports.yahoo.com/nfl/players/24262" target=sports onclick="pop(this)">Joique Bell</a>
<span class="F-position Fz-xxs">Det - RB</span>
<span class="F-injury Fz-xxs" title="Probable">P</span>
<a href="http://sports.yahoo.com/nfl/players/24262/news" class="yfa-icon playernote playernote-old" data-ys-playerid="24262" data-ys-playernote-view="notes" target="_blank" id="playernote-'.24262.'"></a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end">
<div class="Grid-h-top Nowrap Fz-xxs">
<span class="Grid-u">
<a class="Tst-team-name" href="/f1/313652/3">TeamName1</a>
<span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span>
</span>
<a class='Grid-u' href='/f1/313652/3'><img class="Avatar-sm Mstart-med Grid-u" src="http://l.yimg.com/dh/ap/fantasy/img/profile_48.png" alt="avatar"> </a>
</div>
</td>

Code:
import mechanize
from bs4 import BeautifulSoup
import urllib
username = 'my-username'
password = 'my-password'
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6')]
br.open("https://football.fantasysports.yahoo.com/f1/313652/transactions")
br.select_form(nr=0)
br.form["username"] = username
br.form["passwd"] = password
response = br.submit()
html_scrape = response.read()
soup = BeautifulSoup(html_scrape, "lxml")
for lines in soup.find_all('div', attrs={'class': 'Pbot-xs'}):
    players = lines.find('a').get_text()
    status = lines.find('h6').get_text()
    if status == ' To Waivers':
        print("%s was dropped" % players)

I tried using the find() function on the table, but I don't know how to get at the text data I'm looking for.
Thanks!
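The date is not inside the player's `div`; it sits in another cell of the same table row. One way to reach it is to walk up from each player entry to its enclosing `<tr>` with `find_parent("tr")` and search that row for the timestamp span. This is a minimal sketch, assuming the live page matches the trimmed sample HTML above (the real Yahoo markup may differ) and using the built-in `html.parser` instead of lxml so it runs without extra dependencies. `SAMPLE_HTML` and `dropped_players` are hypothetical names, and the sample keeps only the elements the sketch needs.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the transaction-table markup shown in the question.
SAMPLE_HTML = """
<table class="Table Table-mid Tst-transaction-table">
<tr>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"><a href="#">Dwayne Harris</a> <h6 class="F-shade Fz-xxs"> Waiver </h6></div>
<div class="Pbot-xs"><a href="#">Benjamin Watson</a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end"><span class="Block F-timestamp Fz-xxs Nowrap">Nov 20, 4:03 am</span></td>
</tr>
<tr>
<td class="Fill-x No-pstart" colspan="2">
<div class="Pbot-xs"><a href="#">Darren Sproles</a> <h6 class="F-shade Fz-xxs">Free Agent </h6></div>
<div class="Pbot-xs"><a href="#">Joique Bell</a> <h6 class="F-shade Fz-xxs"> To Waivers</h6></div>
</td>
<td class="Ta-end"><span class="Block F-timestamp Fz-xxs Nowrap">Nov 19, 1:30 pm</span></td>
</tr>
</table>
"""


def dropped_players(html):
    """Return (player, timestamp) pairs for every 'To Waivers' entry."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for entry in soup.select("div.Pbot-xs"):
        status = entry.find("h6")
        # strip=True removes the stray spaces around the status text.
        if status is None or status.get_text(strip=True) != "To Waivers":
            continue
        # Walk up to the enclosing row; the timestamp lives in a sibling cell.
        row = entry.find_parent("tr")
        # class_ matches any one of the span's space-separated classes.
        stamp = row.find("span", class_="F-timestamp") if row else None
        name = entry.find("a").get_text(strip=True)
        results.append((name, stamp.get_text(strip=True) if stamp else None))
    return results


for player, when in dropped_players(SAMPLE_HTML):
    print("%s dropped on %s" % (player, when))
```

Because each drop is matched to the timestamp of its own row, this avoids tracking positions across the whole page. Against the real page you would pass in `response.read()` instead of `SAMPLE_HTML`.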
Posted on 2015-11-21 01:08:44
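Without being familiar with the Yahoo Fantasy Football page it's hard to give you a definitive answer, but if you're trying to point BeautifulSoup at specific divs, you can use BeautifulSoup's select feature, like this: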
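for lines in soup.select("div.Pbot-xs"):  # "." matches a class; "#" would match an id
    players = lines.find('a').text
    # .strip() removes the leading space inside the <h6>, e.g. " To Waivers"
    status = lines.find('h6').text.strip()
    if status == 'To Waivers':
        print("%s was dropped." % players)

Posted on 2015-11-21 03:10:50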
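This is a bit tricky, because in fantasy sports you can drop a player without necessarily adding one. I handled it by walking through the list and recording player names and dates in order. A player is recorded only if their entry matches the 'To Waivers' text. I then set up try/except blocks to make sure the previous object in the iteration had a corresponding player. This guarantees my dictionary values go Player > Date > Player > Date, and so on.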
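If the values are gathered into a flat list in that alternating Player > Date order, slicing plus `zip` pairs them up without index arithmetic and without relying on dictionary key order. This is a minimal sketch; the list `collected` is hypothetical, filled with values from the sample HTML in the question rather than scraped from the page.

```python
# Hypothetical flat list in the alternating Player > Date order.
collected = ["Benjamin Watson", "Nov 20, 4:03 am", "Joique Bell", "Nov 19, 1:30 pm"]

# Even positions hold players, odd positions hold the matching dates.
pairs = list(zip(collected[0::2], collected[1::2]))

for player, when in pairs:
    print("%s dropped on %s" % (player, when))
```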
Then I iterated over the dictionary and formatted the printed output the way I wanted:
import mechanize
from bs4 import BeautifulSoup
import urllib
username = 'username@yahoo.com'
password = 'password'
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6')]
br.open("https://football.fantasysports.yahoo.com/f1/313652/transactions")
br.select_form(nr=0)
br.form["username"] = username
br.form["passwd"] = password
response = br.submit()
html_scrape = response.read()
soup = BeautifulSoup(html_scrape, "lxml")
index = 1
dropped = {}
for players in soup.select("table > tr > td > div"):
    player = players.find('a').get_text()
    try:
        if players.find('h6').get_text().strip() == 'To Waivers':
            dropped[index] = player
    except AttributeError:
        # the timestamp div has no <h6>
        pass
    time = players.find('span', {'class': "Block F-timestamp Fz-xxs Nowrap"})
    if time is not None:
        try:
            # raises KeyError unless the previous div was a dropped player
            nullplayer = dropped[index - 1]
            time = time.get_text()
            dropped[index] = time
        except KeyError:
            pass
    index += 1
count = 1
# sorted() keeps the Player > Date order; plain dict iteration order is not guaranteed in Python 2
for items in sorted(dropped):
    if count % 2 == 0:
        player = dropped[items - 1]
        time = dropped[items]
        print("%s dropped on %s" % (player, time))
    count += 1

https://stackoverflow.com/questions/33832138