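I am using Selenium to grab some information from the following site: http://www.ukathletics.com/schedule-list/#!/m-basebl/2016
I am interested in some of the links, the dates, and the team names. I have written the code below to identify the information I am looking for, but it seems to pull the information only up to a certain point and then adds empty items (i.e. '') to my lists.
I know that every list should have 66 items if it is pulled correctly (Kentucky played 66 games). Any idea why it stops pulling information after the second LSU game?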
from selenium import webdriver

bs = []  # box score links
team2 = []  # opponents
dates = []  # dates of games
team1 = 'KENTUCKY'  # team of interest

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')

# Collect the box score links, then de-duplicate while keeping first-seen order
elem = driver.find_elements_by_class_name('event_link')
for i in elem:
    bs.append(i.get_attribute('href'))
links = sorted(set(bs), key=lambda x: bs.index(x))

# Collect every school name except the team of interest
elem = driver.find_elements_by_class_name('school_name')
team2 = [i.text for i in elem if i.text != team1]

# Collect the dates, flattening commas and line breaks
elem = driver.find_elements_by_class_name('date')
for i in elem:
    dates.append(i.text.replace(',', '').replace('\n', ' '))

print(links)
print(team2)
print(dates)
print(len(links))
print(len(team2))
print(len(dates))

My results:
['http://www.ukathletics.com/game-center/580644ebe4b07dac0ca58a91/', 'http://www.ukathletics.com/game-center/5806455ce4b07dac0ca58a92/', 'http://www.ukathletics.com/game-center/58064594e4b09266491b651d/', 'http://www.ukathletics.com/game-center/5820d9dbe4b0493932cf30fd/', 'http://www.ukathletics.com/game-center/5820da33e4b0493932cf30fe/', 'http://www.ukathletics.com/game-center/5820da86e4b05e67c64470ca/', 'http://www.ukathletics.com/game-center/5820dabde4b0493932cf30ff/', 'http://www.ukathletics.com/game-center/5820daf4e4b05e67c64470cb/', 'http://www.ukathletics.com/game-center/5820db25e4b05e67c64470cc/', 'http://www.ukathletics.com/game-center/5820db6ce4b0493932cf3100/', 'http://www.ukathletics.com/game-center/5820db91e4b05e67c64470de/', 'http://www.ukathletics.com/game-center/5820dbb6e4b05e67c64470df/', 'http://www.ukathletics.com/game-center/5820dbe3e4b0493932cf3101/', 'http://www.ukathletics.com/game-center/5820dc0de4b05e67c64470e0/', 'http://www.ukathletics.com/game-center/58c1e98ee4b066e02ca82086/', 'http://www.ukathletics.com/game-center/5820dc32e4b05e67c64470e1/', 'http://www.ukathletics.com/game-center/5820dc80e4b0493932cf3102/', 'http://www.ukathletics.com/game-center/5820dcaae4b0493932cf3103/', 'http://www.ukathletics.com/game-center/5820dd1ee4b0493932cf3104/', 'http://www.ukathletics.com/game-center/5820dd6fe4b0493932cf3105/', 'http://www.ukathletics.com/game-center/5820dd8ce4b05e67c64470e3/', 'http://www.ukathletics.com/game-center/5820de21e4b05e67c64470e4/', 'http://www.ukathletics.com/game-center/5820de47e4b0493932cf3106/', 'http://www.ukathletics.com/game-center/5820de69e4b05e67c64470e5/', 'http://www.ukathletics.com/game-center/5820de87e4b0493932cf3107/', 'http://www.ukathletics.com/game-center/5820dea9e4b05e67c64470e6/', 'http://www.ukathletics.com/game-center/5820decee4b0493932cf3108/', 'http://www.ukathletics.com/game-center/5820deebe4b05e67c64470e7/', 'http://www.ukathletics.com/game-center/5820df0ce4b05e67c64470e8/', 
'http://www.ukathletics.com/game-center/5820df50e4b0493932cf3114/', 'http://www.ukathletics.com/game-center/5820df85e4b05e67c64470e9/', 'http://www.ukathletics.com/game-center/5820dfa9e4b05e67c64470ea/', 'http://www.ukathletics.com/game-center/5820dfc7e4b05e67c64470eb/', 'http://www.ukathletics.com/game-center/5820dfebe4b0493932cf3115/', 'http://www.ukathletics.com/game-center/5820e023e4b0493932cf3116/', 'http://www.ukathletics.com/game-center/5820e03ee4b0493932cf3117/', 'http://www.ukathletics.com/game-center/5820e056e4b0493932cf3118/', 'http://www.ukathletics.com/game-center/5820e089e4b0493932cf3119/', 'http://www.ukathletics.com/game-center/5820e0bee4b05e67c64470ed/', 'http://www.ukathletics.com/game-center/5820e0a4e4b05e67c64470ec/']
['NORTH CAROLINA', 'NORTH CAROLINA', 'NORTH CAROLINA', 'LIBERTY', "ST. JOSEPH'S", 'OLD DOMINION', 'DELAWARE', 'E. KENTUCKY', 'WKU', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'WRIGHT STATE', 'CINCINNATI', 'MIAMI (OH)', 'MIAMI (OH)', 'MIAMI (OH)', 'MURRAY STATE', 'TEXAS A&M', 'TEXAS A&M', 'TEXAS A&M', 'WKU', 'OLE MISS', 'OLE MISS', 'OLE MISS', 'CINCINNATI', 'VANDERBILT', 'VANDERBILT', 'VANDERBILT', 'LOUISVILLE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'UT MARTIN', 'MIZZOU', 'MIZZOU', 'MIZZOU', 'LOUISVILLE', 'LSU', 'LSU', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['FRI FEB 17', 'SAT FEB 18', 'SUN FEB 19', 'WED FEB 22', 'FRI FEB 24', 'SAT FEB 25', 'SUN FEB 26', 'TUE FEB 28', 'WED MAR 1', 'FRI MAR 3', 'SAT MAR 4', 'SUN MAR 5', 'TUE MAR 7', 'WED MAR 8', 'THU MAR 9', 'FRI MAR 10', 'SUN MAR 12', 'TUE MAR 14', 'FRI MAR 17', 'SAT MAR 18', 'SUN MAR 19', 'TUE MAR 21', 'THU MAR 23', 'FRI MAR 24', 'SAT MAR 25', 'TUE MAR 28', 'FRI MAR 31', 'SAT APR 1', 'SUN APR 2', 'TUE APR 4', 'FRI APR 7', 'SAT APR 8', 'SUN APR 9', 'WED APR 12', 'FRI APR 14', 'SAT APR 15', 'SUN APR 16', 'TUE APR 18', 'FRI APR 21', 'FRI APR 21', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
40
120
80

Posted on 2017-08-09 11:58:55
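Actually, not all of the elements are fetched because they have not been loaded yet. If you look closely, the elements at the bottom of the table are loaded only when you scroll down to the end of the page.
You can try adding the code below after opening the page so that the full table is loaded.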
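import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
time.sleep(5)  # wait for the first batch of rows to render
# Jump to the end of the page to trigger loading of the next batch
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)
time.sleep(5)
# Repeat once more so the remaining rows are loaded
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)

I tested it, and it gave the following output: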
66 #print(len(links))
198 #print(len(team2))
132 #print(len(dates))

https://stackoverflow.com/questions/45574498
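The fixed five-second sleeps and the two scroll commands in the answer work for this schedule, but a longer table or a slower connection might need more rounds. A minimal sketch of one alternative is to keep scrolling until the number of matched elements stops growing. The helper name `scroll_until_stable` and its parameters are hypothetical (they are not part of Selenium), and the function takes plain callables so it can be tried without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_down: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 30,
    stable_rounds: int = 2,
) -> int:
    """Scroll repeatedly until the item count stops growing.

    Hypothetical helper: count_items reports how many elements are
    currently on the page, scroll_down triggers loading of more rows.
    Returns the last observed count.
    """
    last = count_items()
    unchanged = 0
    for _ in range(max_rounds):
        scroll_down()
        time.sleep(pause)  # give the page time to load more rows
        current = count_items()
        if current == last:
            unchanged += 1
            if unchanged >= stable_rounds:
                break  # nothing new for several rounds, assume we are done
        else:
            unchanged = 0
            last = current
    return last


# Possible wiring with the Selenium 3 style calls used above (needs a browser):
# total = scroll_until_stable(
#     count_items=lambda: len(driver.find_elements_by_class_name('event_link')),
#     scroll_down=lambda: driver.execute_script(
#         'window.scrollTo(0, document.body.scrollHeight);'),
# )
```

The `max_rounds` cap keeps the loop from running forever if the page never settles, and `stable_rounds` guards against stopping on a single slow load. Both defaults are guesses and would need tuning for the actual page.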
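The counts 198 and 132 are three times and two times 66, which suggests that the `school_name` and `date` classes each match more than one element per game. Whether any of those extra matches still come back as empty strings after scrolling is an assumption, not something the output above shows. If they do, a small cleanup step along these lines might help. The helper names `unique_in_order` and `drop_blanks` are hypothetical.

```python
from typing import Iterable, List


def unique_in_order(items: Iterable[str]) -> List[str]:
    """Remove duplicates while keeping the first-seen order.

    dict.fromkeys preserves insertion order (Python 3.7+), so this gives
    the same result as sorted(set(bs), key=bs.index) in a single pass.
    """
    return list(dict.fromkeys(items))


def drop_blanks(items: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace and discard entries that end up empty."""
    return [s.strip() for s in items if s and s.strip()]


if __name__ == "__main__":
    # Hand-made sample values, not actual scraped output
    sample_links = ["a/1/", "a/2/", "a/1/", "a/3/"]
    sample_names = ["LSU", "", "  ", "WKU\n"]
    print(unique_in_order(sample_links))
    print(drop_blanks(sample_names))
```

Note that `drop_blanks` should not be followed by de-duplication for opponents or dates, because repeated values such as three straight games against the same team are legitimate entries.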