Beautiful Soup "find" behaves inconsistently (bs4)

Asked by a Stack Overflow user on 2015-06-25 18:05:17

Answers: 1 | Views: 94 | Followers: 0 | Score: 1

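I'm scraping player statistics from the NFL's website. I'm running into a problem when parsing the page and trying to access the HTML table that holds the information I actually need. I successfully downloaded the page and saved it to my working directory. For reference, the page I saved can be found here.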

# import relevant libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("1998.html"))
result = soup.find(id="result")
print(result)

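Sometimes when I run the code, it prints exactly the table I'm looking for. Every other time, it prints nothing! I assume this is user error, but I can't figure out what I'm missing. Using "lxml" as the parser returns nothing, and I couldn't get html5lib (a parsing library?) to work.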

Any help is greatly appreciated!
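Since the symptom depends on which parser is in use, one way to narrow it down is to parse the same markup with each parser Beautiful Soup can use and check whether the `result` table turns up. This is a sketch assuming bs4 is installed; `lxml` and `html5lib` are optional third-party packages, and a parser that is not installed is reported as `None` instead of raising. The helper name `check_parsers` is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parser names BeautifulSoup accepts; only "html.parser" ships with Python itself.
PARSERS = ["html.parser", "lxml", "html5lib"]


def check_parsers(html, table_id="result"):
    """Return {parser: True/False/None} for whether <table id=table_id> was found.

    None means that parser library is not installed in this environment.
    """
    results = {}
    for parser in PARSERS:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is not available.
            results[parser] = None
            continue
        results[parser] = soup.find("table", id=table_id) is not None
    return results
```

To use it on the saved page, read the file first (for example `with open("1998.html", "rb") as f: html = f.read()`) and print `check_parsers(html)`. If the parsers disagree, the parser choice is part of the problem; if all of them report `False`, the table is probably not in the saved file at all.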

Tags: python, python-2.7, beautifulsoup

1 Answer

Accepted answer, posted by a Stack Overflow user on 2015-06-25 18:25:50

First, you should read the file's contents before passing them to BeautifulSoup.

soup = BeautifulSoup(open("1998.html").read())
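A slightly more defensive version of this step, as a sketch: it reads the file inside a `with` block so the handle is closed, and names the parser explicitly so every run uses the same one instead of whichever parser Beautiful Soup considers the best available. The function name `load_soup` and the `html.parser` default are assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup


def load_soup(path, parser="html.parser"):
    """Read a saved HTML file and parse it with an explicitly chosen parser."""
    # Reading bytes (mode "rb") lets Beautiful Soup detect the document's encoding.
    with open(path, "rb") as f:
        html = f.read()
    return BeautifulSoup(html, parser)
```

Usage would then be `soup = load_soup("1998.html")` followed by `soup.find("table", id="result")`.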

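Second, manually verify that the table in question is actually there by printing the contents to the screen. The .prettify() method makes the output easier to read.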

print(soup.prettify())
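On a full results page, `prettify()` output can run to thousands of lines. A small helper, sketched here under the hypothetical name `preview`, formats only the table of interest (tags have their own `prettify()`) and reports when it is missing:

```python
from bs4 import BeautifulSoup


def preview(soup, table_id="result", max_lines=20):
    """Return the first max_lines lines of the prettified table, or None if absent."""
    table = soup.find("table", id=table_id)
    if table is None:
        return None  # the table is not in the parsed tree at all
    # Calling prettify() on the Tag formats just that subtree, not the whole page.
    return table.prettify().splitlines()[:max_lines]
```

For example, `print("\n".join(preview(soup) or ["table not found"]))` prints either the start of the table or a clear "not found" message.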

Finally, if the element really does exist, the following will find it:

table = soup.find('table',{'id':'result'})
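For reference, here are a few equivalent ways to express the same lookup, sketched against a tiny made-up document rather than the NFL page. `find()` returns `None` when nothing matches, so printing a failed lookup shows `None`, and calling methods on it raises `AttributeError`; checking for `None` first avoids that. The CSS-selector form assumes a reasonably recent bs4 release.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with two tables; only one has id="result".
SAMPLE = """
<table id="other"><tr><td>x</td></tr></table>
<table id="result">
  <tr><th>Player</th></tr>
  <tr><td>A</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Three equivalent ways to get the table whose id is "result".
by_attrs = soup.find("table", {"id": "result"})  # attrs dict, as in the answer
by_kwarg = soup.find("table", id="result")        # keyword-argument filter
by_css = soup.select_one("table#result")          # CSS selector

# find() returns None when nothing matches, so check before using the result.
missing = soup.find("table", id="no-such-table")
if missing is None:
    print("no table with that id")
```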

A simple test script I wrote could not reproduce your result:

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2
from bs4 import BeautifulSoup

def test():
    # The URL of the page you're scraping.
    url = 'http://www.nfl.com/stats/categorystats?tabSeq=0&statisticCategory=PASSING&conference=null&season=1998&seasonType=REG&d-447263-s=PASSING_YARDS&d-447263-o=2&d-447263-n=1'

    # Make a request to the URL.
    conn = urlopen(url)

    # Read the contents of the response.
    html = conn.read()

    # Close the connection.
    conn.close()

    # Create a BeautifulSoup object and find the table.
    soup = BeautifulSoup(html)
    table = soup.find('table', {'id': 'result'})

    # Find all rows in the table.
    trs = table.find_all('tr')

    # Print to screen the number of rows found in the table.
    print(len(trs))

test()

This prints 51 every time.
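The same check can also be sketched for Python 3 only (where `urllib.urlopen` became `urllib.request.urlopen`), split so that the parsing step can be tested on a saved file or a string without touching the network. The function names are hypothetical, the parser is pinned to `html.parser` as an assumption, and the 2015 nfl.com URL may no longer serve the same page.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its raw bytes; the with block closes the connection."""
    with urlopen(url) as conn:
        return conn.read()


def count_result_rows(html, parser="html.parser"):
    """Count <tr> rows in the table with id="result"; return None if it is missing."""
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", id="result")
    if table is None:
        return None  # distinguishes "no table" from "table with zero rows"
    return len(table.find_all("tr"))
```

Running `count_result_rows(fetch(url))` several times, or on the bytes of the saved `1998.html`, shows whether the row count is stable. Returning `None` instead of crashing makes an intermittent miss visible rather than surfacing as an `AttributeError`.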

Score: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/31057586
