
Getting data from <tbody> using requests and beautifulsoup4

Stack Overflow user

Asked on 2018-05-28 22:04:15

1 answer, 348 views, 0 followers, score 0

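To get better at using beautifulsoup4, I'm trying to get some data from https://semlar.com/rivenprices/artax (of course, now and in the future I will only use this data for learning purposes, to avoid any potential legal issues. All the data I post here is available to anyone who uses the browser's "Inspect" feature).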

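This site shows the average prices of riven mods in the game Warframe, but that's beside the point. I want to write an app that takes a mod name (e.g. Artax, Lanka, etc.) and prints the "Avg Price" and "Dispo" values.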
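The app needs one page per weapon. Judging only from the two addresses in this thread (.../rivenprices/artax in the question and .../rivenprices/lanka in the answer), the page URL appears to be the lower-cased weapon name appended to a fixed prefix. The sketch below assumes that pattern holds for every weapon, which has not been checked; `riven_url` and `BASE_URL` are hypothetical names.

```python
from urllib.parse import quote

# Assumption: prefix inferred from the two example URLs in this thread,
# not documented by the site.
BASE_URL = 'https://semlar.com/rivenprices/'


def riven_url(weapon_name):
    """Build the (assumed) price-page URL for a weapon such as 'Artax' or 'Lanka'."""
    slug = weapon_name.strip().lower()
    # quote() percent-encodes unsafe characters, e.g. spaces in multi-word names.
    return BASE_URL + quote(slug)


print(riven_url('Artax'))  # https://semlar.com/rivenprices/artax
```

How the site spells multi-word weapon names in its URLs is unknown; quote() only keeps the resulting URL valid.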

Here is a link to a small part of the table I want to get data from: https://imggmi.com/full/2018/5/28/daa550ff5f042bb80ab0ecdd980a3935-full.png.html

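I've made apps like this before, but here I ran into a problem: the weapon names, prices and dispositions seem to be hidden under the tbody tag, and that tag is empty when I search for the data with bs4.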

My program so far:

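import requests
import bs4

# Download the page source as the server sends it (requests does not run JavaScript).
html = requests.get('https://semlar.com/rivenprices/artax').text
soup = bs4.BeautifulSoup(html, 'html.parser')
# First element with class="table", i.e. the riven price table.
data = soup.find(class_='table')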

In this case, data is:

<table class="table" id="riven-table">
<thead>
<tr>
<th>Riven Name</th>
<th class="price-avg">Avg Price</th>
<th class="riven-disposition">Dispo</th>
</tr>
</thead>
<tbody>
</tbody>
</table>

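As you can see, the <tbody> tag is empty, but when you inspect any element of the table in the browser, it appears to be inside this tag, under <tbody><tr><td>. Here is a screenshot showing part of the inspected code: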

https://imggmi.com/full/2018/5/28/0619e4d1944c0291bfa70a30678b3f51-full.png.html
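The empty <tbody> is the typical symptom of a table that the page fills in with JavaScript after loading: requests only downloads the HTML the server sends and never runs scripts, whereas the browser's inspector shows the page after its scripts have run. (The site's scripts were not examined here, so this is the likely cause rather than a confirmed one.) Once you have HTML that does contain the rows, for instance a copy saved from the browser after the page has rendered, a sketch like the following extracts the three columns. `parse_riven_table` is a hypothetical helper, and it assumes each row holds the name, average price and disposition in three <td> cells in that order, as the screenshot suggests.

```python
from bs4 import BeautifulSoup


def parse_riven_table(html):
    """Return {riven_name: (avg_price, dispo)} from the rows of #riven-table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', id='riven-table')
    if table is None or table.tbody is None:
        return {}
    rows = {}
    for tr in table.tbody.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        # Assumption: name, average price and disposition, in that order.
        if len(cells) >= 3:
            rows[cells[0]] = (cells[1], cells[2])
    return rows


# The HTML that requests received: the <tbody> has no rows, so nothing comes back.
static_html = '''<table class="table" id="riven-table">
<thead><tr><th>Riven Name</th><th class="price-avg">Avg Price</th>
<th class="riven-disposition">Dispo</th></tr></thead>
<tbody>
</tbody>
</table>'''
print(parse_riven_table(static_html))  # {}
```

Fed the HTML that requests actually returned, it comes back empty, which matches what the question observed. To get the rendered rows programmatically you would need something that executes JavaScript, such as a real or headless browser, or the address of the data request the page makes, which may be visible in the Network tab of the browser's developer tools.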

Tags: python-3.x, beautifulsoup, python-requests

1 Answer

Stack Overflow user

Answered on 2018-05-29 04:36:07

You know what they say: if they throw you out the door, come back in through the window.

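I managed to do what I wanted in a completely different way, which I call the "brute-force, neither safe nor reliable" way. What the program does: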

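1. Opens a web browser, waits a fixed amount of time for the page to load, automatically presses Ctrl+A, waits a moment again, presses Ctrl+C and closes the browser (or the current browser tab, if many are open). For this I used the webbrowser, time and pywinauto modules.
2. Pastes the clipboard into a rawdata.txt file. I did this only to make sure that, while writing and running the code over and over, I wouldn't mess up my tests by accidentally copying some other text. I used pyperclip for this. The file is then opened and its contents are formatted into a dictionary of the form {'weapon_name': ['mod_price', 'disposition'], 'next_weapon_name': [...], ...}.
3. Finally, the program asks the user which weapon they want to check and shows them the data from the dictionary. They can then run the loop again to ask about another gun, or just end the program.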
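Step 2 amounts to reading tab-separated lines copied from the rendered table. As an alternative to splitting on '\t' by hand, the standard csv module can do the splitting. This sketch assumes, like the answer's code, that the copied text contains the header line Riven Name, Avg Price, Dispo (tab-separated) followed by one row per weapon; `rows_to_dict` is a hypothetical name and the sample values are placeholders, not real prices.

```python
import csv
import io

# Assumed header row of the copied table, split into its tab-separated cells.
HEADER = ['Riven Name', 'Avg Price', 'Dispo']


def rows_to_dict(copied_text):
    """Map each weapon name (upper-cased) to [avg_price, dispo]."""
    # newline='' lets csv handle '\n' and '\r\n' line endings itself;
    # QUOTE_NONE treats any '"' in the page text as an ordinary character.
    reader = csv.reader(io.StringIO(copied_text, newline=''),
                        delimiter='\t', quoting=csv.QUOTE_NONE)
    result = {}
    in_table = False
    for row in reader:
        if not in_table:
            # Skip page text until the header row shows up.
            in_table = row == HEADER
            continue
        if len(row) >= 3:  # blank lines come through as [] and are skipped
            result[row[0].strip().upper()] = [row[1], row[2]]
    return result


# Placeholder values, for illustration only.
sample = 'Some page text\nRiven Name\tAvg Price\tDispo\nArtax\t50\t1.3\n\n'
print(rows_to_dict(sample))  # {'ARTAX': ['50', '1.3']}
```

Working on a string rather than a file keeps the parsing testable without a browser or clipboard; the result can be fed straight into the lookup step.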

Code:

from time import sleep
import webbrowser

import pyperclip
import pywinauto.keyboard as pkbd  # Windows-only keyboard automation

url = 'https://semlar.com/rivenprices/lanka'
# Header row of the table as it appears in the copied text (tab-separated).
HEADER = 'Riven Name\tAvg Price\tDispo'


def greet():
    print("This app will get data from {}".format(url))
    print("You will be able to check the riven mod price and disposition for the desired weapon.")


def open_browser_get_to_clipboard():
    # Open the page and wait a fixed (hopefully long enough) time for it to render.
    webbrowser.open(url)
    sleep(10)
    # Select all, copy, then close the focused window with Alt+F4.
    # Recent pywinauto releases name this send_keys; older ones used SendKeys.
    pkbd.send_keys('^a')
    sleep(2)
    pkbd.send_keys('^c')
    sleep(1)
    pkbd.send_keys('%{F4}')


def write_to_file(fname):
    # Save the clipboard so repeated test runs don't depend on what was copied last.
    with open(fname, 'w', encoding='utf-8') as fout:
        fout.write(pyperclip.paste())


def format_data_from_file(fname):
    """Build {'WEAPON_NAME': [avg_price, dispo], ...} from the saved page text."""
    riven_database = dict()
    with open(fname, 'r', encoding='utf-8') as fin:
        # splitlines() also handles Windows '\r\n' line endings.
        data = fin.read().splitlines()
    # Table rows start after the header line (ValueError if it is missing).
    start_ind = data.index(HEADER)
    for item in data[start_ind + 1:]:
        temp = item.split('\t')
        if len(temp) < 3:
            continue  # skip blank lines and any page text that is not a table row
        # Upper-case the key so it matches the upper-cased user input.
        riven_database[temp[0].strip().upper()] = [temp[1], temp[2]]
    return riven_database


def ask_user_and_check(riven_dict):
    print("Which weapon would you like to look up?")
    while True:
        decision = input(">>> ").strip().upper()
        if decision in riven_dict:
            print("You have picked {} weapon to check.".format(decision))
            return decision
        print("Weapon name not found. Try again.")


def print_output(decision, riven_dict):
    price, dispo = riven_dict[decision]
    print("Name of the weapon: {}".format(decision))
    print("Average riven mod price: {} platinum".format(price))
    print("Riven disposition of picked weapon: {}".format(dispo))


def quit_or_loop_again():
    """Return True if the user wants to look up another weapon."""
    print("Do you want to search again or quit?")
    print("To search again input any character, to quit input [x] or [X].")
    decision = input(">>> ").strip().lower()
    if decision == 'x':
        print('Good bye.')
        return False
    return True


def main():
    greet()
    open_browser_get_to_clipboard()
    write_to_file('rawdata.txt')
    riven_database = format_data_from_file('rawdata.txt')
    # Scrape once, then let the user do as many lookups as they like.
    while True:
        decision = ask_user_and_check(riven_database)
        print_output(decision, riven_database)
        if not quit_or_loop_again():
            break


if __name__ == '__main__':
    main()

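I think I'll just leave it here; maybe someone will get something out of it, though given how it all works, I doubt it. It is certainly not pretty, and it fails if the user does anything like closing the browser or switching browser tabs. Still, I got something working, and I'm a bit proud that I managed to do it this way at all. I used some really interesting modules and learned new things along the way, which I guess is what I was after.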
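The answer notes that the script breaks if the browser is closed or the tab changes before the copy happens. One cheap mitigation, sketched here under the assumption that a successful copy always contains the table's header line, is to check the clipboard text before writing rawdata.txt and stop with a clear message instead of a ValueError from list.index later. `check_copied_text` is a hypothetical helper; you would pass it the string returned by pyperclip.paste().

```python
# Assumption: a successful copy always includes this tab-separated header line.
HEADER_LINE = 'Riven Name\tAvg Price\tDispo'


def check_copied_text(text):
    """Return text unchanged, or raise RuntimeError if the riven table header is missing."""
    lines = [line.strip() for line in text.splitlines()]
    if HEADER_LINE not in lines:
        raise RuntimeError(
            'Clipboard does not contain the riven table; '
            'was the browser closed or the tab switched before copying?'
        )
    return text
```

It does not make the keyboard automation itself any more reliable; it only turns a silent wrong copy into an explicit error.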

Score: 0

Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/50568186
