
Steam market parsing

Stack Overflow user
Asked 2020-05-28 02:59:23
2 answers, 1.7K views, score 0

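I have a link (the one in the code below) that ends with "_price_asc", which sorts the results in ascending order. When I follow this link in the browser, the sorting works fine.

But if I try to parse the item links with bs4, I get items with prices in random order, i.e. the ascending sort does not work.

What am I doing wrong?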

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'

total_links = ''

page = urlopen(link)
bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")

for g in range(10):
    total_links += str(objects[g]["href"]) + '\n'
print(total_links)

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-05-28 03:32:29

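This page uses JavaScript to fetch the sorted data, but BeautifulSoup/urllib cannot run JavaScript.

However, using the developer tools in Firefox/Chrome (tab: Network, filter: XHR), I found that the JavaScript reads JSON data from another URL, and that JSON contains HTML with the sorted data. So you can request this URL and parse the returned HTML with BeautifulSoup to get the sorted data.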

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json

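# New URL: the JSON endpoint requested by the page's JavaScript;
# the sort order is set with sort_column=price and sort_dir=asc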

link = 'https://steamcommunity.com/market/search/render/?query=&start=0&count=10&search_descriptions=0&sort_column=price&sort_dir=asc&appid=730&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife'

page = urlopen(link)

data = json.loads(page.read().decode())
html = data['results_html']

bs_page = BeautifulSoup(html, features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")

data = []

for g in objects:
    link  = g["href"]
    price = g.find('span', {'data-price': True}).text
    data.append((price, link))

print("\n".join(f"{price} | {link}" for price, link in data))

Result:

$67.43 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Urban%20Masked%20%28Field-Tested%29
$67.70 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Field-Tested%29
$69.00 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Minimal%20Wear%29
$69.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Battle-Scarred%29
$69.48 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Safari%20Mesh%20%28Field-Tested%29
$70.32 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Battle-Scarred%29
$70.90 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Well-Worn%29
$70.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Field-Tested%29
$71.99 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Boreal%20Forest%20%28Field-Tested%29
$72.08 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Field-Tested%29
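
The render URL above carries "start" and "count" parameters, so later pages could presumably be requested by stepping "start" in multiples of "count". The source only shows start=0, so this paging behaviour is an assumption, and the helper names `render_url` and `page_urls` are hypothetical. A minimal sketch that only builds the URLs and makes no network request:

```python
from urllib.parse import urlencode

BASE = "https://steamcommunity.com/market/search/render/"

def render_url(start: int, count: int = 10) -> str:
    """Build the sorted render URL for one page of results."""
    params = [
        ("query", ""),
        ("start", start),            # offset of the first item (assumed paging key)
        ("count", count),            # number of items per request
        ("search_descriptions", 0),
        ("sort_column", "price"),
        ("sort_dir", "asc"),
        ("appid", 730),
        ("category_730_ItemSet[]", "any"),
        ("category_730_ProPlayer[]", "any"),
        ("category_730_StickerCapsule[]", "any"),
        ("category_730_TournamentTeam[]", "any"),
        ("category_730_Weapon[]", "any"),
        ("category_730_Type[]", "tag_CSGO_Type_Knife"),
    ]
    # urlencode percent-encodes "[]" as %5B%5D, matching the URL used above
    return BASE + "?" + urlencode(params)

def page_urls(pages: int, count: int = 10) -> list:
    """Return the URLs for the first `pages` pages."""
    return [render_url(start=i * count, count=count) for i in range(pages)]

for url in page_urls(3):
    print(url)
```

Each URL can then be fetched and parsed exactly like the single request above; pausing between requests is advisable when reading many pages.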

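BTW: the code below is my first version, which reads from the old URL and sorts the data in Python. It can only sort the data on the first page, though. To get a better result it would have to read all the pages, and that would take a lot of time.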

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'
page = urlopen(link)

bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")

data = []

for g in objects:
    link  = g["href"]
    price = g.find('span', {'data-price': True})['data-price']
    price = int(price)
    data.append((price,link))

data = sorted(data)

print("\n".join(f"${price/100} USD | {link}" for price, link in data))

Score: 1

Stack Overflow user

Posted 2020-05-28 03:17:49

This happens because, if you look at the following link

https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc

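the link ends with "#p1_price_asc". The hash sign introduces a URL fragment, which marks a location or state within the page; the original answer links to a full explanation. Basically, the part after "#" in a URL is usually consumed by JavaScript functions in the browser and is not sent to the server.

Since you are downloading the page with: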

page = urlopen(link)

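the JavaScript function that performs the sorting is never called. I strongly recommend reading the linked explanation of the hash, because it explains this much better than I can.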
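
To see why the sort hint never reaches the server, the URL can be split with the standard library. This is only an illustrative sketch: the link is shortened to the relevant parts, and `request_target` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Shortened form of the link from the question (assumption: the other
# category filters are omitted here only for readability)
link = ("https://steamcommunity.com/market/search"
        "?q=&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730"
        "#p1_price_asc")

def request_target(url: str) -> str:
    """Return the path and query, i.e. what an HTTP request asks the server for."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")

parts = urlsplit(link)
print(parts.fragment)        # the part after "#", kept on the client side
print(request_target(link))  # contains no trace of the sort order
```

The fragment is parsed separately from the path and query, so the server sees the same request with or without "#p1_price_asc".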

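Now, as for how to achieve what you want, you have two options:

  1. Use the selenium library, since it drives a real browser
  2. Keep using the data you already get and sort it manually (it is trivial, and you will learn more).

Personally I recommend option 2, because learning Selenium can be a bit of a hassle and is usually not worth it, in my opinion.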
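
Option 2 could look roughly like the sketch below. It assumes the scraped prices arrive as text in the form "$67.43 USD" and that a comma may appear as a thousands separator; the listing names and the `price_value` helper are hypothetical placeholders.

```python
import re
from decimal import Decimal

def price_value(text: str) -> Decimal:
    """Extract the numeric part of a price string such as '$67.43 USD'."""
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Drop thousands separators before converting (assumed US-style format)
    return Decimal(match.group().replace(",", ""))

# Hypothetical scraped rows: (price text, listing link)
rows = [
    ("$69.52 USD", "listing-a"),
    ("$67.43 USD", "listing-b"),
    ("$1,069.48 USD", "listing-c"),
]

# Sort by the numeric value, not by the string, so "$1,069.48" sorts last
rows_sorted = sorted(rows, key=lambda row: price_value(row[0]))

for price, link in rows_sorted:
    print(f"{price} | {link}")
```

Sorting on the parsed number rather than the raw string avoids the case where "$1,069.48" would sort before "$67.43". As with any client-side sort, this only orders the items that were actually downloaded.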

Score: 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62056266