
Scraping Highcharts data - problem getting all data points in the chart

Stack Overflow user

Asked 2022-07-29 10:17:15

Answers: 1, Views: 71, Followers: 0, Votes: 0

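I am trying to scrape data from a Highcharts chart at the following URL:

https://www.pricecharting.com/game/pal-nes/legend-of-zelda

The code below gets the data from the default chart series, labeled "Loose", but I also need to extract the data for the other items listed on the chart (CIB, New, Graded, Box Only, Manual Only). I'm really stuck and don't know how to get the data for those other series.

Here is my code, which only works for "Loose" (the default option):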

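import time

from selenium import webdriver
# geckodriver drives Firefox, so use the Firefox Service class (not the Chrome one)
from selenium.webdriver.firefox.service import Service

service = Service(executable_path="/driver_selenium/geckodriver.exe")
driver = webdriver.Firefox(service=service)

website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"

driver.get(website)
time.sleep(5)  # give the page time to render the chart

# Raw point data of the first series ("Loose") of the first chart on the page
temp = driver.execute_script('return window.Highcharts.charts[0]'
                             '.series[0].options.data')

# Each point is [timestamp, price]; keep only the price
data = [item[1] for item in temp]
print(data)

driver.quit()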
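If all six lines (Loose, CIB, New, Graded, Box Only, Manual Only) are series of the same chart object, one possible Selenium approach is to return every series at once instead of only series[0]. The sketch below assumes that; ALL_SERIES_JS, series_to_frame and fetch_all_series are hypothetical names, not part of Selenium or Highcharts. It only uses the Highcharts properties the question already relies on (Highcharts.charts, series, options.data) plus each series' name.

```python
import time

import pandas as pd

# JavaScript executed in the page. ASSUMPTION: every price line shown on the
# page is a series of Highcharts.charts[0], the chart the question reads
# series[0] from, and each series' options.data is a list of [x, y] pairs.
ALL_SERIES_JS = """
return window.Highcharts.charts[0].series.map(function (s) {
    return {name: s.name, data: s.options.data};
});
"""


def series_to_frame(series_list):
    """Turn [{'name': ..., 'data': [[ts_ms, price], ...]}, ...] into a wide DataFrame."""
    columns = {}
    for s in series_list:
        points = pd.DataFrame(s["data"], columns=["Date_time", "Price"])
        columns[s["name"]] = points.set_index("Date_time")["Price"]
    wide = pd.DataFrame(columns)  # one column per series, aligned on timestamp
    wide.index = pd.to_datetime(wide.index, unit="ms")
    wide.index.name = "Date_time"
    return wide


def fetch_all_series(url, driver_path):
    """Load the page in Firefox and return every series of the first chart."""
    # Selenium is imported here so series_to_frame() can be used without it.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    driver = webdriver.Firefox(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        time.sleep(5)  # crude wait for the chart to render, as in the question
        return series_to_frame(driver.execute_script(ALL_SERIES_JS))
    finally:
        driver.quit()
```

Usage: df = fetch_all_series("https://www.pricecharting.com/game/pal-nes/legend-of-zelda", "/driver_selenium/geckodriver.exe"). The column names are whatever name the page gives each series. If the chart turns out to be a Highstock chart with a navigator, the navigator may appear as an extra series and can be dropped by column name.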

Bonus points if there is a better way to extract the data without Selenium, similar to the answer here: Scrape highchart into python

Tags: python, web-scraping, highcharts

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-07-29 10:32:27

Edit: we can try the following:

import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.pricecharting.com/game/pal-nes/legend-of-zelda'
r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.text, 'html.parser')

# The chart series are assigned in an inline <script> as "VGPC.chart_data = {...};"
data_script = soup.find('script', string=re.compile("VGPC.chart_data = {"))

# Cut out the JSON object between "VGPC.chart_data = " and the terminating ";"
raw_json = data_script.text.split('VGPC.chart_data = ')[1].split(' VGPC.product = {')[0].split(';')[0].strip()
data = json.loads(raw_json)

# One column per series; each cell is a [timestamp_ms, price_in_cents] pair
df = pd.DataFrame(data)
print(df)

               boxonly                     cib              graded          manualonly                     new                   used
0   [1498888800000, 0]  [1498888800000, 17004]  [1498888800000, 0]  [1498888800000, 0]      [1498888800000, 0]  [1498888800000, 6510]
1   [1501567200000, 0]   [1501567200000, 7878]  [1501567200000, 0]  [1501567200000, 0]  [1501567200000, 23100]  [1501567200000, 5409]
2   [1504245600000, 0]   [1504245600000, 8879]  [1504245600000, 0]  [1504245600000, 0]  [1504245600000, 23100]  [1504245600000, 5169]
3   [1506837600000, 0]   [1506837600000, 8432]  [1506837600000, 0]  [1506837600000, 0]  [1506837600000, 37665]  [1506837600000, 4499]
4   [1509516000000, 0]   [1509516000000, 9286]  [1509516000000, 0]  [1509516000000, 0]  [1509516000000, 37665]  [1509516000000, 4513]
..                 ...                     ...                 ...                 ...                     ...                    ...

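This data contains the six columns of historical prices used in the chart. Each cell is a [timestamp, price] pair, with the timestamp in Unix milliseconds and the price in cents. This approach avoids the overhead of running Selenium and a browser driver.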
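The string splitting in the answer's code depends on the exact text that follows the data (" VGPC.product = {"). A slightly more defensive sketch captures the object assigned to VGPC.chart_data with a regular expression. It assumes the assignment ends with "};" and that no string inside the object contains "};", which holds for plain lists of numbers like the ones shown; extract_chart_data is a hypothetical helper name.

```python
import json
import re

# Captures the object literal assigned to VGPC.chart_data, up to the first "}"
# followed by ";". ASSUMPTION: the object is valid JSON (as json.loads in the
# answer implies) and contains no "};" inside a string.
CHART_DATA_RE = re.compile(r"VGPC\.chart_data\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def extract_chart_data(page_text):
    """Return the VGPC.chart_data object from page HTML or script text as a dict."""
    match = CHART_DATA_RE.search(page_text)
    if match is None:
        raise ValueError("VGPC.chart_data assignment not found")
    return json.loads(match.group(1))
```

Because the pattern searches plain text, it can also be applied to the whole response, for example data = extract_chart_data(r.text), which makes the BeautifulSoup step optional.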

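We can now look at an individual line series, for example "new":

df_new = pd.DataFrame(data['new'], columns=['Date_time', 'Price'])
# Timestamps are Unix epoch milliseconds; convert them to (naive, UTC) datetimes
df_new['Date_time'] = pd.to_datetime(df_new['Date_time'], unit="ms")
print(df_new)

Result: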

              Date_time  Price
0   2017-07-01 06:00:00      0
1   2017-08-01 06:00:00  23100
2   2017-09-01 06:00:00  23100
3   2017-10-01 06:00:00  37665
4   2017-11-01 06:00:00  37665
..                  ...    ...
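To treat all six series the way "new" is treated above, one possible sketch flattens the dictionary into one row per (series, date), converts cents to dollars, and can then be pivoted back into a wide table. Treating a price of 0 as missing is an assumption (in the output above, 0 shows up where a series appears to have no data yet); pass zero_as_missing=False to keep the raw values. chart_data_to_long is a hypothetical helper name.

```python
import pandas as pd


def chart_data_to_long(data, zero_as_missing=True):
    """Flatten {series_name: [[ts_ms, cents], ...]} into one row per (series, date).

    Adds a Price_usd column (cents / 100). With zero_as_missing=True a price of
    0 becomes NaN. ASSUMPTION: 0 marks a month with no recorded sales.
    """
    rows = [
        (name, ts_ms, cents)
        for name, points in data.items()
        for ts_ms, cents in points
    ]
    long_df = pd.DataFrame(rows, columns=["Series", "Date_time", "Price_cents"])
    long_df["Date_time"] = pd.to_datetime(long_df["Date_time"], unit="ms")
    prices = long_df["Price_cents"].astype("float64")
    if zero_as_missing:
        prices = prices.mask(prices == 0)  # replace 0 with NaN
    long_df["Price_usd"] = prices / 100
    return long_df
```

Usage: wide = chart_data_to_long(data).pivot(index="Date_time", columns="Series", values="Price_usd") gives one dollar-valued column per series, indexed by date. The long form is also convenient for plotting or grouping by series.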

Votes: 1

Page content originally from Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original question:

https://stackoverflow.com/questions/73165122