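I'm trying to scrape data from the Highcharts chart at the following URL:
https://www.pricecharting.com/game/pal-nes/legend-of-zelda
The code below gets the data from the default chart series, labelled "Loose", but I need to extract the data for the other items listed on the chart (CIB, New, Graded, Box Only, Manual Only). I'm really stuck and can't work out how to get the information from the other series.
Here is my code, which only works for "Loose" (the default option).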
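import time
import pandas as pd
from selenium import webdriver
# geckodriver drives Firefox, so use the Firefox Service class rather than the Chrome one
from selenium.webdriver.firefox.service import Service
service = Service(executable_path="/driver_selenium/geckodriver.exe")
driver = webdriver.Firefox(service=service)
website = "https://www.pricecharting.com/game/pal-nes/legend-of-zelda#completed-auctions-graded"
driver.get(website)
time.sleep(5)  # crude wait for the chart to finish rendering
# series[0] is the default "Loose" series; item[1] below picks the price out of each point
temp = driver.execute_script('return window.Highcharts.charts[0]'
                             '.series[0].options.data')
data = [item[1] for item in temp]
print(data)
Bonus points if there is a better way to extract the data without using Selenium, similar to the answer here: Scrape highchart into python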
Posted on 2022-07-29 10:32:27
EDIT: We can try the following:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import json
url = 'https://www.pricecharting.com/game/pal-nes/legend-of-zelda'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
data_script = soup.find('script', string=re.compile("VGPC.chart_data = {"))
# print(data_script.text.split('VGPC.chart_data = ')[1].split(' VGPC.product = {')[0].split(';')[0].strip())
data = json.loads(data_script.text.split('VGPC.chart_data = ')[1].split(' VGPC.product = {')[0].split(';')[0].strip())
df = pd.DataFrame(data)
print(df)
This returns:
                boxonly                     cib              graded          manualonly                     new                   used
0    [1498888800000, 0]  [1498888800000, 17004]  [1498888800000, 0]  [1498888800000, 0]      [1498888800000, 0]  [1498888800000, 6510]
1    [1501567200000, 0]   [1501567200000, 7878]  [1501567200000, 0]  [1501567200000, 0]  [1501567200000, 23100]  [1501567200000, 5409]
2    [1504245600000, 0]   [1504245600000, 8879]  [1504245600000, 0]  [1504245600000, 0]  [1504245600000, 23100]  [1504245600000, 5169]
3    [1506837600000, 0]   [1506837600000, 8432]  [1506837600000, 0]  [1506837600000, 0]  [1506837600000, 37665]  [1506837600000, 4499]
4    [1509516000000, 0]   [1509516000000, 9286]  [1509516000000, 0]  [1509516000000, 0]  [1509516000000, 37665]  [1509516000000, 4513]
...                 ...                     ...                 ...                 ...                     ...                    ...
The resulting DataFrame has six columns of historical prices, one for each series used in that chart (prices are in cents). This approach avoids the overhead of Selenium/chromedriver.
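The chained `split` calls above depend on exactly what follows the `VGPC.chart_data` assignment in the page source. As a possibly sturdier alternative, here is a minimal sketch that locates the assignment with a regex and lets `json.JSONDecoder.raw_decode` read a single JSON value from that position. `SAMPLE_HTML` and the helper name `extract_chart_data` are made up for illustration, and the sketch assumes the value is valid JSON, as it was when this answer was written.

```python
import json
import re

# Hypothetical stand-in for r.text; the real page is assumed to embed the
# data in a similar `VGPC.chart_data = {...};` assignment inside a <script>.
SAMPLE_HTML = """
<script>
  VGPC.chart_data = {"new": [[1498888800000, 0], [1501567200000, 23100]], "used": [[1498888800000, 6510], [1501567200000, 5409]]};
  VGPC.product = {"id": 1};
</script>
"""


def extract_chart_data(html: str) -> dict:
    """Return the object assigned to VGPC.chart_data in the given HTML."""
    match = re.search(r"VGPC\.chart_data\s*=\s*", html)
    if match is None:
        raise ValueError("VGPC.chart_data assignment not found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ';' and later statements).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


chart_data = extract_chart_data(SAMPLE_HTML)
print(sorted(chart_data))      # ['new', 'used']
print(chart_data["used"][0])   # [1498888800000, 6510]
```

With the real page, `extract_chart_data(r.text)` would take the place of the `soup.find(...)` and `split(...)` lines, and the result can be passed to `pd.DataFrame` as before.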
We can now look at the individual series, for example "new":
df_new = pd.DataFrame(data['new'], columns = ['Date_time', 'Price'])
df_new['Date_time'] = pd.to_datetime(df_new['Date_time'], unit="ms")
print(df_new)
The result is:
               Date_time  Price
0    2017-07-01 06:00:00      0
1    2017-08-01 06:00:00  23100
2    2017-09-01 06:00:00  23100
3    2017-10-01 06:00:00  37665
4    2017-11-01 06:00:00  37665
...                  ...    ...
https://stackoverflow.com/questions/73165122
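Since the question asks for every series rather than just one, the single-series conversion shown for "new" can be applied to each key in turn. The sketch below builds one table with a row per date and a column per series, with prices converted from cents to dollars. The `data` dict here is a small sample copied from the output shown earlier (three of the six series, two points each), and `to_price_table` is a hypothetical helper name.

```python
import pandas as pd

# Sample shaped like the parsed VGPC.chart_data shown earlier; each point
# is [epoch_milliseconds, price_in_cents].
data = {
    "cib": [[1498888800000, 17004], [1501567200000, 7878]],
    "new": [[1498888800000, 0], [1501567200000, 23100]],
    "used": [[1498888800000, 6510], [1501567200000, 5409]],
}


def to_price_table(chart_data: dict) -> pd.DataFrame:
    """Build one table: a row per date, a column per series, prices in dollars."""
    columns = {}
    for name, points in chart_data.items():
        frame = pd.DataFrame(points, columns=["Date_time", "Price"])
        frame["Date_time"] = pd.to_datetime(frame["Date_time"], unit="ms")
        # Convert cents to dollars and index the series by its timestamp.
        columns[name] = frame.set_index("Date_time")["Price"] / 100
    # concat aligns the series on their shared datetime index; the dict keys
    # become the column names.
    return pd.concat(columns, axis=1)


prices = to_price_table(data)
print(prices)
```

A single series is then just a column, for example `prices["cib"]`.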