
How to extract JSON/table data from a JavaScript variable in Python

Stack Overflow user
Asked 2020-06-24 13:36:00
Answers: 2 · Views: 377 · Followers: 0 · Votes: 0

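I've been trying to scrape https://support.riverbed.com/content/support/eos_eoa.html, which contains a paginated table generated in JavaScript.

I'm currently using Beautiful Soup to capture the script element. The variable EOL_ENTRIES holds the data I need, but I can't parse it. Any tips on how to extract the data successfully?

The end goal is to get this data into Power BI, but PBI's web scrape only works on the first page.

Any help is much appreciated.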

import re
import httplib2
from bs4 import BeautifulSoup

# download the page and decode the raw bytes
http = httplib2.Http()
url = 'https://support.riverbed.com/content/support/eos_eoa.html'
resp, data = http.request(url)
html = data.decode("UTF-8")
soup = BeautifulSoup(html, 'html5lib')

# the <script> element at index 37 contains the data
script_with_data = soup.find_all('script')[37]

Sample output looks like this:

<script type="text/javascript">

        var EOL_ENTRIES = [
            
            
            
            
                
                    
                    
                        
                    
                    
                
                {
                    productFamily: 'SteelCentral',
                    shortName: 'SteelCentral AppInternals Collector v9',
                    link: 'https:\/\/support.riverbed.com\/content\/support\/eos_eoa\/steelcentral-cascade-opnet\/SteelCentral-AppInternals-Console-v9-and-AppInternals-Collector-v9-BrowserMetrix-OnPremise.html',
                    linkOverride: 'https:\/\/support.riverbed.com\/content\/support\/eos_eoa\/steelcentral-cascade-opnet\/SteelCentral-AppInternals-Console-v9-and-AppInternals-Collector-v9-BrowserMetrix-OnPremise.html',
                    sku: 'AIXCOL',
                    skuOverride: '',
                    description: 'SteelCentral AppInternals Collector v9',
                    limitedAvailability: '',
                    limitedAvailabilityFormatted: '',
                    endOfAvailability: 'Wed Jul 03 00:00:00 PDT 2019',
                    endOfAvailabilityFormatted: 'Wed Jul 03 00:00:00 PDT 2019',
                    endOfSupportFeatures: 'Sat Aug 31 00:00:00 PDT 2019',
                    endOfSupportFeaturesFormatted: 'Sat Aug 31 00:00:00 PDT 2019',
                    endOfSupportMaintenance: 'Sat Aug 31 00:00:00 PDT 2019',
                    endOfSupportMaintenanceFormatted: 'Sat Aug 31 00:00:00 PDT 2019'}
];

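2 Answers

Stack Overflow user

Accepted answer

Posted 2020-06-24 15:46:43

The data is written as a JavaScript literal rather than JSON, so it needs some preprocessing before the json module can load it: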

import re
import json
import requests
from bs4 import BeautifulSoup


url = 'https://support.riverbed.com/content/support/eos_eoa.html'
soup = BeautifulSoup(requests.get(url).content, 'html5lib')

# select <script> tag of interest
s = soup.find(lambda t: t.name == 'script' and 'var EOL_ENTRIES' in t.text)

# extract string from this script tag
t = re.search(r'var EOL_ENTRIES = (\[.*\]);', s.text, flags=re.S)[1]

# preprocess the string
t = t.replace("'", '"')
t = re.sub(r'^(\s*)(.*?):', r'\1"\2":', t, flags=re.M)

# decode string to Python data
data = json.loads(t)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print some data to screen:
for product in data:
    print('{:<40} {:<40} {}'.format(product['sku'], product['productFamily'], product['shortName']))

Prints:

AIXCOL                                   SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-AN                                SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-LP                                SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-LP-MODEL                          SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-LS                                SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-LS-MODEL                          SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-SITE                              SteelCentral                             SteelCentral AppInternals Collector v9
AIXCOL-SUB-LIC                           SteelCentral                             SteelCentral AppInternals Collector v9
PANCOL                                   SteelCentral                             SteelCentral AppInternals Collector v9


... and so on.
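
The preprocessing can also be tried offline. Below is a minimal sketch that runs the same two steps (swap single quotes for double quotes, then quote the bare keys) on an inline sample instead of the live page. `SAMPLE_SCRIPT` and `parse_eol_entries` are hypothetical names introduced for illustration, the sample is shortened, and the key pattern uses `\w+` rather than `.*?`, which is a small variation on the code above. It assumes, as the code above does, that no value contains an apostrophe or a double quote.

```python
import json
import re

# Hypothetical, shortened stand-in for the text of the real <script> element.
SAMPLE_SCRIPT = r"""
var EOL_ENTRIES = [
    {
        productFamily: 'SteelCentral',
        shortName: 'SteelCentral AppInternals Collector v9',
        link: 'https:\/\/support.riverbed.com\/content\/support\/eos_eoa.html',
        sku: 'AIXCOL',
        skuOverride: ''},
    {
        productFamily: 'SteelCentral',
        shortName: 'SteelCentral AppInternals Collector v9',
        link: 'https:\/\/support.riverbed.com\/content\/support\/eos_eoa.html',
        sku: 'AIXCOL-AN',
        skuOverride: ''}
];
"""


def parse_eol_entries(script_text):
    """Turn the EOL_ENTRIES JavaScript array literal into a list of dicts."""
    # grab everything between "var EOL_ENTRIES = " and the closing "];"
    match = re.search(r'var EOL_ENTRIES = (\[.*\]);', script_text, flags=re.S)
    if match is None:
        raise ValueError('EOL_ENTRIES not found in script text')
    text = match.group(1)
    # single-quoted JS strings -> double-quoted JSON strings
    # (assumes values contain neither apostrophes nor double quotes)
    text = text.replace("'", '"')
    # wrap the bare key at the start of each line in double quotes
    text = re.sub(r'^(\s*\{?\s*)(\w+):', r'\1"\2":', text, flags=re.M)
    return json.loads(text)


records = parse_eol_entries(SAMPLE_SCRIPT)
for record in records:
    print(record['sku'], record['productFamily'], record['link'])
```

The `\/` sequences in the links are valid JSON escapes, so `json.loads` decodes them to plain `/` without any extra handling.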
Votes: 3

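Stack Overflow user

Posted 2020-06-24 16:12:13

If you're trying to get the data into Power BI, I'd take a different approach: extract the data, load it into a dataframe, then export it to an Excel/CSV file that PBI can read.

So I'd try this: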

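import pandas as pd

# take the text of the first {...} entry only and split it into "key: 'value'" pieces
prods = str(script_with_data).split('var EOL_ENTRIES = [')[1].split('}')[0].replace('\t','').replace('\n','').split(',')
rows = []
for prod in prods:
    # split once on ': ' so each key stays paired with its value
    key, value = prod.split(': ', 1)
    # drop the indentation and opening brace from the key, and the quotes from the value
    rows.append([key.strip(' {'), value.replace("'", "")])

pd.DataFrame(rows)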

From there, export the data using standard pandas methods.
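
As a rough illustration of the export step, here is a sketch that writes a list of flat records to CSV using only the standard library `csv` module, so it runs without pandas installed. The `records` list and the `records_to_csv` helper are hypothetical and only mimic the shape of the decoded EOL_ENTRIES items.

```python
import csv
import io

# Hypothetical records shaped like decoded EOL_ENTRIES items.
records = [
    {'sku': 'AIXCOL', 'productFamily': 'SteelCentral',
     'shortName': 'SteelCentral AppInternals Collector v9'},
    {'sku': 'AIXCOL-AN', 'productFamily': 'SteelCentral',
     'shortName': 'SteelCentral AppInternals Collector v9'},
]


def records_to_csv(records, out):
    """Write a list of flat dicts to a file-like object as CSV with a header row."""
    # the column order follows the keys of the first record
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)


# write into an in-memory buffer here; a real export would pass an open file
buffer = io.StringIO()
records_to_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With pandas, the equivalent would be `pd.DataFrame(records).to_csv('eol_entries.csv', index=False)`, where the file name is a placeholder. When writing to a real file with the `csv` module, open it with `newline=''` so line endings are not doubled on Windows.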

Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62556397
