I have a page with a script that contains an array (myDayHash) that I need.
<script type="text/javascript">
function toggleCheckBoxes(obj) {
    var theForm = document.getElementById("thePage:SiteTemplate:theForm");
    for (var i=0; i<theForm.elements.length; i++) {
        if (theForm.elements[i].type == "checkbox" &&
            theForm.elements[i].name != obj.name) {
            theForm.elements[i].checked = false;
        }
    }
}
// ATLAS-1089: back & continue buttons showing twice for
// Reserved Group/Emergency Appointments
function checkIfButtonsShowTwice() {
    // From first form
    var continueBtn = document.getElementById("thePage:SiteTemplate:theForm:continueBtn");
    var backBtn = document.getElementById("thePage:SiteTemplate:theForm:backBtn");
    // From second form
    var continueBtnToHide = document.getElementById("thePage:SiteTemplate:theForm2:continueBtn");
    var backBtnToHide = document.getElementById("thePage:SiteTemplate:theForm2:form2BackBtn");
    // The controller logic for rendering the buttons
    // is fragile so... front end solutions for the win
    if(continueBtn != null) {
        if (continueBtnToHide != null) {
            continueBtnToHide.style.display = "none";
        }
    }
}
var myDayHash = new Array();
myDayHash['14-9-2023'] = true;
myDayHash['4-12-2023'] = true;
myDayHash['31-1-2024'] = true;
myDayHash['1-2-2024'] = true;
myDayHash['27-2-2024'] = true;
myDayHash['28-2-2024'] = true;
myDayHash['4-3-2024'] = true;
myDayHash['5-3-2024'] = true;
myDayHash['6-3-2024'] = true;
myDayHash['7-3-2024'] = true;
myDayHash['11-3-2024'] = true;
myDayHash['12-3-2024'] = true;
myDayHash['13-3-2024'] = true;
myDayHash['14-3-2024'] = true;
myDayHash['18-3-2024'] = true;
myDayHash['19-3-2024'] = true;
myDayHash['20-3-2024'] = true;
myDayHash['21-3-2024'] = true;
myDayHash['25-3-2024'] = true;
myDayHash['26-3-2024'] = true;
myDayHash['27-3-2024'] = true;
var ofcAptDateStr = null;ofcAptDateStr = '';
var splitDate = 'Thu Sep 14 00:00:00 GMT 2023'.split(" ");
var minApptDate = splitDate[1] + ' ' + splitDate[2] + ' ' + splitDate[5];
}
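</script>
So I need to get the myDayHash array out of it.
What I tried:
driver.get('\test.html')
element = driver.execute_script("myDayHash")
But it returns nothing.
I also tried element = driver.execute_script("return myDayHash"), but that returns nothing either.
However, if I open the console in Chrome and type "myDayHash", it prints the whole array.
How can I pass this array to Python?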
Posted on 2022-08-20 14:05:12
Fetch the data
from bs4 import BeautifulSoup
import requests
import re
r = requests.get('http://website.com/test.html')
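soup = BeautifulSoup(r.content, 'html.parser')
array = soup.select('script')
Get the text from every script tag
text = ' '.join([elem.text for elem in array])
Apply a regex to get myDayHash
The regex below returns the myDayHash keys and values as a list of tuples.
myDayHash = re.findall(r"myDayHash\[\'(.*?)\'\] = (.*?);", text)
Produce the output:
print(dict(myDayHash))
Output
This gives the expected output. Note that the values are the strings 'true', not Python booleans. You can now store the key: value pairs in whatever data structure you need.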
{
'14-9-2023': 'true',
'4-12-2023': 'true',
'31-1-2024': 'true',
'1-2-2024': 'true',
'27-2-2024': 'true',
'28-2-2024': 'true',
'4-3-2024': 'true',
'5-3-2024': 'true',
'6-3-2024': 'true',
'7-3-2024': 'true',
'11-3-2024': 'true',
'12-3-2024': 'true',
'13-3-2024': 'true',
'14-3-2024': 'true',
'18-3-2024': 'true',
'19-3-2024': 'true',
'20-3-2024': 'true',
'21-3-2024': 'true',
'25-3-2024': 'true',
'26-3-2024': 'true',
'27-3-2024': 'true'
}
TL;DR
from bs4 import BeautifulSoup
import requests
import re
r = requests.get('http://website.com/test.html')
soup = BeautifulSoup(r.content, 'html.parser')
array = soup.select('script')
text = ' '.join([elem.text for elem in array])
myDayHash = re.findall(r"myDayHash\[\'(.*?)\'\] = (.*?);", text)
print(dict(myDayHash))
Posted on 2022-08-20 10:24:07
The variable is defined in a function's scope (not globally). In other words, you can't.
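If the variable really is local to a function, execute_script cannot reach it, but the assignments are still plain text in the page source. The following is only a rough sketch under a few assumptions: the HTML is already available as a string (with Selenium it could come from driver.page_source), the keys are in day-month-year order (suggested by keys such as '31-1-2024'), and the values are always true or false. The names SAMPLE_HTML, ASSIGNMENT and extract_day_hash are hypothetical and used for illustration only.

```python
import re
from datetime import date
from typing import Dict

# Hypothetical stand-in for the page source; with Selenium this string
# could be taken from driver.page_source after driver.get(...).
SAMPLE_HTML = """
<script type="text/javascript">
var myDayHash = new Array();
myDayHash['14-9-2023'] = true;
myDayHash['4-12-2023'] = true;
myDayHash['31-1-2024'] = true;
</script>
"""

# Matches assignments such as: myDayHash['14-9-2023'] = true;
# Assumption: keys are day-month-year and values are true/false.
ASSIGNMENT = re.compile(
    r"myDayHash\['(\d{1,2})-(\d{1,2})-(\d{4})'\]\s*=\s*(true|false)\s*;"
)


def extract_day_hash(html: str) -> Dict[date, bool]:
    """Return {date: bool} for every myDayHash assignment found in html."""
    result = {}
    for day, month, year, flag in ASSIGNMENT.findall(html):
        # Convert the JavaScript literal into a Python boolean.
        result[date(int(year), int(month), int(day))] = (flag == "true")
    return result


if __name__ == "__main__":
    for d, available in extract_day_hash(SAMPLE_HTML).items():
        print(d.isoformat(), available)
```

Because this reads the source text rather than the live JavaScript state, it does not matter which scope the variable lives in. It will miss entries that the page adds later at runtime.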
https://stackoverflow.com/questions/73424989