How can I scrape lat and lng from a JS block like this using Python + Beautiful Soup?
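Gmaps.map.markers = [{"id":6,"multi_system":"No","connectedProjects":null,"description":"Kaheawa Wind Project - Younicos Country : United States Technology Type : Electro-chemical Status : Operational","picture":"http://chart.apis.google.com/chart?chst=d_map_pin_letter&chld=%E2%80%A2|FE7569","width":32,"height":32,"lat":20.7983626,"lng":-156.3319253}];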
Basic code (see my last question):
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.energystorageexchange.org/projects/6")
soup = BeautifulSoup(page.content, 'lxml')
coord = soup.findAll("Gmaps.map.markers")
Thanks for your answers.
Posted on 2018-04-21 20:10:29
You can use this approach. Slice the dict data out of that variable and parse it with the json module:
import json
JS_BLOCK = """Gmaps.map.markers = [{"id":6,"multi_system":"No","connectedProjects":null,"description":"Kaheawa Wind Project - Younicos
Country : United States
Technology Type : Electro-chemical
Status : Operational","picture":"http://chart.apis.google.com/chart?chst=d_map_pin_letter&chld=%E2%80%A2|FE7569","width":32,"height":32,"lat":20.7983626,"lng":-156.3319253}];"""
ini = JS_BLOCK.find("Gmaps.map.markers = [") + len("Gmaps.map.markers = [")
end = JS_BLOCK.find("}];") + 1
data = json.loads(JS_BLOCK[ini:end].replace('\n', ''))
print(data['lat'])
print(data['lng'])
Output:
20.7983626
-156.3319253
If you prefer, you can also try a regex approach for this.
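As a variation on the slicing approach above, here is a minimal sketch that lets the json module find the end of the array itself, so there is no need to search for "}];". The helper name extract_markers and the shortened sample string are my own assumptions for illustration. strict=False is passed because the description field may contain literal line breaks, which the default strict mode rejects.

```python
import json


def extract_markers(js_text, prefix="Gmaps.map.markers"):
    """Return the list assigned to `prefix` in a chunk of JavaScript text.

    Hypothetical helper: assumes the assigned value is valid JSON.
    """
    start = js_text.find(prefix)
    if start == -1:
        raise ValueError(f"{prefix!r} not found")
    # Skip ahead to the opening bracket of the array literal.
    start = js_text.index("[", start)
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it (here the trailing ";").
    decoder = json.JSONDecoder(strict=False)
    markers, _ = decoder.raw_decode(js_text, start)
    return markers


# Shortened sample in the same shape as the block from the question.
sample = 'Gmaps.map.markers = [{"id":6,"lat":20.7983626,"lng":-156.3319253}];'
for marker in extract_markers(sample):
    print(marker["lat"], marker["lng"])
```

Because the result is a list, this also covers pages that define more than one marker.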
Posted on 2018-04-22 17:17:51
Regex solution:
Code:
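import json
import pprint
import re
import requests
url = 'https://www.energystorageexchange.org/projects/6'
r = requests.get(url)
html = r.text
# Capture everything between "Gmaps.map.markers =" and the last ";" on that line.
markers_raw = re.search(
    r'Gmaps\.map\.markers'
    r'\s*=\s*'
    r'(.*);', html).group(1)
# The captured text is a JSON array, so json.loads can parse it directly.
markers = json.loads(markers_raw)
pprint.pprint(markers)
Output: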
[{'connectedProjects': None,
  'description': "<a href='/projects/6'>Kaheawa Wind Project - "
                 'Younicos</a><br>Country : United States<br>Technology Type : '
                 'Electro-chemical<br>Status : Operational',
  'height': 32,
  'id': 6,
  'lat': 20.7983626,
  'lng': -156.3319253,
  'multi_system': 'No',
  'picture': 'http://chart.apis.google.com/chart?chst=d_map_pin_letter&chld=%E2%80%A2|FE7569',
  'width': 32}]
If regular expressions are new to you, you can take a look here.
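If you only need the coordinates, the regex idea can be wrapped up roughly like this. This is a sketch under assumptions: the function name marker_coordinates and the sample HTML are made up for illustration, and the non-greedy pattern assumes that "];" does not occur inside any of the marker strings. It also returns an empty list instead of raising when the page has no markers.

```python
import json
import re

# Non-greedy match from the opening "[" to the first "];" after it.
# DOTALL lets the array span several lines.
MARKERS_RE = re.compile(r'Gmaps\.map\.markers\s*=\s*(\[.*?\]);', re.DOTALL)


def marker_coordinates(html):
    """Return a list of (lat, lng) tuples found in the page source."""
    match = MARKERS_RE.search(html)
    if match is None:
        return []
    return [(m["lat"], m["lng"]) for m in json.loads(match.group(1))]


# Made-up page fragment with another statement after the markers.
sample_html = (
    '<script>Gmaps.map.markers = '
    '[{"id":6,"lat":20.7983626,"lng":-156.3319253}]; Gmaps.map.init();</script>'
)
print(marker_coordinates(sample_html))
```

With the real page you would pass r.text from requests.get(url) instead of sample_html.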
https://stackoverflow.com/questions/49957543