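I want to extract the tags pp-0, pp-1, pp-2, pp-3, pp-4, etc. and save them, together with the outer value "AAAAA" and their corresponding values 1000, 1001, 1002, 1003, 1004, etc., in Python dictionary format. In the resulting dictionary, d should be the first-level key. Its value should be a string that begins with "AAAAA", followed by the separator "@X##", followed by all of the dictionary data serialized as a string.
Code: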
from bs4 import BeautifulSoup
html = '''<span id="AAAAA" style="display:none">
<span id="pp-0" style="display:none">1000</span>
<span id="pp-1" style="display:none">1001</span>
<span id="pp-2" style="display:none">1002</span>
<span id="pp-3" style="display:none">1003</span>
<span id="pp-4" style="display:none">1004</span>
<span id="pp-5" style="display:none">1005</span>
<span id="pp-6" style="display:none">1006</span>
<span id="pp-7" style="display:none">1007</span>
<span id="pp-8" style="display:none">1008</span>
<span id="pp-9" style="display:none">1009</span>
<span id="pp-10" style="display:none">1010</span>
<span id="pp-11" style="display:none">1011</span>
<span id="pp-12" style="display:none">1012</span>
<span id="pp-13" style="display:none">1013</span>
<span id="pp-14" style="display:none">1014</span>
<span id="pp-17" style="display:none">1015</span>
<span id="pp-27" style="display:none">1016</span>
</span>'''
soup = BeautifulSoup(html, 'html.parser')
elements = soup.find_all('span')
Wrong output:
[<span id="AAAAA" style="display:none">
<span id="pp-0" style="display:none">1000</span>
<span id="pp-1" style="display:none">1001</span>
<span id="pp-2" style="display:none">1002</span>
<span id="pp-3" style="display:none">1003</span>
<span id="pp-4" style="display:none">1004</span>
<span id="pp-5" style="display:none">1005</span>
<span id="pp-6" style="display:none">1006</span>
<span id="pp-7" style="display:none">1007</span>
<span id="pp-8" style="display:none">1008</span>
<span id="pp-9" style="display:none">1009</span>
<span id="pp-10" style="display:none">1010</span>
<span id="pp-11" style="display:none">1011</span>
<span id="pp-12" style="display:none">1012</span>
<span id="pp-13" style="display:none">1013</span>
<span id="pp-14" style="display:none">1014</span>
<span id="pp-17" style="display:none">1015</span>
<span id="pp-27" style="display:none">1016</span>.....]
Expected output (at code level):
{'d': 'AAAAA@X##{"pp-0": 1000, "pp-1": 1001, "pp-2": 1002, "pp-3": 1003, "pp-4": 1004, "pp-5": 1005, "pp-6": 1006, "pp-7": 1007, "pp-8": 1008, "pp-9": 1009, "pp-10": 1010, "pp-11": 1011, "pp-12": 1012, "pp-13": 1013, "pp-14": 1014, "pp-17": 1015, "pp-27": 1016}'}
Posted on 2021-07-02 04:55:39
I created separate dictionaries: one for the final result, and one that collects each span's id and text from the HTML.
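# html is the string defined in the question
soup = BeautifulSoup(html, 'html.parser')
span = soup.find_all("span")
x = {}
other_dict = {}
# The first span is the outer wrapper; its id ("AAAAA") becomes the prefix
x['d'] = span[0].get("id")
for i in span[1:]:
    # Convert the text to int so the JSON values are numbers, as in the expected output
    other_dict[i.get("id")] = int(i.get_text())
Once we have both dictionaries, we can use the json module to convert other_dict to a string and build the final output: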
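import json
data = json.dumps(other_dict)
# Join the outer id, the "@X##" separator from the expected output, and the JSON string
final = x['d'] + '@X##' + data
x['d'] = final
print(x)
https://stackoverflow.com/questions/68219642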
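The steps in this answer can also be collected into one function. The following is only a minimal sketch under a few assumptions: the names build_payload, SEPARATOR, and sample are illustrative and do not come from the original post, the wrapper is assumed to be the first span in the document, and every inner span is assumed to hold an integer. It limits the search to direct children of the wrapper instead of slicing the full find_all result, and then shows how the string can be split and parsed back.

```python
import json

from bs4 import BeautifulSoup

# Separator taken from the expected output in the question.
SEPARATOR = "@X##"


def build_payload(html: str) -> dict:
    """Return {'d': '<outer id>@X##<JSON of inner id -> int value>'}."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the first span in document order is the outer wrapper.
    outer = soup.find("span")
    # recursive=False restricts the search to direct children of the wrapper.
    inner = {
        child["id"]: int(child.get_text(strip=True))
        for child in outer.find_all("span", recursive=False)
    }
    return {"d": outer["id"] + SEPARATOR + json.dumps(inner)}


# Shortened version of the HTML from the question.
sample = (
    '<span id="AAAAA" style="display:none">\n'
    '<span id="pp-0" style="display:none">1000</span>\n'
    '<span id="pp-1" style="display:none">1001</span>\n'
    "</span>"
)
result = build_payload(sample)
print(result)

# Round trip: split once on the separator and parse the JSON part back.
outer_id, _, json_part = result["d"].partition(SEPARATOR)
values = json.loads(json_part)
```

Because the part after the separator is valid JSON, json.loads recovers the inner dictionary with integer values, which would not be the case if the span text were stored as strings.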