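I am trying to parse data from a website using BeautifulSoup in Python. I can extract the data, and I want to save it to a JSON file, but the code I wrote saves it like this.
JSON file: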
[
    {
        "collocation": "\nabove average",
        "meaning": "more than average, esp. in amount, age, height, weight etc. "
    },
    {
        "collocation": "\nabsolutely necessary",
        "meaning": "totally or completely necessary"
    },
    {
        "collocation": "\nabuse drugs",
        "meaning": "to use drugs in a way that's harmful to yourself or others"
    },
    {
        "collocation": "\nabuse of power",
        "meaning": "the harmful or unethical use of power"
    },
    {
        "collocation": "\naccept (a) defeat",
        "meaning": "to accept the fact that you didn't win a game, match, contest, election, etc."
    },
My code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
import json
url = "https://www.englishclub.com/ref/Collocations/"
mylist = [
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W"
]
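list = []
for i in range(23):
    # note: headers is not defined anywhere in the posted snippet
    result = requests.get(url+mylist[i]+"/", headers=headers)
    doc = BeautifulSoup(result.text, "html.parser")
    collocations = doc.find_all(class_="linklisting")
    for tag in collocations:
        case = {
            "collocation": tag.a.string,
            "meaning": tag.div.string
        }
        list.append(case)
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(list, f, ensure_ascii=False, indent=4)
However, I would like to have a separate list for each letter, for example one list for A and another for B, so that I can easily find the entries that start with a given letter and use them. How can I do that? Also, as you can see in the JSON file, there is always a \n at the beginning of each collocation. How can I remove it?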
Answered on 2022-11-10 14:08:16
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json
url = "https://www.englishclub.com/ref/Collocations/"
mylist = [
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W"
]
# A dictionary suits your needs better than a list here
list = {}
# Just for quick testing, the range is set to 4
for i in range(4):
    list[mylist[i]] = []  # make an empty list for this letter's collocations
    result = requests.get(url+mylist[i]+"/")
    doc = BeautifulSoup(result.text, "html.parser")
    collocations = doc.find_all(class_="linklisting")
    for tag in collocations:
        case = {
            "collocation": tag.a.string.replace("\n",""),  # remove the leading \n
            "meaning": tag.div.string
        }
        list[mylist[i]].append(case)  # add the collocation to its letter's list
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(list, f, ensure_ascii=False, indent=4)
I have added a comment to each part I changed. We create a list for every letter in the dictionary, so later you can retrieve them by key without worrying about indexes.
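As a rough illustration of the key-based lookup, the sketch below groups a few sample records by letter and reads one letter back by key. It rests on assumptions that are not part of the answer: the helper name group_by_letter is hypothetical, the sample records are hard-coded instead of scraped, and the letter is taken from the first character of the cleaned collocation rather than from the page URL. It also uses strip() instead of replace("\n", ""), which removes any other leading or trailing whitespace as well.

```python
# Sample records shaped like the question's output (note the leading "\n").
flat = [
    {"collocation": "\nabove average",
     "meaning": "more than average, esp. in amount, age, height, weight etc. "},
    {"collocation": "\nabsolutely necessary",
     "meaning": "totally or completely necessary"},
    {"collocation": "\nback pay",
     "meaning": "money a worker earned in the past but hasn't been paid yet "},
]


def group_by_letter(records):
    """Hypothetical helper: group records under the first letter of each collocation."""
    grouped = {}
    for record in records:
        name = record["collocation"].strip()  # drops the leading "\n"
        if not name:
            continue  # skip records whose collocation is empty after cleaning
        cleaned = {"collocation": name, "meaning": record["meaning"]}
        # setdefault creates the letter's list the first time the letter is seen
        grouped.setdefault(name[0].upper(), []).append(cleaned)
    return grouped


by_letter = group_by_letter(flat)

# Look up one letter by key instead of tracking positions in a flat list.
for entry in by_letter["A"]:
    print(entry["collocation"])
```

The same by_letter["A"] style of lookup should also work on the dictionary obtained by reading the saved file back with json.load.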
Here is the output written to data.json:
{
    "A": [
        {
            "collocation": "above average",
            "meaning": "more than average, esp. in amount, age, height, weight etc. "
        },
        {
            "collocation": "absolutely necessary",
            "meaning": "totally or completely necessary"
        }
    ],
    "B": [
        {
            "collocation": "back pay",
            "meaning": "money a worker earned in the past but hasn't been paid yet "
        },
        {
            "collocation": "back road",
            "meaning": "a small country road "
        },
        {
            "collocation": "back street",
            "meaning": "a street in a town or city that's away from major roads or central areas"
        }
    ],
    "C": [
        {
            "collocation": "call a meeting",
            "meaning": "to order or invite people to hold a meeting"
        },
        {
            "collocation": "call a name",
            "meaning": "to say somebody's name loudly"
        },
        {
            "collocation": "call a strike",
            "meaning": "to decide that workers will protest by not going to work "
        }
    ],
    "D": [
        {
            "collocation": "daily life",
            "meaning": "life as experienced from day to day"
        },
        {
            "collocation": "dead ahead",
            "meaning": "straight ahead"
        },
        {
            "collocation": "dead body",
            "meaning": "corpse, or the body of someone who's died"
        }
    ]
}
Answered on 2022-11-10 14:08:52
Inside your loop, after doc is defined, try the following:
for col in doc.select('div.linklisting'):
    print(col.select_one('h3 a').text.strip(), "--", col.select_one('div.linkdescription').text)
For the letter B, for example, it should output:
back pay -- money a worker earned in the past but hasn't been paid yet
back road -- a small country road
back street -- a street in a town or city that's away from major roads or central areas
and so on. You can then put the output elements into a CSV file, a dataframe, or anything else.
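One possible way to send such pairs to CSV is sketched below with the standard csv module. The helper name rows_to_csv is hypothetical, and the rows are hard-coded from the letter B output above rather than scraped, so treat this as an illustration under those assumptions.

```python
import csv
import io

# (collocation, meaning) pairs copied from the letter B output above.
rows = [
    ("back pay", "money a worker earned in the past but hasn't been paid yet"),
    ("back road", "a small country road"),
    ("back street", "a street in a town or city that's away from major roads or central areas"),
]


def rows_to_csv(pairs):
    """Hypothetical helper: return CSV text with a header row and one line per pair."""
    buffer = io.StringIO(newline="")  # newline="" leaves the csv line endings untouched
    writer = csv.writer(buffer)
    writer.writerow(["collocation", "meaning"])
    writer.writerows(pairs)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead of a string, the same writer calls can be pointed at a file opened with newline="" and encoding="utf-8".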
https://stackoverflow.com/questions/74390165