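I am trying to convert lists of strings in a dataframe column into lists of integers by mapping each string to its id.
I need this because I have to plot a list of sports by id, as shown below. Some sports are not in the JSON. In that case, the element has to be dropped from the resulting list of integers in the dataframe.
This is the JSON I have to map against: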
[
  {
    "id": 1,
    "name": "Karate"
  },
  {
    "id": 2,
    "name": "Paintball"
  },
  {
    "id": 3,
    "name": "Rugby"
  },
  {
    "id": 4,
    "name": "Squash"
  },
  {
    "id": 5,
    "name": "Softball"
  },
  {
    "id": 6,
    "name": "Swimiming"
  },
  {
    "id": 7,
    "name": "Weighlifting"
  },
  {
    "id": 8,
    "name": "Table Tennis"
  },
  {
    "id": 9,
    "name": "Tenpin Bowling"
  }
]
This is the data I have, which includes sports that are not in the JSON:
id sports
111 ['Softball', 'Table Tennis', 'Rafting']
222 ['Rugby', 'Tenpin Bowling','Squash']
333 ['Weighlifting', 'Tennis', 'Swimiming']
444 ['Softball', 'Table Tennis', 'Paintball']
555 ['Rugby', 'Tenpin Bowling','Squash']
666 ['Weighlifting', 'Karate', 'Swimiming']
777 ['Softball', 'Table Tennis', 'Soccer']
888 ['Basketball', 'Tenpin Bowling','Squash']
999 ['Weighlifting', 'Karate', 'Swimiming']
This is the data I need, without the sports that cannot be mapped from the JSON:
id sports
111 [5, 8]
222 [3, 9, 4]
333 [7, 6]
444 [5, 8, 2]
555 [3, 9, 4]
666 [7, 1, 6]
777 [5, 8]
888 [9, 4]
999 [7, 1, 6]
Is there a way to do this?
Thanks in advance.
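For anyone who wants to reproduce this, the two inputs can be built roughly as follows. This is only a sketch: the variable names l_str, data and df are assumptions chosen for illustration, and the JSON string is written without trailing commas so that a strict JSON parser accepts it.

```python
import json

import pandas as pd

# The id/name records from the question as a JSON string.
# `l_str` is an assumed name, not something the question defines.
l_str = """
[
  {"id": 1, "name": "Karate"},
  {"id": 2, "name": "Paintball"},
  {"id": 3, "name": "Rugby"},
  {"id": 4, "name": "Squash"},
  {"id": 5, "name": "Softball"},
  {"id": 6, "name": "Swimiming"},
  {"id": 7, "name": "Weighlifting"},
  {"id": 8, "name": "Table Tennis"},
  {"id": 9, "name": "Tenpin Bowling"}
]
"""

# Parsed form: a list of dicts, one per sport.
data = json.loads(l_str)

# The dataframe from the question; each cell of `sports` is a real list.
df = pd.DataFrame({
    "id": [111, 222, 333, 444, 555, 666, 777, 888, 999],
    "sports": [
        ["Softball", "Table Tennis", "Rafting"],
        ["Rugby", "Tenpin Bowling", "Squash"],
        ["Weighlifting", "Tennis", "Swimiming"],
        ["Softball", "Table Tennis", "Paintball"],
        ["Rugby", "Tenpin Bowling", "Squash"],
        ["Weighlifting", "Karate", "Swimiming"],
        ["Softball", "Table Tennis", "Soccer"],
        ["Basketball", "Tenpin Bowling", "Squash"],
        ["Weighlifting", "Karate", "Swimiming"],
    ],
})
```

The spellings "Swimiming" and "Weighlifting" are kept exactly as in the question, since the mapping only works when the names in both inputs match character for character.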
Answered 2020-06-30 17:32:04
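If the list of dicts with the sport codes is in a file, load it as shown in the code that follows.
Replace data with the name of your own variable.
This answer assumes the values in the sports column are lists, not strings.
If the sports column contains strings, use df.sports = df.sports.apply(literal_eval)
If you want to replace the names with the codes, use df['sports'] = instead of df['codes'] =
from ast import literal_eval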
import pandas as pd
# if the list of dicts is in a file, load it with the following
with open('test.json', 'r') as f:
    data = literal_eval(f.read())
# data is the object now holding the list of dicts
# convert data to a dict
dd = {d['name']: d['id'] for d in data}
# add a codes column for the sports in dd
df['codes'] = df.sports.apply(lambda x: [dd.get(v) for v in x if v in dd])
# display df
id sports codes
0 111 [Softball, Table Tennis, Rafting] [5, 8]
1 222 [Rugby, Tenpin Bowling, Squash] [3, 9, 4]
2 333 [Weighlifting, Tennis, Swimiming] [7, 6]
3 444 [Softball, Table Tennis, Paintball] [5, 8, 2]
4 555 [Rugby, Tenpin Bowling, Squash] [3, 9, 4]
5 666 [Weighlifting, Karate, Swimiming] [7, 1, 6]
6 777 [Softball, Table Tennis, Soccer] [5, 8]
7 888 [Basketball, Tenpin Bowling, Squash] [9, 4]
8 999 [Weighlifting, Karate, Swimiming] [7, 1, 6]
Answered 2020-06-30 16:42:08
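First create a mappings dictionary from the JSON data by reading it into a dataframe and using DataFrame.set_index and Series.to_dict, then use this mappings dictionary to map each sport in the lists to its corresponding id.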
mappings = pd.read_json(data).set_index('name')['id'].to_dict()
df['sports'] = [[mappings[key] for key in lst if key in mappings] for lst in df['sports']]
Alternatively, you can use Series.explode together with Series.map, but this approach is usually slower.
mappings = pd.read_json(data).set_index('name')['id']
df['sports'] = (
    df['sports'].explode()
    .map(mappings).dropna().astype(int).groupby(level=0).agg(list)
)
Result:
# print(df)
id sports
0 111 [5, 8]
1 222 [3, 9, 4]
2 333 [7, 6]
3 444 [5, 8, 2]
4 555 [3, 9, 4]
5 666 [7, 1, 6]
6 777 [5, 8]
7 888 [9, 4]
8 999 [7, 1, 6]
Answered 2020-06-30 16:59:02
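This is very similar to your previous question. I modified my previous answer to handle this case, NaN, and non-list elements. Let's call the JSON string l_str.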
df_map = pd.read_json(l_str)
d = dict(zip(df_map.name, df_map.id))
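# non-list values (e.g. NaN) are passed through unchanged so the result
# keeps one entry per row; filtering them out would break the assignment
df['sports'] = [[d.get(y) for y in x if y in d] if isinstance(x, list) else x
                for x in df.sports]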
Out[51]:
id sports
0 111 [5, 8]
1 222 [3, 9, 4]
2 333 [7, 6]
3 444 [5, 8, 2]
4 555 [3, 9, 4]
5 666 [7, 1, 6]
6 777 [5, 8]
7 888 [9, 4]
8 999 [7, 1, 6]
https://stackoverflow.com/questions/62661992
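As a side-by-side check of the dictionary lookup and the explode variant from the answers above, here is a minimal sketch on a reduced sample. The reduced data, the names lookup, by_dict and by_explode, and the final reindex step are assumptions added for illustration and are not part of the original answers. It shows one difference worth knowing: with the explode variant, a row whose sports all fail to map disappears after dropna, so it has to be restored explicitly if an empty list is wanted for that row.

```python
from io import StringIO

import pandas as pd

# Reduced sample (assumed for illustration): three known sports only.
l_str = (
    '[{"id": 5, "name": "Softball"},'
    ' {"id": 8, "name": "Table Tennis"},'
    ' {"id": 3, "name": "Rugby"}]'
)

df = pd.DataFrame({
    "id": [111, 222, 333],
    "sports": [
        ["Softball", "Table Tennis", "Rafting"],  # one unknown sport
        ["Rugby"],                                # all known
        ["Soccer"],                               # nothing can be mapped
    ],
})

# Wrapping the string in StringIO avoids the warning that newer pandas
# versions emit when read_json is given a literal JSON string.
mappings = pd.read_json(StringIO(l_str)).set_index("name")["id"]
lookup = mappings.to_dict()

# Variant 1: dictionary lookup, always one output list per input row.
by_dict = [[lookup[s] for s in lst if s in lookup] for lst in df["sports"]]

# Variant 2: explode, map, drop unmapped values, regroup by original row.
by_explode = (
    df["sports"].explode()
    .map(mappings).dropna().astype(int)
    .groupby(level=0).agg(list)
)

# Rows that lost every element are missing from the grouped result.
# Restore them and replace the resulting NaN with an empty list.
by_explode = by_explode.reindex(df.index)
by_explode = by_explode.apply(lambda v: v if isinstance(v, list) else [])

print(by_dict)
print(by_explode.tolist())
```

Without the reindex step, assigning the grouped result back to df['sports'] would leave NaN in the third row, whereas the dictionary lookup yields an empty list there.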