I have a category tree represented as follows.
import pandas as pd
asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4}
]
tree = pd.DataFrame(asset_tree)
print(tree)
This gives me the following data:
   id          name  parent_id
0   1  Linear Asset         -1
1   2       Lateral          1
2   3          Main          1
3   4   Point Asset         -1
4   5      Fountain          4
5   6       Hydrant          4
The top-level nodes of the tree have parent_id equal to -1, so the tree can be drawn as follows:
Linear Asset
| - Lateral
| - Main
Point Asset
| - Fountain
| - Hydrant
I need to produce the following data.
   id          name  parent_id               flat_name
0   1  Linear Asset         -1            Linear Asset
1   2       Lateral          1  Linear Asset : Lateral
2   3          Main          1     Linear Asset : Main
3   4   Point Asset         -1             Point Asset
4   5      Fountain          4  Point Asset : Fountain
5   6       Hydrant          4   Point Asset : Hydrant
The tree is generated dynamically and can have any number of levels, so the following tree
asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4},
    {'id': 7, 'name': 'Steel', 'parent_id': 2},
    {'id': 8, 'name': 'Plastic', 'parent_id': 2},
    {'id': 9, 'name': 'Steel', 'parent_id': 3},
    {'id': 10, 'name': 'Plastic', 'parent_id': 3}
]
should produce the following result:
   id          name  parent_id                         flat_name
0   1  Linear Asset         -1                      Linear Asset
1   2       Lateral          1            Linear Asset : Lateral
2   3          Main          1               Linear Asset : Main
3   4   Point Asset         -1                       Point Asset
4   5      Fountain          4            Point Asset : Fountain
5   6       Hydrant          4             Point Asset : Hydrant
6   7         Steel          2    Linear Asset : Lateral : Steel
7   8       Plastic          2  Linear Asset : Lateral : Plastic
8   9         Steel          3       Linear Asset : Main : Steel
9  10       Plastic          3     Linear Asset : Main : Plastic
Posted on 2021-03-24 15:35:30
Here is a recursive function, used with apply, that does this. It takes an id and returns that node's "path" through the tree:
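def flatname(ID):
    # Select the row with this id; squeeze() turns the one-row frame into a Series
    row = df[df['id'] == ID].squeeze()
    if row['parent_id'] == -1:
        # Top-level node: the path is just its own name
        return row['name']
    else:
        # Otherwise prepend the parent's path
        return flatname(row['parent_id']) + ' : ' + row['name']
To use it, call: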
df['flat_name'] = df['id'].apply(flatname)
df after applying this to the second example:
   id          name  parent_id                         flat_name
0   1  Linear Asset         -1                      Linear Asset
1   2       Lateral          1            Linear Asset : Lateral
2   3          Main          1               Linear Asset : Main
3   4   Point Asset         -1                       Point Asset
4   5      Fountain          4            Point Asset : Fountain
5   6       Hydrant          4             Point Asset : Hydrant
6   7         Steel          2    Linear Asset : Lateral : Steel
7   8       Plastic          2  Linear Asset : Lateral : Plastic
8   9         Steel          3       Linear Asset : Main : Steel
9  10       Plastic          3     Linear Asset : Main : Plastic
The OP pointed out that the function above explicitly refers to the df variable, which is defined outside the function's scope. If your DataFrame has a different name, or you want to call the function on several DataFrames, this can cause problems. One fix is to turn the apply function into a private helper and add an outer, more user-friendly function that calls it:
def _flatname_recurse(ID, df):
    # Same lookup as before, but the DataFrame is passed in explicitly
    row = df[df['id'] == ID].squeeze()
    if row['parent_id'] == -1:
        return row['name']
    else:
        return _flatname_recurse(row['parent_id'], df=df) + ' : ' + row['name']
# asset_df to specify we are looking for a specific kind of df
def flatnames(asset_df):
    # Extra keyword arguments to apply are forwarded to the helper
    return asset_df['id'].apply(_flatname_recurse, df=asset_df)
Then call:
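df['flat_name'] = flatnames(df)
Also note that I previously used row = df.iloc[ID - 1, :] to identify the row. That works in this case, but it relies on each id being exactly one greater than the row's position in the index. The current approach, which filters on the id column, is more general.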
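To illustrate why matching on the id column is more general than positional indexing, here is a minimal sketch in which the ids do not line up with the row positions. The sample data (ids 10, 20, 30) and the variable name sparse are made up for illustration; the two functions are repeated from above only so the block runs on its own.

```python
import pandas as pd

def _flatname_recurse(ID, df):
    # Look the row up by its id value, not by its position in the frame.
    row = df[df['id'] == ID].squeeze()
    if row['parent_id'] == -1:
        return row['name']
    return _flatname_recurse(row['parent_id'], df=df) + ' : ' + row['name']

def flatnames(asset_df):
    return asset_df['id'].apply(_flatname_recurse, df=asset_df)

# Hypothetical data: the ids are not "row position + 1",
# so df.iloc[ID - 1, :] would raise an IndexError here.
sparse = pd.DataFrame([
    {'id': 10, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 20, 'name': 'Lateral', 'parent_id': 10},
    {'id': 30, 'name': 'Steel', 'parent_id': 20},
])
sparse['flat_name'] = flatnames(sparse)
print(sparse['flat_name'].tolist())
# ['Linear Asset', 'Linear Asset : Lateral', 'Linear Asset : Lateral : Steel']
```

The lookup still assumes that ids are unique and that every non-root parent_id appears in the id column.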
Posted on 2021-03-24 15:31:48
You can use recursion to find the path through the parent ids:
import pandas as pd
asset_tree = [{'id': 1, 'name': 'Linear Asset', 'parent_id': -1}, {'id': 2, 'name': 'Lateral', 'parent_id': 1}, {'id': 3, 'name': 'Main', 'parent_id': 1}, {'id': 4, 'name': 'Point Asset', 'parent_id': -1}, {'id': 5, 'name': 'Fountain', 'parent_id': 4}, {'id': 6, 'name': 'Hydrant', 'parent_id': 4}]
a_tree = {i['id']: i for i in asset_tree}  # to dictionary for more efficient lookup
def get_parent(d, c=[]):
    # c is never mutated (c + [...] builds a new list), so the shared default is safe.
    # The := operator requires Python 3.8 or later.
    if (k := a_tree.get(d['parent_id'])) is None:
        return c + [d['name']]
    return get_parent(k, c + [d['name']])
r = [{**i, 'flat_name': ' : '.join(get_parent(i)[::-1])} for i in asset_tree]
df = pd.DataFrame(r)
Output:
   id          name  parent_id               flat_name
0   1  Linear Asset         -1            Linear Asset
1   2       Lateral          1  Linear Asset : Lateral
2   3          Main          1     Linear Asset : Main
3   4   Point Asset         -1             Point Asset
4   5      Fountain          4  Point Asset : Fountain
5   6       Hydrant          4   Point Asset : Hydrant
On your larger asset_tree:
asset_tree = [{'id': 1, 'name': 'Linear Asset', 'parent_id': -1}, {'id': 2, 'name': 'Lateral', 'parent_id': 1}, {'id': 3, 'name': 'Main', 'parent_id': 1}, {'id': 4, 'name': 'Point Asset', 'parent_id': -1}, {'id': 5, 'name': 'Fountain', 'parent_id': 4}, {'id': 6, 'name': 'Hydrant', 'parent_id': 4}, {'id': 7, 'name': 'Steel', 'parent_id': 2}, {'id': 8, 'name': 'Plastic', 'parent_id': 2}, {'id': 9, 'name': 'Steel', 'parent_id': 3}, {'id': 10, 'name': 'Plastic', 'parent_id': 3}]
a_tree = {i['id']:i for i in asset_tree}
r = [{**i, 'flat_name':' : '.join(get_parent(i)[::-1])} for i in asset_tree]
df = pd.DataFrame(r)
Output:
   id          name  parent_id                         flat_name
0   1  Linear Asset         -1                      Linear Asset
1   2       Lateral          1            Linear Asset : Lateral
2   3          Main          1               Linear Asset : Main
3   4   Point Asset         -1                       Point Asset
4   5      Fountain          4            Point Asset : Fountain
5   6       Hydrant          4             Point Asset : Hydrant
6   7         Steel          2    Linear Asset : Lateral : Steel
7   8       Plastic          2  Linear Asset : Lateral : Plastic
8   9         Steel          3       Linear Asset : Main : Steel
9  10       Plastic          3     Linear Asset : Main : Plastic
Posted on 2021-03-24 15:32:19
This is a network (graph) problem, so try networkx:
import networkx as nx
# build the graph
G = nx.from_pandas_edgelist(tree, source='parent_id', target='id',
                            create_using=nx.DiGraph)
# map id to name
node_names = tree.set_index('id')['name'].to_dict()
# get path from root (-1) to the node
def get_path(node):
    # this is a tree, so exactly one simple path for each node
    for path in nx.all_simple_paths(G, -1, node):
        # path[0] is the artificial root -1, which has no name
        return ' : '.join(node_names.get(i) for i in path[1:])
tree['flat_name'] = tree['id'].apply(get_path)
Output:
   id          name  parent_id                         flat_name
0   1  Linear Asset         -1                      Linear Asset
1   2       Lateral          1            Linear Asset : Lateral
2   3          Main          1               Linear Asset : Main
3   4   Point Asset         -1                       Point Asset
4   5      Fountain          4            Point Asset : Fountain
5   6       Hydrant          4             Point Asset : Hydrant
6   7         Steel          2    Linear Asset : Lateral : Steel
7   8       Plastic          2  Linear Asset : Lateral : Plastic
8   9         Steel          3       Linear Asset : Main : Steel
9  10       Plastic          3     Linear Asset : Main : Plastic
https://stackoverflow.com/questions/66784106
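The answers above all walk from each node up to a root, either recursively or through a graph search. Since the tree can have any number of levels, a loop with a cache may also be worth considering, because it avoids Python's recursion limit and resolves each node only once. The following is only a sketch: the function name flat_names and its sep and root_parent parameters are hypothetical, and it assumes that ids are unique, that every non-root parent_id exists, and that the data contains no cycles.

```python
def flat_names(nodes, sep=' : ', root_parent=-1):
    """Return {id: flat_name} for a list of {'id', 'name', 'parent_id'} dicts."""
    by_id = {n['id']: n for n in nodes}
    cache = {}
    for node in nodes:
        # Walk upwards until we hit a root or an ancestor that is already resolved.
        chain = []
        current = node
        while current['id'] not in cache:
            chain.append(current)
            if current['parent_id'] == root_parent:
                break
            current = by_id[current['parent_id']]
        # Resolve the collected chain from the top down, reusing cached prefixes.
        for item in reversed(chain):
            if item['parent_id'] == root_parent:
                cache[item['id']] = item['name']
            else:
                cache[item['id']] = cache[item['parent_id']] + sep + item['name']
    return cache

asset_tree = [
    {'id': 1, 'name': 'Linear Asset', 'parent_id': -1},
    {'id': 2, 'name': 'Lateral', 'parent_id': 1},
    {'id': 3, 'name': 'Main', 'parent_id': 1},
    {'id': 4, 'name': 'Point Asset', 'parent_id': -1},
    {'id': 5, 'name': 'Fountain', 'parent_id': 4},
    {'id': 6, 'name': 'Hydrant', 'parent_id': 4},
    {'id': 7, 'name': 'Steel', 'parent_id': 2},
    {'id': 8, 'name': 'Plastic', 'parent_id': 2},
    {'id': 9, 'name': 'Steel', 'parent_id': 3},
    {'id': 10, 'name': 'Plastic', 'parent_id': 3},
]
names = flat_names(asset_tree)
print(names[7])   # Linear Asset : Lateral : Steel
print(names[10])  # Linear Asset : Main : Plastic
```

With the DataFrame from the question, the resulting dictionary could be attached with tree['flat_name'] = tree['id'].map(names).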