I have data in the following format, with several plan IDs, each of which has several route_ids.
plan_id route_id dtn
801 12289 2629.0
801 12289 1666.0
801 12289 0.0
801 12289 2216.0
801 7734 2219.0
801 7734 853.0
653 8819 3375.0
653 8819 2184.0
.
.
.
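.
dtn is in seconds. dtn is the distance to the next delivery in the route; for example, the value at index 3 is the distance between delivery index 3 and delivery index 4.
I need to find the median of dtn for each given route_id and append it as a column to the existing data, matched on the corresponding plan_id and route_id. How can I do this?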
Posted on 2020-07-31 09:47:56
Median for each route_id:
df.groupby('route_id')[['dtn']].median()
Median for each plan_id:
df.groupby('plan_id')[['dtn']].median()
Posted on 2020-07-31 10:10:54
import pandas as pd
df = pd.read_csv('data.csv')  # Load dummy data (from original question example)
# Get median vals grouped by relevant cols
r_med = df.groupby('route_id')[['dtn']].median()
p_med = df.groupby('plan_id')[['dtn']].median()
# Append both relevant median vals as cols to each row
for i, row in df.iterrows():
    df.loc[i, 'median_route_dtn'] = r_med.loc[row['route_id'], 'dtn']
    df.loc[i, 'median_plan_dtn'] = p_med.loc[row['plan_id'], 'dtn']
This gives the following df:
plan_id route_id dtn median_route_dtn median_plan_dtn
0 801 12289 2629.0 1941.0 1941.0
1 801 12289 1666.0 1941.0 1941.0
2 801 12289 0.0 1941.0 1941.0
3 801 12289 2216.0 1941.0 1941.0
4 801 7734 2219.0 1536.0 1941.0
5 801 7734 853.0 1536.0 1941.0
6 653 8819 3375.0 2779.5 2779.5
7 653 8819 2184.0 2779.5 2779.5
https://stackoverflow.com/questions/63189322
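The row-by-row loop in the answer above can probably be replaced with a vectorized call. The following is a minimal sketch, not part of the original answers: it builds the DataFrame from the eight sample rows shown in the question (rather than reading 'data.csv') and uses groupby(...).transform('median'), which returns one value per original row so the result can be assigned directly as a new column. The column names median_route_dtn and median_plan_dtn are reused from the answer above.

```python
import pandas as pd

# Sample rows copied from the question (the full data set is not shown there)
df = pd.DataFrame({
    "plan_id": [801, 801, 801, 801, 801, 801, 653, 653],
    "route_id": [12289, 12289, 12289, 12289, 7734, 7734, 8819, 8819],
    "dtn": [2629.0, 1666.0, 0.0, 2216.0, 2219.0, 853.0, 3375.0, 2184.0],
})

# transform("median") broadcasts each group's median back onto its rows,
# so the result is aligned with df's index and needs no explicit loop
df["median_route_dtn"] = df.groupby("route_id")["dtn"].transform("median")
df["median_plan_dtn"] = df.groupby("plan_id")["dtn"].transform("median")

print(df)
```

If the same route_id can appear under more than one plan_id (an assumption the sample data does not confirm either way), grouping on both keys with df.groupby(["plan_id", "route_id"]) would keep the medians separate per plan.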