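Apologies if the title is misleading; I'm not sure how best to explain what I'm trying to do.
I'm using this season's NBA play-by-play data to find a relative defensive rating for specific defensive lineups. The dataframe (df) has a column for each offensive player, each defensive player, possessions, and points scored (there are more columns, but these are the ones I care about), so 10 player columns in total, plus possessions and points.
If I filter on a specific defensive lineup, I get a smaller dataframe (df2) that only has the rows where that defensive unit was on the floor. I've already done that part. What I want to do now is take every combination of offensive players that this lineup has faced and filter df for those combinations.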
So, here is a much smaller example of what df2 might look like:
   offplayer1  offplayer2  offplayer3  offplayer4  offplayer5  defplayer1  defplayer2  defplayer3  defplayer4  defplayer5  possessions  points
0           1           2           3           4           5          11          12          13          14          15            5       5
1           1           2           3           4           6          11          12          13          14          15            4       4
2           2           3           4           5           6          11          12          13          14          15            3       5
From here, I want to take every offplayer1-5 combination that appears in df2 and use those combinations as a filter on df.
What is the best way to do this?
Edit: here is the code that generates the df2 above, along with a sample df, in case you want to demonstrate with it.
import numpy as np
import pandas as pd

cols = ['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5',
        'defplayer1','defplayer2','defplayer3','defplayer4','defplayer5',
        'possessions','points']
df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],
                            [1,2,3,4,6,11,12,13,14,15,4,4],
                            [2,3,4,5,6,11,12,13,14,15,3,5],
                            [1,2,3,4,5,11,12,13,14,16,5,5],
                            [1,2,3,4,5,21,22,23,24,25,10,10],
                            [11,12,13,14,15,21,22,23,24,25,5,5]]),
                  columns=cols)
df2 = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],
                             [1,2,3,4,6,11,12,13,14,15,4,4],
                             [2,3,4,5,6,11,12,13,14,15,3,5]]),
                   columns=cols)

Posted on 2019-12-06 00:57:20
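If I understand you correctly, you should be able to build a new index for each dataframe from the offplayer columns with set_index, and then filter df using boolean indexing and .isin. I modified your sample df slightly to show this.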
import numpy as np
import pandas as pd

# modified your sample data a little
df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],
                            [1,2,3,4,6,11,12,13,14,15,4,4],
                            [1,2,3,4,5,11,12,13,14,16,3,5],
                            [2,3,4,5,6,11,12,13,14,15,5,5],
                            [1,2,3,4,5,11,12,13,14,17,5,5],
                            [1,2,3,4,7,11,12,13,14,17,5,5]]),
                  columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5',
                           'defplayer1','defplayer2','defplayer3','defplayer4','defplayer5',
                           'possessions','points'])
# the defensive players you are looking for
defplayers = [11,12,13,14,15]
# create df2 through boolean indexing: keep rows where all five defplayer columns are in defplayers
df2 = df[df[df.columns[5:10]].isin(defplayers).all(axis=1)]
# column names for the new indices (the five offplayer columns)
df_idx = df.columns[:5].values.tolist()
df2_idx = df2.columns[:5].values.tolist()
# boolean indexing to filter df: keep rows whose offensive lineup also appears in df2
df[df.set_index(df_idx).index.isin(df2.set_index(df2_idx).index)]

https://stackoverflow.com/questions/59205015
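
One thing to watch with the index approach: it compares the offplayer columns position by position, so the same five players listed in a different column order would not match. Below is a rough sketch of one way around that, and of how the filtered rows could feed the relative rating the question is after. It is not part of the answer above. It assumes pandas and NumPy, reuses the modified sample data, and the helper name rows_vs_same_offense, the "defense" column, and the points-per-100-possessions figure are all made up for illustration.

```python
import numpy as np
import pandas as pd

cols = ([f"offplayer{i}" for i in range(1, 6)]
        + [f"defplayer{i}" for i in range(1, 6)]
        + ["possessions", "points"])
off_cols, def_cols = cols[:5], cols[5:10]

# same modified sample data as in the answer
df = pd.DataFrame([[1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 5, 5],
                   [1, 2, 3, 4, 6, 11, 12, 13, 14, 15, 4, 4],
                   [1, 2, 3, 4, 5, 11, 12, 13, 14, 16, 3, 5],
                   [2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 5, 5],
                   [1, 2, 3, 4, 5, 11, 12, 13, 14, 17, 5, 5],
                   [1, 2, 3, 4, 7, 11, 12, 13, 14, 17, 5, 5]],
                  columns=cols)
defplayers = [11, 12, 13, 14, 15]


def rows_vs_same_offense(frame, defplayers, off_cols, def_cols):
    """Hypothetical helper: rows of `frame` whose offensive five was also
    faced by the given defensive lineup, tagged by who was defending."""
    # True where all five defenders belong to the target lineup
    on_floor = frame[def_cols].isin(defplayers).all(axis=1).to_numpy()
    # sort player ids within each row so column order does not matter (an assumption)
    sorted_ids = np.sort(frame[off_cols].to_numpy(), axis=1)
    keys = pd.MultiIndex.from_arrays(list(sorted_ids.T))
    # offensive lineups that appear in at least one row with the target defense
    faced = keys.isin(keys[on_floor])
    labels = np.where(on_floor[faced], "target", "other")
    return frame[faced].assign(defense=labels)


matched = rows_vs_same_offense(df, defplayers, off_cols, def_cols)

# points allowed per 100 possessions, target lineup vs. everyone else,
# against the same offensive lineups
totals = matched.groupby("defense")[["possessions", "points"]].sum()
totals["pts_per_100"] = 100 * totals["points"] / totals["possessions"]
print(totals)
```

If the player columns are always stored in a consistent order, the sorting step is unnecessary and the set_index version above does the same filtering.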