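I index data with Elasticsearch and search it with a multi-match query whose inputs are DOB and last name. Other students share the same DOB, so their documents are also returned in the results. My idea is to remove the low-scoring rows. How can I approach this?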
Filename Name DOB Score PageNumber
11086 Ram 11 06 1930 6.4504585 1
11086 Ram 11 06 1930 6.4504585 2
11086 Ram 11 06 1930 6.4504585 1
81564 Kiran 11 06 1930 3.5517883 2
81564 Kiran 11 06 1930 3.5517883 33
81564 Kiran 11 06 1930 3.5517883 12
754133 peter 11 06 1930 2.5905614 1
754133 peter 11 06 1930 2.5905614 1

Expected output:
Filename Name DOB Score PageNumber
11086 Ram 11 06 1930 6.4504585 1
11086 Ram 11 06 1930 6.4504585 2
11086 Ram 11 06 1930 6.4504585 1

Posted 2017-09-13 06:39:57
Let's try filtering based on .std():
df = df[~((df.Score - df.Score.max()).abs() > df.Score.std())]
df
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
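2 11086 Ram 11 06 1930 6.450458 1

df.Score.std() acts as a dynamic threshold for the data: a row is dropped when its score is more than one standard deviation below the maximum score.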
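A self-contained sketch of the same filter, assuming the question's table is loaded into a pandas DataFrame named df (the construction below is a hypothetical reproduction of that table, with DOB kept as a string). The helper name drop_low_scores is illustrative and not part of the answer.

```python
import pandas as pd

# Hypothetical reproduction of the question's sample data (DOB kept as a string).
df = pd.DataFrame({
    "Filename": [11086, 11086, 11086, 81564, 81564, 81564, 754133, 754133],
    "Name": ["Ram"] * 3 + ["Kiran"] * 3 + ["peter"] * 2,
    "DOB": ["11 06 1930"] * 8,
    "Score": [6.4504585] * 3 + [3.5517883] * 3 + [2.5905614] * 2,
    "PageNumber": [1, 2, 1, 2, 33, 12, 1, 1],
})


def drop_low_scores(frame: pd.DataFrame, column: str = "Score") -> pd.DataFrame:
    """Keep rows whose score is within one standard deviation of the best score."""
    scores = frame[column]
    # Distance of each row from the top hit (always >= 0).
    distance = (scores - scores.max()).abs()
    # Series.std() is the sample standard deviation (ddof=1) by default.
    return frame[~(distance > scores.std())]


result = drop_low_scores(df)
print(result)
```

Because the threshold is computed from the scores themselves, it adapts to each result set. With a single hit, std() is NaN, the comparison evaluates to False, and that row is kept.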
For reference, the intermediate values are:
((df.Score - df.Score.max()).abs())
0 0.000000
1 0.000000
2 0.000000
3 2.898670
4 2.898670
5 2.898670
6 3.859897
7 3.859897
Name: Score, dtype: float64
df.Score.std()
1.7451830491923459
df.Score.max()
6.4504584999999999

Posted 2017-09-13 06:34:37
Assuming you only want the rows with a score greater than 3:
df.query('Score > 3')
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
3 81564 Kiran 11 06 1930 3.551788 2
4 81564 Kiran 11 06 1930 3.551788 33
5 81564 Kiran 11 06 1930 3.551788 12

If you would rather filter by a multiple of the standard deviation (here, one standard deviation below the mean):
df[df.Score > (df.Score.mean() - 1 * df.Score.std())]
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
3 81564 Kiran 11 06 1930 3.551788 2
4 81564 Kiran 11 06 1930 3.551788 33
5 81564 Kiran 11 06 1930 3.551788 12

Alternatively, you can keep only the rows whose score equals the maximum:
df.query('Score == @df.Score.max()')
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1

or
df[df.Score == df.Score.max()]
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1

https://stackoverflow.com/questions/46190318
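The three alternatives from the second answer (fixed cutoff, mean minus k standard deviations, ties with the maximum) can be compared side by side with a sketch like the following. The DataFrame is a hypothetical reproduction of the question's table, and the variable k is an illustrative knob for the number of standard deviations.

```python
import pandas as pd

# Hypothetical reproduction of the question's sample data.
df = pd.DataFrame({
    "Filename": [11086, 11086, 11086, 81564, 81564, 81564, 754133, 754133],
    "Name": ["Ram"] * 3 + ["Kiran"] * 3 + ["peter"] * 2,
    "DOB": ["11 06 1930"] * 8,
    "Score": [6.4504585] * 3 + [3.5517883] * 3 + [2.5905614] * 2,
    "PageNumber": [1, 2, 1, 2, 33, 12, 1, 1],
})

# Fixed cutoff: keep scores above 3.
fixed = df.query("Score > 3")

# Relative cutoff: keep scores above mean minus k sample standard deviations.
k = 1  # illustrative choice
relative = df[df.Score > df.Score.mean() - k * df.Score.std()]

# Top hits only: keep rows tied with the best score.
top = df[df.Score == df.Score.max()]

for label, subset in [("fixed", fixed), ("relative", relative), ("top", top)]:
    print(label, sorted(subset["Name"].unique()))
```

A fixed cutoff depends on the scale of the Elasticsearch scores, which varies from query to query, so the relative or maximum-based filters are usually easier to reuse.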