This is my code:

import pandas as pd
df = pd.read_csv("/content/Intel_AI4Y/My Drive/Intel_AI4Y_Colab/Module_16/data/Students_Score1.csv")
names = ["Student No." ,"Hours spent studying in a day", "Mathematics score", "English score","Science score"]
df.columns = names
Mathematics_score = df.iloc[:, 0]
df = df[~df.iloc[:, 0].between(100, 0, inclusive=False)]
print(df.describe())
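df.info()

I'm trying to remove erroneous data from the Mathematics score, i.e. values below 0 or above 100, but I'm not sure how to write the code for it. Can anyone help?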
Posted on 2020-07-12 11:15:49
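df = df[~df.iloc[:, 0].between(100, 0, inclusive=False)] is almost correct.

pandas.Series.between takes a left boundary and a right boundary, so the bounds should be 0, 100, not 100, 0. df.iloc[:, 0].between(0, 100, inclusive="neither") returns everything strictly between 0 and 100, whereas ~df.iloc[:, 0].between(0, 100, inclusive="neither") returns the values <= 0 or >= 100. To keep the values between 0 and 100, leave out the ~. (inclusive="neither" is the spelling in pandas 1.3 and later; the boolean inclusive=False used by older versions is rejected by pandas 2.0.)

See Pandas: Indexing and selecting data and Pandas: Selection by position for the correct usage of .iloc. df.iloc[:, 0] means you have selected all rows (:) and the column at index 0. My sample data has only one column, so its index is 0; you need to verify the index of the column you are interested in.

import pandas as pd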
import numpy as np
# sample dataframe
np.random.seed(100)
df = pd.DataFrame({'values': [np.random.randint(-100, 200) for _ in range(500)]})
# values between 0 and 100
df[df.iloc[:, 0].between(0, 100, inclusive="neither")]

Output (excerpt):
values
43
37
55
41
35
# values <=0 or >=100
df[~df.iloc[:, 0].between(0, 100, inclusive="neither")]

Output (excerpt):
values
-92
180
-21
-47
-34

Posted on 2020-07-12 08:02:28
Since your dataframe has column headers, I'd really recommend using a boolean mask, as shown below.
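df = df[(df['Mathematics score'] >= 0) & (df['Mathematics score'] <= 100)]

Using >= and <= keeps the boundary scores 0 and 100, which are valid. As @Trenton McKinney suggested, .iloc is sometimes easier, because you don't have to type the column name.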
So in your example, since Mathematics score is the third column (index 2), you would do:

df = df[df.iloc[:, 2].between(0, 100)]

https://stackoverflow.com/questions/62855766
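A minimal sketch pulling the two answers together, assuming pandas 1.3 or newer (for the string inclusive values). The DataFrame is hypothetical sample data that only reuses the question's column names; it is not the asker's CSV. It looks up the column position with df.columns.get_loc instead of counting columns by hand, and treats 0 and 100 as valid scores, since the question only asks to drop values below 0 or above 100.

```python
import pandas as pd

# Hypothetical sample data with the question's column layout (values are made up).
df = pd.DataFrame({
    "Student No.": [1, 2, 3, 4, 5, 6],
    "Hours spent studying in a day": [2, 3, 1, 4, 5, 2],
    "Mathematics score": [-5, 0, 55, 100, 130, 78],
    "English score": [60, 70, 80, 90, 65, 75],
    "Science score": [50, 60, 70, 80, 90, 85],
})

# Find the positional index of the column instead of hard-coding it.
math_pos = df.columns.get_loc("Mathematics score")  # 2 for this layout

# between() includes both endpoints by default (inclusive="both"),
# so this mask keeps scores in the closed range [0, 100].
valid = df.iloc[:, math_pos].between(0, 100)

# The same mask written with comparison operators and the column name.
valid_by_name = (df["Mathematics score"] >= 0) & (df["Mathematics score"] <= 100)
assert valid.equals(valid_by_name)

clean = df[valid]      # rows with a plausible Mathematics score
outliers = df[~valid]  # rows that would be dropped

# inclusive="neither" also drops the boundary scores 0 and 100.
strict = df["Mathematics score"].between(0, 100, inclusive="neither")

print(clean["Mathematics score"].tolist())       # [0, 55, 100, 78]
print(outliers["Mathematics score"].tolist())    # [-5, 130]
print(df[strict]["Mathematics score"].tolist())  # [55, 78]
```

The choice of inclusive matters only at the boundaries: with the default, a student who scored exactly 0 or 100 is kept, whereas inclusive="neither" (or strict > and < comparisons) would silently discard them.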