
Q: Converting a code block into a function causes an error

Stack Overflow user

Asked 2020-06-18 20:41:28

Answers: 1 · Views: 44 · Followers: 0 · Votes: 2

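I am trying to clean up a code snippet, but after moving part of the code into functions it started raising the exception shown further down.

Here is the snippet I want to clean up: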

import pandas as pd
import os

df = pd.read_csv('winequality-red.csv', sep=';')

labels = list(df.columns)
for index, label in enumerate(labels):
    labels[index] = labels[index].replace(' ', '_')


substance = 'pH'
median = df[substance].mean()
for index, substance in enumerate(df[substance]):
    if substance >= median:
        df.loc[index, substance] = 'high'
    else:
        df.loc[index, substance] = 'low'
print(df.groupby(substance).quality.mean())

The goal is to create two functions and call them whenever a substance needs to be evaluated. With that in mind, I wrote:

def substance_mean(substance):
    return df[substance].mean()

def substance_evaluation(substance):
    for index, substance in enumerate(df[substance]):
        if substance >= substance_mean(substance):
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'
    print(df.groupby(substance).quality.mean())

substance_evaluation('pH')

When I run the code, the following exception is raised:

Traceback (most recent call last):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 3.51

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 34, in <module>
    substance_evaluation('pH')
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 28, in substance_evaluation
    if substance >= substance_mean(substance):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 24, in substance_mean
    return df[substance].mean()
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 3.51

python

pandas

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-06-19 00:26:14

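I can't run it, but your whole problem is that inside substance_evaluation() you use the same name, substance, for two variables that should hold different values.

First you have substance in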

 def substance_evaluation(substance)

This variable should keep the value "pH", but later you use

 for ..., substance in ...:

which assigns a different value to this variable (in place of "pH"), and after that you use

 ... >= substance_mean(substance)

to compute the mean for "pH". At that point, however, substance no longer holds "pH" but 3.51, which is why the error reads KeyError: 3.51.
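The rebinding described above can be reproduced without pandas. The following is only a minimal sketch: a plain dict stands in for the DataFrame, and the names table, shadowed and fixed are made up for illustration.

```python
# A plain dict stands in for the DataFrame (an assumption for illustration);
# the lookup fails for the same reason as df[substance] in the question.
table = {"pH": [3.51, 3.20, 3.26]}


def shadowed(substance):
    # The loop variable reuses the parameter name, so from the first
    # iteration on `substance` is 3.51 instead of "pH".
    for index, substance in enumerate(table[substance]):
        column = table[substance]  # looks up table[3.51] -> KeyError
    return column


def fixed(substance):
    # A separate name for the row value keeps `substance` equal to "pH".
    for index, value in enumerate(table[substance]):
        column = table[substance]
    return column


try:
    shadowed("pH")
    error_message = None
except KeyError as exc:
    error_message = repr(exc)

print(error_message)
print(fixed("pH"))
```

Note that table[substance] in the for statement is evaluated once, before the loop variable is assigned, so the loop itself starts normally and the failure only appears at the later lookup.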

You wouldn't have this problem if, inside the function, you used

 median = df[substance].mean()

and

 if substance >= median:

Besides, wrapping a single line of code in its own function is a waste of time.

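If you keep this line, the value is computed only once, before the loop (the variable is called median, but it actually holds the mean). Calling the function inside the loop recomputes the same value on every iteration, which is also a waste of time.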
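The cost of calling the helper inside the loop can be made visible by counting calls. This is a small sketch with made-up values; column_mean is a hypothetical stand-in for substance_mean() and uses the standard library instead of pandas.

```python
from statistics import mean

values = [3.51, 3.20, 3.26, 3.16]  # made-up sample values
calls = {"count": 0}


def column_mean():
    """Hypothetical stand-in for substance_mean(); counts how often it runs."""
    calls["count"] += 1
    return mean(values)


# Calling the helper in the condition recomputes the mean for every row.
labels_slow = ["high" if v >= column_mean() else "low" for v in values]
calls_inside_loop = calls["count"]

# Computing it once before the loop gives the same labels with a single call.
calls["count"] = 0
threshold = column_mean()
labels_fast = ["high" if v >= threshold else "low" for v in values]
calls_before_loop = calls["count"]

print(calls_inside_loop, calls_before_loop)
print(labels_slow == labels_fast)
```

With four values the helper runs four times in the first variant and once in the second, and both variants produce identical labels.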

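I think substance will still cause trouble in both versions (with and without functions), because you also use it in df.loc[index, substance], so the code may try to run df.loc[index, 3.51] instead of df.loc[index, "pH"]. You should use a different name for the loop variable, for example value: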

for ..., value in ...:
    if value >= median:

You should end up with a function like this:

def substance_evaluation(substance):

    median = df[substance].mean()

    for index, value in enumerate(df[substance]):
        if value >= median:
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'

    print(df.groupby(substance).quality.mean())

But I think you can write it more simply:

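def substance_evaluation(substance):

    median = df[substance].mean()

    # boolean mask: True where the value is at or above the mean
    mask = (df[substance] >= median)

    # .loc avoids chained indexing, which may assign to a temporary copy
    df.loc[ mask, substance] = 'high'
    df.loc[~mask, substance] = 'low'

    print(df.groupby(substance).quality.mean())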

Or, finally, using np.where():

import numpy as np

def substance_evaluation(substance):

    median = df[substance].mean()

    mask = (df[substance] >= median)

    # pick 'high' where the mask is True, 'low' elsewhere
    df[substance] = np.where(mask, 'high', 'low')

    print(df.groupby(substance).quality.mean())

With this version you can also easily put the labels into a new column instead:

    df["new column"] = np.where(mask, 'high', 'low')
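Writing the labels to a new column keeps the numeric values available for later use. The sketch below shows one way this could look; the function name label_by_mean, the column name pH_level and the sample data are all assumptions for illustration, and the DataFrame is passed in as an argument rather than read from a global.

```python
import numpy as np
import pandas as pd


def label_by_mean(df, substance, new_column):
    """Return a copy of df with a 'high'/'low' column next to the numeric one."""
    mean_value = df[substance].mean()
    out = df.copy()  # leave the caller's frame untouched
    out[new_column] = np.where(out[substance] >= mean_value, "high", "low")
    return out


# Made-up sample data; the real file has many more columns and rows.
wine = pd.DataFrame({
    "pH": [3.0, 3.2, 3.4, 3.6],
    "quality": [5, 6, 7, 8],
})

labeled = label_by_mean(wine, "pH", "pH_level")
quality_by_level = labeled.groupby("pH_level")["quality"].mean()
print(quality_by_level)
```

Because the original pH column is never overwritten, the function can be called again on the same frame, for another substance or with a different threshold, without first reloading the data.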

EDIT: minimal working code for testing

import pandas as pd
import random
import numpy as np
import time

def version1(df, substance):
    median = df[substance].mean()
    for index, value in enumerate(df[substance]):
        if value >= median:
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'

def version2(df, substance):
    median = df[substance].mean()
    mask = (df[substance] >= median)
    # .loc avoids chained indexing, which may assign to a temporary copy
    df.loc[ mask, substance] = 'high'
    df.loc[~mask, substance] = 'low'

def version3(df, substance):
    median = df[substance].mean()
    mask = (df[substance] >= median)
    df[substance] = np.where(mask, 'high', 'low')

# ---

random.seed(0) # to generate always the same values

df = pd.DataFrame({'pH': [random.randint(0,7) for _ in range(5)]})

substance = 'pH'

print('--- before ---')
print(df)

# ---

df1 = df.copy()
start = time.time()

version1(df1, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df1)

# ---

df2 = df.copy()
start = time.time()

version2(df2, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df2)

# ---

df3 = df.copy()
start = time.time()

version3(df3, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df3)
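On a five-row frame, a single time.time() measurement mostly captures noise. A more telling comparison might use a larger frame and timeit, as in the sketch below. The helpers loop_labels and where_labels are hypothetical: they return label lists instead of modifying the frame, the random data is made up, and no particular timings are claimed.

```python
import timeit

import numpy as np
import pandas as pd


def loop_labels(df, substance):
    """Row-by-row version: one Python-level comparison per value."""
    mean_value = df[substance].mean()
    return ["high" if value >= mean_value else "low" for value in df[substance]]


def where_labels(df, substance):
    """Vectorized version: one comparison over the whole column."""
    mean_value = df[substance].mean()
    return np.where(df[substance] >= mean_value, "high", "low").tolist()


rng = np.random.default_rng(0)  # fixed seed so every run uses the same data
big = pd.DataFrame({"pH": rng.uniform(2.5, 4.5, size=10_000)})

# Both approaches must agree before their timings are worth comparing.
same_result = loop_labels(big, "pH") == where_labels(big, "pH")

# Repeat each call 20 times to smooth out one-off fluctuations.
loop_time = timeit.timeit(lambda: loop_labels(big, "pH"), number=20)
where_time = timeit.timeit(lambda: where_labels(big, "pH"), number=20)

print("same result:", same_result)
print("loop :", loop_time)
print("where:", where_time)
```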

Votes: 1

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/62459053
