Converting a code block into a function causes an error.

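Stack Overflow user
Asked 2020-06-18 20:41:28
Answers: 1 · Views: 44 · Followers: 0 · Votes: 2

I am trying to clean up a code snippet, but after moving part of the code into functions it started raising the exception shown below.

Here is the snippet I want to clean up: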

import pandas as pd
import os

df = pd.read_csv('winequality-red.csv', sep=';')

labels = list(df.columns)
for index, label in enumerate(labels):
    labels[index] = labels[index].replace(' ', '_')


substance = 'pH'
median = df[substance].mean()
for index, substance in enumerate(df[substance]):
    if substance >= median:
        df.loc[index, substance] = 'high'
    else:
        df.loc[index, substance] = 'low'
print(df.groupby(substance).quality.mean())

The goal is to create two functions and call them whenever a substance needs to be evaluated. With that in mind, I wrote:

def substance_mean(substance):
    return df[substance].mean()

def substance_evaluation(substance):
    for index, substance in enumerate(df[substance]):
        if substance >= substance_mean(substance):
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'
    print(df.groupby(substance).quality.mean())

substance_evaluation('pH')

When I run the code, the following exception is raised:

Traceback (most recent call last):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 3.51

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 34, in <module>
    substance_evaluation('pH')
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 28, in substance_evaluation
    if substance >= substance_mean(substance):
  File "/home/atila/Desktop/estudos/udacity/aws_ML/app.py", line 24, in substance_mean
    return df[substance].mean()
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/atila/Desktop/estudos/udacity/aws_ML/venv-ml/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 3.51

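1 Answer

Stack Overflow user

Accepted answer

Posted 2020-06-19 00:26:14

I can't run it, but your whole problem is that inside substance_evaluation() you use the same name, substance, for two variables that should hold different values.

First, you have substance in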

 def substance_evaluation(substance) 

This variable should hold "pH", but later you use

 for ..., substance in ...: 

which assigns a different value to this variable (replacing "pH"), and after that you use

 ... >= substance_mean(substance) 

to compute the mean for "pH". At that point, however, substance no longer holds "pH" but 3.51, which is why the error reads KeyError: 3.51.
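
The name clash can be reproduced without the wine dataset. The following is only a minimal sketch: the small DataFrame and the function name broken_evaluation are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for winequality-red.csv; the values are invented.
df = pd.DataFrame({"pH": [3.51, 3.20, 3.26], "quality": [5, 5, 6]})


def broken_evaluation(substance):
    # The loop variable reuses the parameter name, so from the first
    # iteration on `substance` holds a cell value (3.51), not "pH".
    for index, substance in enumerate(df[substance]):
        threshold = df[substance].mean()  # looks up a column named 3.51


caught = None
try:
    broken_evaluation("pH")
except KeyError as error:
    caught = error  # keep the exception so it can be inspected
    print("KeyError:", error)
```

Renaming the loop variable (for example to value) leaves the parameter untouched, and the lookup succeeds.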

You would not have this problem if, inside the function, you used

 median = df[substance].mean()

 if substance >= median:

Besides, wrapping a single line of code in a function is a waste of time.

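If you keep this line, the value is computed only once, before the loop (it is named median, although .mean() actually returns the mean). Calling a function inside the loop recomputes the same value on every iteration, which is also a waste of time.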
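
To make the cost visible, the sketch below counts how often the mean is computed in each variant. The sample values and the call_log helper are assumptions added for illustration only.

```python
import pandas as pd

# Invented sample values, used only to count function calls.
df = pd.DataFrame({"pH": [3.51, 3.20, 3.26, 3.16]})

call_log = []  # hypothetical helper: one entry per mean computation


def substance_mean(column):
    call_log.append(column)
    return df[column].mean()


# Variant 1: the mean is recomputed for every row.
labels_in_loop = [
    "high" if value >= substance_mean("pH") else "low" for value in df["pH"]
]
calls_in_loop = len(call_log)

# Variant 2: the mean is computed once, before the loop.
call_log.clear()
threshold = substance_mean("pH")
labels_once = ["high" if value >= threshold else "low" for value in df["pH"]]
calls_once = len(call_log)

print(calls_in_loop, calls_once)  # prints "4 1"
```

Both variants produce the same labels; only the number of mean computations differs.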

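I think substance will still be a problem in both versions (with and without functions), because you also use it in df.loc[index, substance], so it may try to execute df.loc[index, 3.51] instead of df.loc[index, "pH"]. You should use a different name for the loop variable, e.g. value: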

for ..., value in ...:
    if value >= median:

You should end up with a function like this:

def substance_evaluation(substance):

    median = df[substance].mean()

    for index, value in enumerate(df[substance]):
        if value >= median:
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'

    print(df.groupby(substance).quality.mean())

But I think you can write it more simply:

def substance_evaluation(substance):

    median = df[substance].mean()

    mask = (df[substance] >= median)

    # assign through df.loc with the mask instead of chained indexing
    df.loc[ mask, substance] = 'high'
    df.loc[~mask, substance] = 'low'

    print(df.groupby(substance).quality.mean())

Or, finally, use np.where() (this needs import numpy as np):

def substance_evaluation(substance):

    median = df[substance].mean()

    mask = (df[substance] >= median)

    df[substance] = np.where(mask, 'high', 'low')

    print(df.groupby(substance).quality.mean())

With this version you can easily create a new column holding the labels:

    df["new column"] = np.where(mask, 'high', 'low')
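
Put together, the new-column variant might look like the sketch below. It assumes a small invented DataFrame, passes df as an argument, and uses a made-up column name of the form pH_level; none of these come from the original code.

```python
import numpy as np
import pandas as pd

# Invented sample rows standing in for the wine dataset.
df = pd.DataFrame({
    "pH": [3.51, 3.20, 3.26, 3.16, 3.40],
    "quality": [5, 5, 6, 6, 7],
})


def substance_evaluation(df, substance):
    threshold = df[substance].mean()
    level_column = f"{substance}_level"  # hypothetical column name
    # Write the labels to a separate column; the numeric column stays intact.
    df[level_column] = np.where(df[substance] >= threshold, "high", "low")
    return df.groupby(level_column)["quality"].mean()


result = substance_evaluation(df, "pH")
print(result)
```

Because the numeric pH column is not overwritten, the function can be called again on the same DataFrame, and no strings end up in a numeric column.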

EDIT: minimal working code for testing

import pandas as pd
import random
import numpy as np
import time

def version1(df, substance):
    median = df[substance].mean()
    for index, value in enumerate(df[substance]):
        if value >= median:
            df.loc[index, substance] = 'high'
        else:
            df.loc[index, substance] = 'low'

def version2(df, substance):
    median = df[substance].mean()
    mask = (df[substance] >= median)
    df.loc[ mask, substance] = 'high'
    df.loc[~mask, substance] = 'low'

def version3(df, substance):
    median = df[substance].mean()
    mask = (df[substance] >= median)
    df[substance] = np.where(mask, 'high', 'low')

# ---

random.seed(0) # to generate always the same values

df = pd.DataFrame({'pH': [random.randint(0,7) for _ in range(5)]})

substance = 'pH'

print('--- before ---')
print(df)

# ---

df1 = df.copy()
start = time.time()

version1(df1, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df1)

# ---

df2 = df.copy()
start = time.time()

version2(df2, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df2)

# ---

df3 = df.copy()
start = time.time()

version3(df3, substance)

end = time.time()
print('--- after --- time:', end-start)
print(df3)
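
With only five rows, a single time.time() measurement is too coarse to separate the versions. A possible alternative, sketched here under assumptions, is to time repeated runs on a larger frame with timeit. The random data, the frame size, and the helper names loop_labels and where_labels are invented, and both helpers return the labels instead of writing them back into the DataFrame.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, so the data is reproducible
df = pd.DataFrame({"pH": rng.uniform(2.5, 4.5, size=2_000)})


def loop_labels(frame, substance):
    # Row-by-row comparison in plain Python.
    threshold = frame[substance].mean()
    return ["high" if value >= threshold else "low" for value in frame[substance]]


def where_labels(frame, substance):
    # Vectorized comparison with np.where.
    threshold = frame[substance].mean()
    return np.where(frame[substance] >= threshold, "high", "low")


# Check first that both approaches agree on the labels.
same = list(where_labels(df, "pH")) == loop_labels(df, "pH")

loop_time = timeit.timeit(lambda: loop_labels(df, "pH"), number=20)
where_time = timeit.timeit(lambda: where_labels(df, "pH"), number=20)
print("same labels:", same)
print("loop:", loop_time, "np.where:", where_time)
```

The absolute numbers depend on the machine and the pandas version, so they are best read as a relative comparison.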

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62459053
