How can I improve this Python code for calculating information gain from Gini impurity?

Stack Overflow user

Asked 2022-07-08 16:30:37


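The code below is intended to calculate information gain from a dataset using Gini impurity. I believe the code I wrote is functional and should run successfully in all cases, but it fails several hidden test cases on Sololearn.

My submission is shown below, and it is also available on Sololearn: https://code.sololearn.com/cQEDIvXRgL3e

A more pedantic version of my code, with editable inputs and verbose output, is at: https://code.sololearn.com/cO755SFZAUJ0

Is there an error or oversight in this code that I am missing? It fails the hidden test cases, so something must be wrong, but I cannot tell what.

From what I can see in the visible test cases, Sololearn sends even-length sets of 1s and 0s to the code, which the lines below convert to lists. In my test version, these lines are replaced with empty lists that I fill with 1s and 0s before running. I have tried sets of odd and even length, producing splits of equal and unequal length, and none of this seems to affect the result adversely.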

s = [int(x) for x in input().split()]
a = [int(x) for x in input().split()]
b = [int(x) for x in input().split()]

#Function to get counts for set and splits, to be used in later formulae.
def setCount(n):
    return len(n)

Cs = setCount(s)
Ca = setCount(a)
Cb = setCount(b)

#Function to get sums of "True" values in each, for later formulae.
def tSum(x):
    sum = 0
    for n in x:
        if n == 1:
            sum += 1
    return sum

Ss = tSum(s)
Sa = tSum(a)
Sb = tSum(b)

#Function to get percentage of "True" values in each, for later formulae.
def getp(x, n):
    p = x/n
    return p

Ps = (getp(Ss, Cs))
Pa = (getp(Sa, Ca))
Pb = (getp(Sb, Cb))

#Function to get Gini impurity for each, to be used in final formula.
def gimp(p):
    return 2 * p * (1-p)

Hs = (gimp(Ps))
Ha = (gimp(Pa))
Hb = (gimp(Pb))

#Final formula, intended to output information gain to five decimal places.
infoGain = round((Hs - (Sa/Ss) * Ha - (Sb/Ss) * Hb),5)

print(infoGain)

machine-learning

gini

python

1 Answer

Stack Overflow user

Answered 2022-07-08 21:14:06

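This question was answered on Sololearn by Tibor Santa, a mentor there. Their code that solves the test cases goes well beyond the scope of the question. It is posted below and can also be found on Sololearn: https://code.sololearn.com/cUoaMq6bzxP8/

The long and short of it is that, because the result is rounded to 5 decimal places, different ways of writing the code may well produce different results. While my code was not "wrong", it was not the right way to arrive at the exact values behind the hidden test cases. It was also unnecessarily verbose.

Code that solves the test cases: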

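# Read the parent set S and the splits A and B, as in the question's code.
S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]

def gini(p):
    # Gini impurity of a binary set in which a fraction p of the values are 1
    return 2 * p * (1 - p)

def p(data):
    # Fraction of 1s in a list of 0s and 1s
    return sum(data) / len(data)

giniS = gini(p(S))
# Weight each split's impurity by its share of the parent's elements.
deltaA = gini(p(A)) * len(A) / len(S)
deltaB = gini(p(B)) * len(B) / len(S)
gain = giniS - deltaA - deltaB
print(round(gain, 5))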
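
Apart from rounding, the two programs also appear to differ in how they weight each split. The question's code multiplies each split's impurity by Sa/Ss and Sb/Ss (counts of 1s), while the code above uses len(A)/len(S) and len(B)/len(S) (counts of elements). The sketch below compares the two weightings side by side. The function names and the sample data are made up for illustration and do not come from Sololearn's test cases.

```python
def gini(p):
    # Gini impurity of a binary set in which a fraction p of the values are 1
    return 2 * p * (1 - p)

def frac_ones(data):
    # Fraction of 1s in a list of 0s and 1s
    return sum(data) / len(data)

def gain_weighted_by_size(s, a, b):
    # Weights as in the answer's code: len(split) / len(parent)
    return (gini(frac_ones(s))
            - len(a) / len(s) * gini(frac_ones(a))
            - len(b) / len(s) * gini(frac_ones(b)))

def gain_weighted_by_ones(s, a, b):
    # Weights as in the question's code: sum(split) / sum(parent)
    return (gini(frac_ones(s))
            - sum(a) / sum(s) * gini(frac_ones(a))
            - sum(b) / sum(s) * gini(frac_ones(b)))

# Hypothetical sample data: parent set s split into a and b
s = [1, 0, 1, 0, 1, 0]
a = [1, 1, 1, 0]
b = [0, 0]

print(round(gain_weighted_by_size(s, a, b), 5))  # 0.25
print(round(gain_weighted_by_ones(s, a, b), 5))  # 0.125
```

On this sample the two weightings give 0.25 and 0.125, a gap far larger than anything rounding to five decimal places would cause. Weighting by the count of 1s also divides by zero whenever the parent set contains no 1s.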


Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/72914385
