Python: using reduce to calculate the Gini index of a node (tags: python, functools)


I am trying to apply the following formula:

It is not clear to me why this does not work:

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini

Evaluating gini([[175, 330], [220, 120]])

prints:

505 175.57298304087834
0.8799137339476522 0.5729830408783452
340 220.87543252595157
0.5813148788927336 0.8754325259515571

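Note that, for the example input, the second print statement prints the numbers I want to sum. The return value (the second value in the first print statement) should be a number between 0 and 1.

What is wrong with my reduce call?

The complete function I am trying to write is: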

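import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini

def gini(groups):
    counts = [ sum(node) for node in groups ]
    count = sum(counts)
    proportions = [ n/count for n in counts ]
    return sum([ gini_node(node) * proportion for node, proportion in zip(groups, proportions)])

# test
print(gini([[175, 330], [220, 120]]))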

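reduce works by passing exactly two arguments to your function at a time: the running result and the next item from the container. When no initial value is given, the first item of the container is used as the starting result. It applies the operation you gave it, then keeps iterating over the list, applying the same operation to the new result and the next item.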

gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)

For the first node, [175, 330], this lambda takes 175 as p and 330 as c, and returns 175.57298304087834 (that is, 175 plus the term for 330), rather than what we want:

gini = functools.reduce(lambda p,c: (1 - (p/count)**2) + (1 - (c/count)**2), node)
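To make the behaviour of the original lambda concrete, here is a minimal sketch that records every call reduce makes. The helper names (step, step_init, calls, calls_with_init) are made up for illustration and are not part of the question's code. It also shows an alternative that is not the one used above: passing an initial value of 0 as the third argument to reduce, so that every element arrives as c.

```python
import functools

node = [175, 330]
count = sum(node)

calls = []  # (p, c) pairs seen by the function, in order

def step(p, c):
    # Same body as the lambda from the question, plus bookkeeping.
    calls.append((p, c))
    return p + (1 - (c / count) ** 2)

# No initializer: the first element (175) becomes the starting value of p
# and is never passed through 1 - (x/count)**2.
result = functools.reduce(step, node)
print(calls)   # one call only: p=175, c=330
print(result)  # 175 plus the term for 330

calls_with_init = []

def step_init(p, c):
    calls_with_init.append((p, c))
    return p + (1 - (c / count) ** 2)

# With an initializer of 0, p starts at 0 and both elements arrive as c.
result_init = functools.reduce(step_init, node, 0)
print(calls_with_init[0])  # first call: p=0, c=175
print(result_init)         # sum of the two terms
```

With the initializer, the result is the sum of the two numbers printed by the second print statement in the question.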

I added a few print statements; let's look at their output:

import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: (1 - (p/count)**2) + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini

def gini (groups):
    counts = [ sum(node) for node in groups ]
    count = sum(counts)
    proportions = [ n/count for n in counts ]
    print(count, counts, proportions) #This
    gini_indexes = [ gini_node(node) * proportion for node, proportion in zip(groups, proportions)]
    print(gini_indexes) #And this
    return sum(gini_indexes)

# test
print(gini([[175, 330], [220, 120]]))

rahul@RNA-HP:~$ python3 so.py
845 [505, 340] [0.5976331360946746, 0.40236686390532544]
505 1.4528967748259973 #Second number here is addition of 2 numbers below
0.8799137339476522 0.5729830408783452
340 1.4567474048442905 #Same for this
0.5813148788927336 0.8754325259515571
#The first number of this list is first 1.45289677.... * 0.597633...
#Basically the addition and then multiplication by its proportion.
[0.868299255961099, 0.5861468847894187]
#What you are returning to final print statement is the addition of gini co-effs of each node i.e the sum of the list above
1.4544461407505178

If there are more than two arguments (*), there is a simpler approach that works the same way as the reduce() call defined above.
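One caveat worth checking: the two-term lambda above treats p as a raw count, which is only true on the first call. With three or more values in a node, p on later calls is already an accumulated result. The sketch below is an illustration under assumptions, not the answerer's code: the function names and the three-value node [10, 20, 30] are made up. It compares the two-term lambda with a sum-based version and with reduce given an initial value of 0. All three compute the quantity from the answer (the sum of 1 - (x/count)**2), which is not guaranteed to lie between 0 and 1.

```python
import functools

def terms_with_sum(node):
    # Add 1 - (x/count)**2 for every value in the node.
    count = sum(node)
    return sum(1 - (x / count) ** 2 for x in node)

def terms_with_reduce(node):
    # Same quantity via reduce; the initializer 0 keeps acc a running total.
    count = sum(node)
    return functools.reduce(lambda acc, x: acc + (1 - (x / count) ** 2), node, 0)

def two_term_lambda(node):
    # The lambda from the answer, unchanged.
    count = sum(node)
    return functools.reduce(
        lambda p, c: (1 - (p / count) ** 2) + (1 - (c / count) ** 2), node
    )

two_values = [175, 330]
three_values = [10, 20, 30]  # hypothetical node with three values

# For two values, all three versions agree.
print(terms_with_sum(two_values), two_term_lambda(two_values))

# For three values, the two-term lambda diverges from the other two.
print(terms_with_sum(three_values))
print(terms_with_reduce(three_values))
print(two_term_lambda(three_values))
```

The sum-based version is arguably the easiest to read, since it has no accumulator to get wrong.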

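Rahul, that is very kind of you, but I am afraid you are wrong. The Gini index is always between 0 and 1. There are groups ("country a" or "array b", and so on) and classes within each group ("people with short hair", "feature b", and so on). The Gini index of each group must not exceed 1, and it must be multiplied by the group's proportion before summing, so that the total is also between 0 and 1. Zero means perfect equality. What I am computing is described elegantly here, and described in full here (though I do not understand the tables in R); a different, more intuitive way of computing it is here, but for now I have to implement this formula. Still, it did not take long to get what I needed from your last code block! Thank you.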

gini = 1 - sum([(p/count)**2 for p in node])
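For reference, a minimal sketch of the complete function using this expression. It assumes every node is non-empty (otherwise count would be 0), and it weights each node by its share of all samples, as in the question's code.

```python
def gini_node(node):
    # Gini index of one node: 1 minus the sum of squared class proportions.
    # Assumes the node is non-empty, so count > 0.
    count = sum(node)
    return 1 - sum((p / count) ** 2 for p in node)

def gini(groups):
    # Weighted sum of node values, each weighted by its share of all samples.
    counts = [sum(node) for node in groups]
    total = sum(counts)
    return sum(gini_node(node) * n / total for node, n in zip(groups, counts))

groups = [[175, 330], [220, 120]]
node_values = [gini_node(node) for node in groups]
weighted = gini(groups)

print(node_values)  # both values lie between 0 and 1
print(weighted)     # about 0.454
```

With two classes per node, each node value is at most 0.5, so the weighted total stays between 0 and 1 as well.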

Then why is the sum here greater than 1? Is it because of negative correlation?

 gini = sum([(1 - (p/count)**2) for p in node])
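As far as the arithmetic goes, correlation does not come into it. Summing (1 - (p/count)**2) over k values adds the constant 1 once per value, so the total equals k minus the sum of squared proportions. Because the squared proportions sum to at most 1, the total is at least k - 1, which for two classes means at least 1. A small sketch to check this numerically (the variable names are illustrative only):

```python
node = [175, 330]
count = sum(node)
k = len(node)

# Sum of squared class proportions; never larger than 1.
squares = sum((p / count) ** 2 for p in node)

sum_of_terms = sum([(1 - (p / count) ** 2) for p in node])  # expression above
one_minus_sum = 1 - squares                                 # 1 - sum of squares

print(sum_of_terms, k - squares)  # the same number, computed two ways
print(one_minus_sum)              # smaller by exactly k - 1
```

So the two expressions differ by exactly k - 1, and only the second is confined to the range 0 to 1.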