
Python: is it possible to implement a DataFrame "apply" with a recursive lambda function?


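I have a DataFrame that represents a recursive parent-child relationship. In this case the data are called "factor families".

Each factor family contains several factors, which are weighted; within each family the weights sum to 100%.

A factor can itself be a factor family.

There is no limit on the depth of the recursion.

For example, I have represented this with the following pandas DataFrame: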

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": ["", "", "", "a", "a", "a", "b", "b", "b", "h", "h"]
})
df.set_index("code", inplace=True)
df
```
Output:

|code|weight|parent_code|
|----|------|-----------|
|a   |0.1   |           |
|b   |0.4   |           |
|c   |0.5   |           |
|d   |0.2   |a          |
|e   |0.3   |a          |
|f   |0.5   |a          |
|g   |0.1   |b          |
|h   |0.2   |b          |
|i   |0.7   |b          |
|j   |0.6   |h          |
|k   |0.4   |h          |

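
A quick way to confirm that the data respect the rule that each family's weights sum to 100% is to group by `parent_code`. This is a minimal sketch on the DataFrame above; `family_totals` is an illustrative name, and the top-level factors (empty `parent_code`) are treated as one family of their own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": ["", "", "", "a", "a", "a", "b", "b", "b", "h", "h"]
}).set_index("code")

# Total weight of the children of each parent; top-level factors share parent "".
family_totals = df.groupby("parent_code")["weight"].sum()
print(family_totals)

# Compare with a tolerance, since sums like 0.1 + 0.2 + 0.7 are not exact in floating point.
assert np.allclose(family_totals, 1.0)
```
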
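Then I added a calculated column holding each factor's weight multiplied by the weights of all of its ancestors (its parent, its parent's parent, and so on). I call this the terminal weight.

As a result, the terminal weights of the leaf nodes (in this example c, d, e, f, g, i, j and k) sum to 100%.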
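The same calculation written as a recurrence, using $w(x)$ for `weight`, $p(x)$ for `parent_code` and $T(x)$ for the terminal weight (notation introduced here for illustration only). Because the children of any node $x$ have weights summing to 1, their terminal weights add up to $T(x)$, which is why the leaves together give 100%; the last line checks this against the leaf values in the output table:

$$
T(x) =
\begin{cases}
w(x), & \text{if } p(x) \text{ is empty (top-level factor)},\\
w(x)\,T\big(p(x)\big), & \text{otherwise,}
\end{cases}
\qquad
\sum_{y:\,p(y)=x} T(y) = T(x)\sum_{y:\,p(y)=x} w(y) = T(x)
$$

$$
T(c)+T(d)+T(e)+T(f)+T(g)+T(i)+T(j)+T(k) = 0.5+0.02+0.03+0.05+0.04+0.28+0.048+0.032 = 1
$$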

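```python
def parent_weight(code, family_factors):
    # Multiply this factor's weight by the weights of all its ancestors.
    # A code that is not in the index (the empty parent of a top-level factor)
    # ends the recursion with a neutral factor of 1.
    if code in family_factors.index:
        return family_factors["weight"][code] * parent_weight(family_factors["parent_code"][code], family_factors)
    else:
        return 1


df["terminal_weight"] = df.apply(lambda x: parent_weight(x.name, df), axis=1)

df
```
Output: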

|code|weight|parent_code|terminal_weight|
|----|------|-----------|---------------|
|a   |0.1   |           |0.100          |
|b   |0.4   |           |0.400          |
|c   |0.5   |           |0.500          |
|d   |0.2   |a          |0.020          |
|e   |0.3   |a          |0.030          |
|f   |0.5   |a          |0.050          |
|g   |0.1   |b          |0.040          |
|h   |0.2   |b          |0.080          |
|i   |0.7   |b          |0.280          |
|j   |0.6   |h          |0.048          |
|k   |0.4   |h          |0.032          |

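
The recursive function above walks each factor's full ancestor chain separately, so a shared ancestor such as `b` is looked up again for every descendant. A minimal sketch of a memoized variant using `functools.lru_cache`, assuming the same DataFrame indexed by `code`; the function name `cached_terminal_weight` is illustrative:

```python
from functools import lru_cache

import pandas as pd

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": ["", "", "", "a", "a", "a", "b", "b", "b", "h", "h"]
}).set_index("code")


@lru_cache(maxsize=None)
def cached_terminal_weight(code: str) -> float:
    """Weight of `code` times the weights of all its ancestors, cached per code."""
    if code not in df.index:  # "" (parent of a top-level factor) ends the recursion
        return 1.0
    return float(df.at[code, "weight"]) * cached_terminal_weight(df.at[code, "parent_code"])


df["terminal_weight"] = [cached_terminal_weight(code) for code in df.index]
print(df)
```

Because the cache is keyed only on `code`, it should be cleared with `cached_terminal_weight.cache_clear()` if the weights or parents in `df` change.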
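So my question is: is there a smarter way to do this, so that I don't have to define the `parent_weight` function? Can I put it into the lambda function passed to `DataFrame.apply()`?


Thanks in advance.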
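
A lambda can in fact recurse without a named helper if it is handed itself as an argument (self-application). A minimal sketch, assuming the question's DataFrame indexed by `code`; the parameter names `tw` and `code` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": ["", "", "", "a", "a", "a", "b", "b", "b", "h", "h"]
}).set_index("code")

# The outer lambda calls the inner one, passing the inner lambda to itself as `tw`,
# so the inner lambda can recurse without ever being bound to a name.
# A parent code that is not in the index ("") stops the recursion with 1.0.
df["terminal_weight"] = df.apply(
    lambda row: (lambda tw: tw(tw, row.name))(
        lambda tw, code: (df.at[code, "weight"] * tw(tw, df.at[code, "parent_code"])
                          if code in df.index else 1.0)
    ),
    axis=1,
)
print(df)
```

This removes the named helper, but it is harder to read than the original function and still recurses once per level, so Python's default recursion limit (1000) still caps the depth of the hierarchy.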

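I would do it like this: loop over subsets of the DataFrame, using temporary columns to hold the accumulated weight and the ancestor currently being processed. Note that I replaced the blank strings in `df` with `np.nan` values.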

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": [np.nan, np.nan, np.nan, "a", "a", "a", "b", "b", "b", "h", "h"]
})

# 'temp' holds the next ancestor still to be multiplied in;
# 'terminal_weight' accumulates the product of the weights seen so far.
df['temp'] = df['parent_code']
df['terminal_weight'] = df['weight']

while True:
    # Distinct ancestors that are still pending for some row
    parents = df[df.temp.notnull()][['temp']].drop_duplicates(keep='first').copy()
    if len(parents) == 0:
        break

    # Look up each pending ancestor's own weight and its parent.
    # Using the ancestor's own 'weight' (not its running 'terminal_weight')
    # multiplies every ancestor in exactly once, at any depth.
    parents = df[['code', 'weight', 'parent_code']].merge(
        parents.rename({"temp": "code"}, axis=1),
        on="code",
        how="inner"
    )
    parents.rename(
        {'weight': 'weight_parent', 'code': 'parent_code_temp', 'parent_code': 'temp'},
        axis=1,
        inplace=True
    )
    # Attach the ancestor's weight to each row and move 'temp' one level up
    df = df.rename({'temp': 'parent_code_temp'}, axis=1).merge(
        parents,
        on='parent_code_temp',
        how='left'
    )
    df.drop('parent_code_temp', axis=1, inplace=True)
    # Rows with no pending ancestor get a neutral factor of 1
    df["weight_parent"] = df["weight_parent"].fillna(1)
    df['terminal_weight'] = df['terminal_weight'] * df["weight_parent"]
    df.drop(['weight_parent'], axis=1, inplace=True)

df.drop('temp', axis=1, inplace=True)
print(df)
```
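
The same level-by-level idea can be sketched more compactly with `Series.map`, which looks values up by index when it is given a Series. This assumes `code` is the index and that the hierarchy contains no cycles (otherwise the loop never terminates); `terminal` and `ancestor` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "weight": [0.1, 0.4, 0.5, 0.2, 0.3, 0.5, 0.1, 0.2, 0.7, 0.6, 0.4],
    "parent_code": [np.nan, np.nan, np.nan, "a", "a", "a", "b", "b", "b", "h", "h"]
}).set_index("code")

weight = df["weight"]
parent = df["parent_code"]

terminal = weight.copy()  # running product for every factor
ancestor = parent.copy()  # next ancestor still to be multiplied in (NaN = none left)

# Each pass multiplies in one ancestor level for all rows at once.
while ancestor.notna().any():
    terminal = terminal * ancestor.map(weight).fillna(1.0)  # NaN ancestor -> factor 1
    ancestor = ancestor.map(parent)                         # climb one level

df["terminal_weight"] = terminal
print(df)
```

The loop runs once per level of the deepest chain and involves no recursion, so it is not affected by Python's recursion limit.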
