Python: How do I update a DataFrame based on dependency values in pandas?


I need to update a DataFrame based on its dependency values. How can I do this?

For example, given the input DataFrame df:

id      dependency
10
20       30
30       40
40
50       10
60       20     

We have 20->30 and 30->40, so the final result is 20->40 and 30->40.

Similarly, 60->20->30->40, so the final result is 60->40.

Final result:

id      dependency   final_dependency
10
20       30            40
30       40            40
40
50       10            10
60       20            40

One approach is to create a custom function:

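# Map each id to its direct dependency, skipping rows that have none.
s = df[df["dependency"].notnull()].set_index("id")["dependency"].to_dict()

def func(val):
    # An id with no dependency has no final dependency either.
    if val not in s:
        return None
    # Follow the chain until reaching a value with no dependency of its own.
    while val in s:
        val = s[val]
    return val

df["final"] = df["id"].apply(func)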

print (df)

   id  dependency  final
0  10         NaN    NaN
1  20        30.0   40.0
2  30        40.0   40.0
3  40         NaN    NaN
4  50        10.0   10.0
5  60        20.0   40.0
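
The chain-following loop above never terminates if the data contains a cycle (for example 20->30 and 30->20). Below is a minimal sketch of a cycle-safe variant, assuming each id has at most one dependency. The names `resolve_final` and `dep_map` are illustrative and not part of the original answer.

```python
def resolve_final(dep_map, node):
    """Follow node -> dependency links until reaching a node with no dependency.

    Returns None when `node` has no dependency at all.
    Raises ValueError if the chain loops back on itself.
    """
    if node not in dep_map:
        return None
    seen = {node}
    current = dep_map[node]
    while current in dep_map:
        if current in seen:
            raise ValueError(f"dependency cycle detected at {current}")
        seen.add(current)
        current = dep_map[current]
    return current


# Example data from the question: id -> dependency (ids without one are omitted).
dep_map = {20: 30, 30: 40, 50: 10, 60: 20}
final = {node: resolve_final(dep_map, node) for node in (10, 20, 30, 40, 50, 60)}
print(final)
```

With the dictionary `s` built as in the answer, this could be applied as `df["id"].apply(lambda v: resolve_final(s, v))`.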

You can use networkx for this. First, create a graph from the rows that have a dependency:

import networkx as nx
import numpy as np
df_edges = df.dropna(subset=['dependency'])
G = nx.from_pandas_edgelist(df_edges, create_using=nx.DiGraph, source='dependency', target='id')
Now we can find the root ancestor of each node and add it as a new column:

def find_root(G, node):
    ancestors = list(nx.ancestors(G, node))
    if len(ancestors) > 0:
        root = find_root(G, ancestors[0])
    else:
        root = node
    return root

df['final_dependency'] = df['id'].apply(lambda x: find_root(G, x))
df['final_dependency'] = np.where(df['final_dependency'] == df['id'], np.nan, df['final_dependency'])
Resulting DataFrame:

   id  dependency  final_dependency
0  10         NaN               NaN
1  20        30.0              40.0
2  30        40.0              40.0
3  40         NaN               NaN
4  50        10.0              10.0
5  60        20.0              40.0

You already have some answers. iterrows() is a somewhat expensive solution, but I wanted you to have it as well.

import pandas as pd

raw_data = {'id': [i for i in range (10,61,10)],
            'dep':[None,30,40,None,10,20]}
df = pd.DataFrame(raw_data)

df['final_dep'] = df.dep

for i,r in df.iterrows():

    if pd.notnull(r.dep):
        x = df.loc[df['id'] == r.dep, 'dep'].values[0]
        if pd.notnull(x):
            df.iloc[i,df.columns.get_loc('final_dep')] = x
        else:
            df.iloc[i,df.columns.get_loc('final_dep')] = r.dep

print (df)
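The result is shown below. Note that the loop follows only one level of dependency, so id 60 ends up with 30 rather than the 40 the question asks for: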

   id   dep final_dep
0  10   NaN       NaN
1  20  30.0        40
2  30  40.0        40
3  40   NaN       NaN
4  50  10.0        10
5  60  20.0        30
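
Here is a minimal sketch of one way to follow the full chain without iterating row by row: map the column through an id-to-dependency lookup repeatedly until nothing changes. It reuses the `id` and `dep` column names from the snippet above. The names `dep_of`, `final`, `nxt` and `updated` are illustrative, and the sketch assumes the data has no cycles.

```python
import pandas as pd

df = pd.DataFrame({"id": [10, 20, 30, 40, 50, 60],
                   "dep": [None, 30, 40, None, 10, 20]})

# Lookup of id -> dep for rows that actually have a dependency.
dep_of = df.dropna(subset=["dep"]).set_index("id")["dep"]
# "dep" is float64 because of the missing values, so match the index dtype.
dep_of.index = dep_of.index.astype("float64")

final = df["dep"].copy()
# Without cycles, a chain has fewer than len(df) hops, so this bound is
# enough; it also keeps the loop finite if the data does contain a cycle.
for _ in range(len(df)):
    nxt = final.map(dep_of)      # one more hop; NaN where the chain has ended
    updated = nxt.fillna(final)  # keep the current value where there is no hop
    if updated.equals(final):
        break
    final = updated

df["final_dep"] = final
print(df)
```

On the example data this converges after three passes and id 60 resolves to 40. If the data contains a cycle, the loop stops at the bound and the values on the cycle are not meaningful.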

Very poorly worded question. "We have ... will be, we have ... will."