Python: join 2 DataFrames and create a parent-child relationship?

python pandas

I have two DataFrames, a parent DataFrame and a child DataFrame, and I want to join them in a groupby-like fashion.

df_parent

           parent  parent_value
    0   Super Sun             0
    1  Alpha Mars             4
    2       Pluto             9

df_child

                   child  value
    0         Planet Sun    100
    1  one Sun direction    101
    2     Ice Pluto Tune    101
    3       Life on Mars     99
    4         Mars Robot    105
    5          Sun Twins    200

I want the output in this order:

order = ['Sun', 'Pluto', 'Mars']

Sun
-children
Pluto
-children
Mars
-children

I want to find the children keyword-wise; for reference, see parent_dict:

parent_dict = {'Super Sun': 'Sun',
           'Alpha Mars': 'Mars',
           'Pluto': 'Pluto'}
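
To reproduce the example, the inputs above could be built roughly as follows. This is only a minimal sketch: the values are copied from the tables, while the constructor layout is an assumption.

```python
import pandas as pd

# Parent frame: one row per parent, values taken from the table above.
df_parent = pd.DataFrame({
    "parent": ["Super Sun", "Alpha Mars", "Pluto"],
    "parent_value": [0, 4, 9],
})

# Child frame: the child names contain one of the keywords in `order`.
df_child = pd.DataFrame({
    "child": ["Planet Sun", "one Sun direction", "Ice Pluto Tune",
              "Life on Mars", "Mars Robot", "Sun Twins"],
    "value": [100, 101, 101, 99, 105, 200],
})

order = ["Sun", "Pluto", "Mars"]
parent_dict = {"Super Sun": "Sun", "Alpha Mars": "Mars", "Pluto": "Pluto"}

print(df_parent)
print(df_child)
```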

Expected output

    child         value
0   Super Sun             0 # parent
1   Planet Sun          100 # child  
2   one Sun direction   101 # child   
3   Sun Twins           200 # child  
4   Pluto                 9 # parent
5   Ice Pluto Tune      101 # child       
6   Alpha Mars            4 # parent
7   Life on Mars         99 # child    
8   Mars Robot          105 # child

So far I have tried iterating over the order list and both DataFrames, but I do not get the expected output. Here is my code:

output_df = pd.DataFrame()
for o in order:
    key = o
    for j, row in df_parent.iterrows():
        if key in row[0]:
            output_df.at[j, 'parent'] = key
            output_df.at[j, 'value'] = row[1]
            for k, row1 in df_child.iterrows():
                if key in row1[0]:
                    output_df.at[j, 'parent'] = key
                    output_df.at[j, 'value'] = row[1]              

print(output_df)

Output:

  parent  value
0    Sun    0.0
2  Pluto    9.0
1   Mars    4.0

Here is a solution that iterates over both DataFrames, but it seems like a very long process:

output_df = pd.DataFrame()
c = 0
for o in order:
    key = o
    for j, row in df_parent.iterrows():
        if key in row[0]:
            output_df.at[c, 'parent'] = row[0]
            output_df.at[c, 'value'] = row[1]
            c += 1
            for k, row1 in df_child.iterrows():
                if key in row1[0]:
                    output_df.at[c, 'parent'] = row1[0]
                    output_df.at[c, 'value'] = row1[1]              
                    c += 1

Output:

              parent  value
0          Super Sun    0.0
1         Planet Sun  100.0
2  one Sun direction  101.0
3          Sun Twins  200.0
4              Pluto    9.0
5     Ice Pluto Tune  101.0
6         Alpha Mars    4.0
7       Life on Mars   99.0
8         Mars Robot  105.0

After some preparation, you can use append on the two DataFrames. First, create a keyword column in both df_parent and df_child; it will be used for sorting later.

For example, df_parent becomes:
       parent  parent_value keyword
0   Super Sun             0     Sun
1  Alpha Mars             4    Mars
2       Pluto             9   Pluto
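
One possible way to build such a keyword column is sketched below. This is an assumption and not necessarily what the answer originally used: it extracts the first entry of order found in each name with Series.str.extract. The pattern variable is a name introduced only for this illustration.

```python
import re
import pandas as pd

order = ["Sun", "Pluto", "Mars"]
df_parent = pd.DataFrame({
    "parent": ["Super Sun", "Alpha Mars", "Pluto"],
    "parent_value": [0, 4, 9],
})
df_child = pd.DataFrame({
    "child": ["Planet Sun", "one Sun direction", "Ice Pluto Tune",
              "Life on Mars", "Mars Robot", "Sun Twins"],
    "value": [100, 101, 101, 99, 105, 200],
})

# One capture group that matches any keyword from `order`.
pattern = "(" + "|".join(re.escape(word) for word in order) + ")"

# expand=False returns a Series with the first match per row (NaN if none).
df_parent["keyword"] = df_parent["parent"].str.extract(pattern, expand=False)
df_child["keyword"] = df_child["child"].str.extract(pattern, expand=False)

print(df_parent)
print(df_child)
```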

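Now you can use append, and then sort the DataFrame according to the list order. rename is used both to match your expected output and to make append work as intended (the columns in the two DataFrames must have the same names).
The output is then: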
               child  value
0          Super Sun      0
1         Planet Sun    100
2  one Sun direction    101
3          Sun Twins    200
4              Pluto      9
5     Ice Pluto Tune    101
6         Alpha Mars      4
7       Life on Mars     99
8         Mars Robot    105

Note: parents come before their children after sorting by 'keyword' because df_child is appended to df_parent, and not the other way around.
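
The rename, append and sort steps described above could look roughly like the sketch below. It makes a few assumptions: pd.concat is used in place of DataFrame.append (which was removed in pandas 2.0), an ordered Categorical carries the custom order, and a stable sort is requested so that each parent stays ahead of its children.

```python
import pandas as pd

order = ["Sun", "Pluto", "Mars"]
df_parent = pd.DataFrame({
    "parent": ["Super Sun", "Alpha Mars", "Pluto"],
    "parent_value": [0, 4, 9],
})
df_child = pd.DataFrame({
    "child": ["Planet Sun", "one Sun direction", "Ice Pluto Tune",
              "Life on Mars", "Mars Robot", "Sun Twins"],
    "value": [100, 101, 101, 99, 105, 200],
})

# Give the parent frame the same column names as the child frame.
parents = df_parent.rename(columns={"parent": "child", "parent_value": "value"})

# Parents go first, so a stable sort keeps each parent above its children.
combined = pd.concat([parents, df_child], ignore_index=True)

# Ordered categorical: sorting follows the position in `order`, not the alphabet.
pattern = "(" + "|".join(order) + ")"
combined["keyword"] = pd.Categorical(
    combined["child"].str.extract(pattern, expand=False),
    categories=order,
    ordered=True,
)

result = (
    combined.sort_values("keyword", kind="stable")
            .drop(columns="keyword")
            .reset_index(drop=True)
)
print(result)
```

Without kind="stable", the relative order of rows sharing a keyword is not guaranteed, so the parent-first property mentioned in the note would rest on an implementation detail.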
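OK, I think the problem is at output_df.at[j, 'parent'] = key: I overwrite the row at index j instead of incrementing the index, so I probably need to maintain a separate counter.
@jezrael, could you help me? Is there a better way?
Consider concatenating the DataFrames and ordering the rows with a keyword lookup: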
order = ['Sun', 'Pluto', 'Mars']

def find_keyword(str_param):    
    output = None
    # ITERATE THROUGH LIST AND RETURN MATCHING POSITION
    for i,v in enumerate(order):
        if v in str_param:
            output = i

    return output

# RENAME COLS AND CONCAT DFs
df_combined = pd.concat([df_parent.rename(columns={'parent':'item', 'parent_value':'value'}),
                         df_child.rename(columns={'child':'item'})],
                        ignore_index=True)

# CREATE KEYWORD COL WITH DEFINED FUNCTION
df_combined['keyword'] = df_combined['item'].apply(find_keyword)

# SORT BY KEYWORD AND DROP HELPER COL
df_combined = df_combined.sort_values(['keyword', 'value'])\
                         .drop(columns=['keyword']).reset_index(drop=True)

print(df_combined)
#                 item  value
# 0          Super Sun      0
# 1         Planet Sun    100
# 2  one Sun direction    101
# 3          Sun Twins    200
# 4              Pluto      9
# 5     Ice Pluto Tune    101
# 6         Alpha Mars      4
# 7       Life on Mars     99
# 8         Mars Robot    105
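
The snippet above places each parent first only because every parent value happens to be smaller than the values of its children. If that cannot be relied on, one hedged variant is to carry an explicit flag. The is_child and order_pos columns and the keyword_pos helper are hypothetical names introduced for this sketch.

```python
import pandas as pd

order = ["Sun", "Pluto", "Mars"]
df_parent = pd.DataFrame({
    "parent": ["Super Sun", "Alpha Mars", "Pluto"],
    "parent_value": [0, 4, 9],
})
df_child = pd.DataFrame({
    "child": ["Planet Sun", "one Sun direction", "Ice Pluto Tune",
              "Life on Mars", "Mars Robot", "Sun Twins"],
    "value": [100, 101, 101, 99, 105, 200],
})

def keyword_pos(name):
    """Position in `order` of the first keyword found in name (len(order) if none)."""
    return next((i for i, kw in enumerate(order) if kw in name), len(order))

# Tag each row with its origin so parents can be sorted ahead of children.
parents = (df_parent
           .rename(columns={"parent": "item", "parent_value": "value"})
           .assign(is_child=False))
children = df_child.rename(columns={"child": "item"}).assign(is_child=True)

combined = pd.concat([parents, children], ignore_index=True)
combined["order_pos"] = combined["item"].map(keyword_pos)

# Sort by keyword position, then parent before child, then by value.
result = (
    combined.sort_values(["order_pos", "is_child", "value"])
            .drop(columns=["order_pos", "is_child"])
            .reset_index(drop=True)
)
print(result)
```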