Python: insert into the original DataFrame using a filtered view

I'm working with a sparse DataFrame like the one below:

df = pd.DataFrame.from_dict({'type': {581: 'A', 1638: 'B', 706: 'C', 422: 'B', 487: 'A', 1503: 'D', 1948: 'B', 700: 'E', 2040: 'D', 1664: 'C'}, 'set1_a': {581: 27.08, 1638: np.nan, 706: 92.37, 422: np.nan, 487: np.nan, 1503: np.nan, 1948: np.nan, 700: np.nan, 2040: np.nan, 1664: np.nan}, 'set1_b': {581: 68.0, 1638: np.nan, 706: 0.0, 422: np.nan, 487: np.nan, 1503: np.nan, 1948: np.nan, 700: np.nan, 2040: np.nan, 1664: np.nan}, 'set2_a': {581: np.nan, 1638: np.nan, 706: np.nan, 422: np.nan, 487: np.nan, 1503: np.nan, 1948: np.nan, 700: 21.99, 2040: np.nan, 1664: np.nan}, 'set2_b': {581: np.nan, 1638: np.nan, 706: np.nan, 422: np.nan, 487: np.nan, 1503: np.nan, 1948: np.nan, 700: 92.91, 2040: np.nan, 1664: np.nan}, 'set3_a': {581: 28.56, 1638: 21.79, 706: 95.15, 422: 45.1, 487: 65.33, 1503: 85.6, 1948: 51.5, 700: 98.14, 2040: 40.37, 1664: 66.18}, 'set3_b': {581: 68.0, 1638: 59.3, 706: 0.0, 422: 51.42, 487: 59.07, 1503: 57.1, 1948: 34.6, 700: 6.02, 2040: 8.25, 1664: 58.47}})

     type  set1_a  set1_b  set2_a  set2_b  set3_a  set3_b
581     A   27.08    68.0     NaN     NaN   28.56   68.00
1638    B     NaN     NaN     NaN     NaN   21.79   59.30
706     C   92.37     0.0     NaN     NaN   95.15    0.00
422     B     NaN     NaN     NaN     NaN   45.10   51.42
487     A     NaN     NaN     NaN     NaN   65.33   59.07
1503    D     NaN     NaN     NaN     NaN   85.60   57.10
1948    B     NaN     NaN     NaN     NaN   51.50   34.60
700     E     NaN     NaN   21.99   92.91   98.14    6.02
2040    D     NaN     NaN     NaN     NaN   40.37    8.25
1664    C     NaN     NaN     NaN     NaN   66.18   58.47
My goal is to fill in the set1_a and set1_b columns according to some rules applied to type. Each type can be assigned to certain groups, like this:

type_group1 = ['A', 'C', 'B', 'D']
type_group2 = ['E', 'F', 'G']
The rules are as follows:

  • If type is in type_group1: when set1_a and set1_b already have values, leave them as they are; otherwise assign set3_a and set3_b to them.
  • If type is in type_group2, assign set2_a and set2_b to set1_a and set1_b, respectively.
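The real types and type groups are much more complex, so to keep the code concise I'd like to create Pandas views and do the assignment through them, like this: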

    type_group1 = ['A', 'C', 'B', 'D']
    type_group2 = ['E', 'F', 'G']
    
    type_group1_df = df[df['type'].isin(type_group1)]
    type_group1_df.loc[type_group1_df['set1_a'].isnull(), 'set1_a'] = type_group1_df['set3_a']
    type_group1_df.loc[type_group1_df['set1_b'].isnull(), 'set1_b'] = type_group1_df['set3_b']
    
    type_group2_df = df[df['type'].isin(type_group2)]
    type_group2_df[['set1_a', 'set1_b']] = type_group2_df[['set2_a', 'set2_b']]
    
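However, both assignments end up modifying a new DataFrame instead of writing into the original df. So I believe they create a copy of df internally rather than a view. How can I create a Pandas view that writes into the original df?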
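The copy behaviour described above can be reproduced on a tiny frame. A minimal sketch, using two rows of the question's data: writing into a boolean-indexed subset leaves df untouched, while a .loc assignment on df itself with the same mask does update it. The names in_group1, subset, after_subset_write and after_loc_write are illustrative only.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's data (indices 487 and 700).
df = pd.DataFrame(
    {"type": ["A", "E"], "set1_a": [np.nan, np.nan], "set3_a": [65.33, 98.14]},
    index=[487, 700],
)
in_group1 = df["type"].isin(["A", "C", "B", "D"])

# Boolean indexing returns a new DataFrame; .copy() only makes that explicit
# (and avoids SettingWithCopyWarning on older pandas versions).
subset = df[in_group1].copy()
subset.loc[subset["set1_a"].isna(), "set1_a"] = subset["set3_a"]
after_subset_write = df.loc[487, "set1_a"]  # still NaN: df was not touched

# Writing through .loc on df itself, with the same mask, updates the original.
df.loc[in_group1 & df["set1_a"].isna(), "set1_a"] = df["set3_a"]
after_loc_write = df.loc[487, "set1_a"]  # now 65.33

print(after_subset_write, after_loc_write)
```

In other words, the usual pattern is to keep only the boolean mask and write through df.loc[mask, column] on the original frame, rather than trying to obtain a writable view: a boolean mask selection is always materialised as a copy.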

The expected output would be:

         type  set1_a  set1_b  set2_a  set2_b  set3_a  set3_b
    581     A   27.08   68.00     NaN     NaN   28.56   68.00
    1638    B   21.79   59.30     NaN     NaN   21.79   59.30
    706     C   92.37    0.00     NaN     NaN   95.15    0.00
    422     B   45.10   51.42     NaN     NaN   45.10   51.42
    487     A   65.33   59.07     NaN     NaN   65.33   59.07
    1503    D   85.60   57.10     NaN     NaN   85.60   57.10
    1948    B   51.50   34.60     NaN     NaN   51.50   34.60
    700     E   21.99   92.91   21.99   92.91   98.14    6.02
    2040    D   40.37    8.25     NaN     NaN   40.37    8.25
    1664    C   66.18   58.47     NaN     NaN   66.18   58.47
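Both rules can also be applied in place on the original df with boolean masks and .loc, without any intermediate filtered frame. A minimal sketch on a three-row subset of the question's data (indices 581, 1638 and 700); in_group1 and in_group2 are illustrative names. The .to_numpy() call on the right-hand side matters: .loc aligns a DataFrame value on its column labels, so assigning the set2_* columns to the set1_* columns without it would write NaN.

```python
import numpy as np
import pandas as pd

# A three-row subset of the question's data.
df = pd.DataFrame(
    {
        "type": ["A", "B", "E"],
        "set1_a": [27.08, np.nan, np.nan],
        "set1_b": [68.0, np.nan, np.nan],
        "set2_a": [np.nan, np.nan, 21.99],
        "set2_b": [np.nan, np.nan, 92.91],
        "set3_a": [28.56, 21.79, 98.14],
        "set3_b": [68.0, 59.3, 6.02],
    },
    index=[581, 1638, 700],
)
type_group1 = ["A", "C", "B", "D"]
type_group2 = ["E", "F", "G"]

# Boolean masks aligned to df's index; df is never sliced into a separate frame.
in_group1 = df["type"].isin(type_group1)
in_group2 = df["type"].isin(type_group2)

# Rule 1: for group-1 rows, fill each missing set1 value from the matching set3 column.
for target, source in [("set1_a", "set3_a"), ("set1_b", "set3_b")]:
    fill = in_group1 & df[target].isna()
    df.loc[fill, target] = df.loc[fill, source]

# Rule 2: for group-2 rows, overwrite set1 with set2 (positional, hence .to_numpy()).
df.loc[in_group2, ["set1_a", "set1_b"]] = df.loc[in_group2, ["set2_a", "set2_b"]].to_numpy()

print(df)
```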
    

You can build the conditions with isin and then assign the columns with np.select:

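    import numpy as np

    # Group-1 rows take set3 values; note cond1 only fires when set1_a AND set1_b are both missing.
    cond1 = (df["type"].isin(type_group1)) & (df["set1_a"].isnull()) & (df["set1_b"].isnull())
    # Group-2 rows always take set2 values.
    cond2 = df["type"].isin(type_group2)

    # np.select returns the choice of the first True condition, otherwise the default (current value).
    df["set1_a"] = np.select([cond1, cond2], [df["set3_a"], df["set2_a"]], default=df["set1_a"])
    df["set1_b"] = np.select([cond1, cond2], [df["set3_b"], df["set2_b"]], default=df["set1_b"])

    print(df)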
    
         type  set1_a  set1_b  set2_a  set2_b  set3_a  set3_b
    581     A   27.08   68.00     NaN     NaN   28.56   68.00
    1638    B   21.79   59.30     NaN     NaN   21.79   59.30
    706     C   92.37    0.00     NaN     NaN   95.15    0.00
    422     B   45.10   51.42     NaN     NaN   45.10   51.42
    487     A   65.33   59.07     NaN     NaN   65.33   59.07
    1503    D   85.60   57.10     NaN     NaN   85.60   57.10
    1948    B   51.50   34.60     NaN     NaN   51.50   34.60
    700     E   21.99   92.91   21.99   92.91   98.14    6.02
    2040    D   40.37    8.25     NaN     NaN   40.37    8.25
    1664    C   66.18   58.47     NaN     NaN   66.18   58.47
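One caveat with cond1 above: it only fires when set1_a and set1_b are both missing. That matches the sample data, but a group-1 row with only one of the two filled would keep its other NaN. A small sketch with a hypothetical row of that shape (not in the original data), comparing the combined condition with a per-column one:

```python
import numpy as np
import pandas as pd

# Hypothetical row: group-1 type with set1_a present but set1_b missing.
df = pd.DataFrame(
    {"type": ["A"], "set1_a": [27.08], "set1_b": [np.nan],
     "set2_b": [np.nan], "set3_b": [68.0]},
    index=[581],
)
in_group1 = df["type"].isin(["A", "C", "B", "D"])
in_group2 = df["type"].isin(["E", "F", "G"])

# Combined condition, as in the answer: requires BOTH set1 columns to be missing.
both_missing = in_group1 & df["set1_a"].isnull() & df["set1_b"].isnull()
combined = np.select([both_missing, in_group2], [df["set3_b"], df["set2_b"]], default=df["set1_b"])

# Per-column condition: only looks at the column being filled.
b_missing = in_group1 & df["set1_b"].isnull()
per_column = np.select([b_missing, in_group2], [df["set3_b"], df["set2_b"]], default=df["set1_b"])

print(combined, per_column)  # [nan] [68.]
```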
    
You can also use one condition per column to get the desired DataFrame:

    import numpy as np

    # Per-column conditions: group-1 rows fill each set1 column only where it is missing.
    cond_set1a = df.type.isin(type_group1) & df.set1_a.isna()
    cond_set1b = df.type.isin(type_group1) & df.set1_b.isna()
    cond_set2 = df.type.isin(type_group2)

    # Apply the group-1 rule first, then overwrite group-2 rows with set2.
    df['set1_a'] = np.where(cond_set1a, df.set3_a, df.set1_a)
    df['set1_b'] = np.where(cond_set1b, df.set3_b, df.set1_b)
    df['set1_a'] = np.where(cond_set2, df.set2_a, df.set1_a)
    df['set1_b'] = np.where(cond_set2, df.set2_b, df.set1_b)

    print(df)
    
         type  set1_a  set1_b  set2_a  set2_b  set3_a  set3_b
    581     A   27.08   68.00     NaN     NaN   28.56   68.00
    1638    B   21.79   59.30     NaN     NaN   21.79   59.30
    706     C   92.37    0.00     NaN     NaN   95.15    0.00
    422     B   45.10   51.42     NaN     NaN   45.10   51.42
    487     A   65.33   59.07     NaN     NaN   65.33   59.07
    1503    D   85.60   57.10     NaN     NaN   85.60   57.10
    1948    B   51.50   34.60     NaN     NaN   51.50   34.60
    700     E   21.99   92.91   21.99   92.91   98.14    6.02
    2040    D   40.37    8.25     NaN     NaN   40.37    8.25
    1664    C   66.18   58.47     NaN     NaN   66.18   58.47
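The same sequence of np.where calls can be expressed with the pandas method Series.mask, which replaces values where the condition is True and keeps the index. Chaining two .mask calls preserves the order used above (the group-2 overwrite comes last). A sketch on a two-row subset of the question's data (indices 487 and 700); fill_from_set3 and in_group2 are illustrative names.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's data.
df = pd.DataFrame(
    {
        "type": ["A", "E"],
        "set1_a": [np.nan, np.nan],
        "set1_b": [np.nan, np.nan],
        "set2_a": [np.nan, 21.99],
        "set2_b": [np.nan, 92.91],
        "set3_a": [65.33, 98.14],
        "set3_b": [59.07, 6.02],
    },
    index=[487, 700],
)
type_group1 = ["A", "C", "B", "D"]
type_group2 = ["E", "F", "G"]

in_group2 = df["type"].isin(type_group2)
for suffix in ["a", "b"]:
    target = f"set1_{suffix}"
    fill_from_set3 = df["type"].isin(type_group1) & df[target].isna()
    # First fill group-1 gaps from set3, then overwrite group-2 rows from set2.
    df[target] = (
        df[target]
        .mask(fill_from_set3, df[f"set3_{suffix}"])
        .mask(in_group2, df[f"set2_{suffix}"])
    )

print(df)
```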
    

Depending on your use case, @Henry's numpy select approach is more concise.
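Since the real types and groups are more complex, the np.select approach can be driven from a rule table instead of hand-written conditions. A sketch under stated assumptions: RULES and apply_rules are hypothetical names, each rule is (types, source column prefix, fill-only-missing flag), and columns follow the set1_/set2_/set3_ naming of the question. Because np.select takes the first True condition, earlier rules win if a type appears in more than one group.

```python
import numpy as np
import pandas as pd

# Hypothetical rule table: (types in the group, source prefix, fill only missing values?).
RULES = [
    (["A", "C", "B", "D"], "set3", True),
    (["E", "F", "G"], "set2", False),
]


def apply_rules(df: pd.DataFrame, rules, suffixes=("a", "b")) -> pd.DataFrame:
    """Return a copy of df with set1_<suffix> filled according to rules."""
    out = df.copy()
    for suffix in suffixes:
        target = f"set1_{suffix}"
        conds, choices = [], []
        for types, prefix, only_missing in rules:
            cond = out["type"].isin(types)
            if only_missing:
                cond = cond & out[target].isna()
            conds.append(cond)
            choices.append(out[f"{prefix}_{suffix}"])
        # First matching rule wins; rows matching no rule keep their current value.
        out[target] = np.select(conds, choices, default=out[target])
    return out


# A three-row subset of the question's data.
df = pd.DataFrame(
    {
        "type": ["A", "B", "E"],
        "set1_a": [27.08, np.nan, np.nan],
        "set1_b": [68.0, np.nan, np.nan],
        "set2_a": [np.nan, np.nan, 21.99],
        "set2_b": [np.nan, np.nan, 92.91],
        "set3_a": [28.56, 21.79, 98.14],
        "set3_b": [68.0, 59.3, 6.02],
    },
    index=[581, 1638, 700],
)
print(apply_rules(df, RULES))
```

To write the result back into the original frame, reassign it: df = apply_rules(df, RULES).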

Could you post the DataFrame you expect as output?
I've edited the question and added the expected output.