Python: replacing values in a DataFrame based on pivot table values

python pandas

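I want to replace the NaN values in the Age column of my DataFrame with the values given in a pivot table.

Gender 0 is female and 1 is male.

For example, if a person's age is missing and he/she is in class 3 with gender 0, then the age should be 25.

I have about 100 rows to update, so is there a quick way to do this?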

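You can first create a pivot_table, then merge it back into df to get the pivot value as an additional column, and replace the value wherever NaN is observed.

You can further decide whether to keep or drop the Avg_Age_Pivot column.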
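The steps described above can be sketched roughly as follows. This is a minimal sketch on made-up toy data, not the answerer's exact code: the sample values and the names pivot, lookup and merged are assumptions for illustration, and only the column name Avg_Age_Pivot comes from the answer.

```python
import numpy as np
import pandas as pd

# Toy data; the values are illustrative only.
df = pd.DataFrame({
    "PClass": [3, 1, 2, 3, 3],
    "Gender": [1, 0, 1, 0, 0],
    "Age":    [22.0, 38.0, 27.0, np.nan, 25.0],
})

# Mean age per (PClass, Gender); missing ages are skipped by the mean.
pivot = pd.pivot_table(df, index="PClass", columns="Gender",
                       values="Age", aggfunc="mean")

# Long form: one row per (PClass, Gender) pair with its pivot value.
lookup = pivot.stack().reset_index(name="Avg_Age_Pivot")

# Left join keeps every row of df and adds the pivot value as a new column.
merged = df.merge(lookup, on=["PClass", "Gender"], how="left")

# Fill only the missing ages, then drop the helper column.
merged["Age"] = merged["Age"].fillna(merged["Avg_Age_Pivot"])
result = merged.drop(columns="Avg_Age_Pivot")
print(result)
```

Because the merge is a left join on PClass and Gender, the row order of df is kept, and rows whose combination has no value in the pivot simply stay NaN.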

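I also noticed that, with the amount of data you provided, the pivot_table contains NaN values, so you cannot see the expected result for the current df values.

I would convert the pivot table to a regular df: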

pdf = pivot_table.stack().reset_index()

Then merge it with the NaN rows of df and apply combine_first:
nan_df = df.loc[df['Age'].isna(), ['Pclass', 'Gender']].merge(pdf, how='left')
df.set_index(['Pclass', 'Gender']).combine_first(nan_df.set_index(['Pclass', 'Gender'])).reset_index()
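
To make the snippet above easier to try, here is a self-contained sketch on the toy values used elsewhere on this page. The way pivot_table is built here (a plain wide table with Pclass as the index and Gender as the columns) is an assumption, since the answer does not show how it was created, and reset_index(name="Age") is used because this assumed table has no Age column level.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the asker's data.
df = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Gender": [1, 0, 1, 0],
    "Age":    [22.0, 38.0, 27.0, np.nan],
})

# Assumed wide pivot table: rows are Pclass, columns are Gender.
pivot_table = pd.DataFrame(
    {0: [40, 30, 25], 1: [35, 28, 21]},
    index=pd.Index([1, 2, 3], name="Pclass"),
)
pivot_table.columns.name = "Gender"

# Wide -> long: one row per (Pclass, Gender) pair.
pdf = pivot_table.stack().reset_index(name="Age")

# Look up the pivot age only for the rows whose Age is missing.
nan_df = df.loc[df["Age"].isna(), ["Pclass", "Gender"]].merge(pdf, how="left")

# Existing ages win; the gaps are taken from nan_df.
keys = ["Pclass", "Gender"]
filled = df.set_index(keys).combine_first(nan_df.set_index(keys)).reset_index()
print(filled)
```

Note that combine_first aligns the two frames on the (Pclass, Gender) index, so this sketch relies on each pair appearing at most once in df. With repeated pairs it may not behave as intended, and a merge followed by fillna is likely the safer route.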

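Take a look at this approach. A common column named new is created by combining the 'PClass' and 'Gender' columns. Then map and df.fillna are used to replace the NaN values. I had to create this new column because the map method can only be applied to a pd.Series.
Input: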
import io
import pandas as pd
df1  = pd.read_csv(io.StringIO("""
PClass Gender Age
  3      1     22
  1      0     38
  2      1     27
  3      0    NaN
  """), sep=r"\s{1,}", engine="python") 

import io
df2  = pd.read_csv(io.StringIO("""
PClass  Gender Age
    1     0  40
    2     0  30
    3     0  25
    1     1  35
    2     1  28
    3     1  21
  """), sep=r"\s{1,}", engine="python")

Here df1 is the actual df and df2 is the pivot table. The code fills the missing age and prints:
   PClass  Gender   Age
0       3       1  22.0
1       1       0  38.0
2       2       1  27.0
3       3       0  25.0

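Please provide the df as text (not as an image). For example, you can use df.to_dict(). There are no NaNs in the Age column…
I just updated the text.
Thank you very much! However, I don't quite understand what the "NaN" values in the pivot_table mean, since there are 3x2, i.e. 6, entries.
I added the df_pivot output to the answer; for pivot index combinations that do not exist in the data itself, aggfunc returns NaN.
Oh, I see! This is not the actual data, so no worries. Thanks.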
df_pivot output:

>>> pd.pivot_table(df,index=['PClass'],columns=['Gender'],values=['Age'],aggfunc='mean') ### you can choose your own aggfunc
         Age      
Gender     0     1
PClass            
1       38.0   NaN
2        NaN  27.0
3        NaN  22.0


nan_df = df.loc[df['Age'].isna(), ['Pclass', 'Gender']].merge(pdf, how='left')
df.set_index(['Pclass', 'Gender']).combine_first(nan_df.set_index(['Pclass', 'Gender'])).reset_index()

   Pclass  Gender   Age
0       1       0  38.0
1       2       1  27.0
2       3       0  25.0
3       3       1  22.0

df1:
   PClass  Gender   Age
0       3       1  22.0
1       1       0  38.0
2       2       1  27.0
3       3       0   NaN

df2:
   PClass  Gender  Age
0       1       0   40
1       2       0   30
2       3       0   25
3       1       1   35
4       2       1   28
5       3       1   21

df1['new'] = df1['PClass'].astype(str)+df1['Gender'].astype(str)
df2['new'] = df2['PClass'].astype(str)+df2['Gender'].astype(str)
fill = df2.set_index(['new'])['Age'].to_dict()
df1['Age'] = df1['Age'].fillna(df1['new'].map(fill))
df1 = df1.drop('new',axis=1)
print(df1)

   PClass  Gender   Age
0       3       1  22.0
1       1       0  38.0
2       2       1  27.0
3       3       0  25.0
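
One caveat with the string key: concatenating PClass and Gender as text can become ambiguous once values have more than one digit (for instance 1 and 12 versus 11 and 2 both give '112'). A possible variation, sketched below on the same toy data, uses tuple keys instead. The names fill, keys and mapped are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "PClass": [3, 1, 2, 3],
    "Gender": [1, 0, 1, 0],
    "Age":    [22.0, 38.0, 27.0, np.nan],
})
df2 = pd.DataFrame({
    "PClass": [1, 2, 3, 1, 2, 3],
    "Gender": [0, 0, 0, 1, 1, 1],
    "Age":    [40, 30, 25, 35, 28, 21],
})

# Lookup keyed by (PClass, Gender) tuples instead of a concatenated string.
fill = df2.set_index(["PClass", "Gender"])["Age"].to_dict()

# Build the matching tuple key for every row of df1 and look it up.
keys = zip(df1["PClass"], df1["Gender"])
mapped = pd.Series([fill.get(k, np.nan) for k in keys], index=df1.index)

# Only rows where Age is NaN take the looked-up value.
df1["Age"] = df1["Age"].fillna(mapped)
print(df1)
```

This avoids the helper column entirely, at the cost of a Python-level loop over the rows, which should be fine for around 100 rows.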