Python: applying fillna data from one DataFrame to the null values in another DataFrame

python pandas dataframe

Please bear in mind that I'm new to coding and to Stack Overflow, so I may not describe this very concisely.

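I have two DataFrames. df1 is read from a CSV and has a column named 'total_income' that contains NaN values. I want to fill those NaN values with a mean that depends on the categorical value in another column, named 'income_type'.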

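I created a second DataFrame from these two columns, grouped it by 'income_type', and then computed the mean of 'total_income' for each unique value of 'income_type'.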

I used this:

#groupby types of employment, calculate/fill in the mean
df2 = df1[['income_type','total_income']]
df2 = df2.sort_values('income_type').groupby('income_type').mean().reset_index()

My output is exactly what I wanted:

                       income_type  total_income
0                     business  32386.793835
1                civil servant  27343.729582
2                     employee  25820.841683
3                 entrepreneur  79866.103000
4  paternity / maternity leave   8612.661000
5                      retiree  21940.394503
6                      student  15712.260000
7                   unemployed  21014.360500

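However, now that I have these outputs, I'm confused about how to apply them back to the matching rows of df1. For example, for every row of df1 that has 'employee' in 'income_type' and a missing 'total_income', I want to apply the value 25820.84 (index 2 in df2: 2  employee  25820.841683).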

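I know I could do this one at a time by setting a variable for each income_type / total_income pair, but it would be much neater if I could do it in a loop or a function.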

Here are the first 10 rows of df1:

children  days_employed  dob  education            edu_id  family_status      family_sts_id  gender  income_type  debt  total_income
1         -8437.673028   42   bachelor's degree    0       married            0              F       employee     0     40620.102
1         -4024.803754   36   secondary education  1       married            0              F       employee     0     17932.802
0         -5623.422610   33   Secondary Education  1       married            0              M       employee     0     23341.752
3         -4124.747207   32   secondary education  1       married            0              M       employee     0     42820.568
0         340266.072047  53   secondary education  1       civil partnership  1              F       retiree      0     25378.572
0         -926.185831    27   bachelor's degree    0       civil partnership  1              M       business     0     40922.170
0         -2879.202052   43   bachelor's degree    0       married            0              F       business     0     38484.156
0         -152.779569    50   SECONDARY EDUCATION  1       married            0              M       employee     0     21731.829
2         -6929.865299   35   BACHELOR'S DEGREE    0       civil partnership  1              F       employee     0     15337.093
0         -2188.756445   41   secondary education  1       married            0              M       employee     0     23108.150

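I asked another question the other day and tried to work out how to apply its answer here, but I'm still struggling with how to carry the replacement values over from one DataFrame to the other. Any help would be greatly appreciated!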

You could take a simpler approach, such as:

df1['total_income'] = df1.groupby('income_type')['total_income'].transform(lambda x: x.fillna(x.mean()))
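To see what that one-liner does, here is a minimal sketch on made-up data (the values are invented for illustration and are not from your CSV). Each missing total_income is replaced by the mean of the non-missing values in its own income_type group. One caveat: a group whose values are all NaN has no mean, so it would stay NaN.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
df1 = pd.DataFrame({
    "income_type": ["employee", "employee", "employee", "retiree", "retiree"],
    "total_income": [100.0, 200.0, np.nan, 50.0, np.nan],
})

# Within each income_type group, replace NaN with that group's mean.
# transform returns a Series aligned with df1's index, so it can be assigned straight back.
df1["total_income"] = (
    df1.groupby("income_type")["total_income"]
       .transform(lambda x: x.fillna(x.mean()))
)

print(df1)
```

The rows that already had a value are left untouched; only the two NaN rows change.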

That said, if you really want to build the second object as you did before, you can keep it as a pd.Series and use it to update the original df1, like this:

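# Work on a copy of the two relevant columns
df2 = df1[['income_type','total_income']].copy()
# Series of group means, indexed by income_type
income_mapper = df2.sort_values('income_type').groupby('income_type')['total_income'].mean()
# Fill only the rows where total_income is NaN, looking up each row's income_type in the Series
df1.loc[df1.total_income.isna(), 'total_income'] = df1.loc[df1.total_income.isna(), 'income_type'].map(income_mapper)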
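A slightly shorter variant of the same lookup idea, sketched on made-up data (the values and the name `filled` are assumptions for illustration): map every row's income_type to its group mean, then let fillna use that mapped value only where total_income is missing.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
df1 = pd.DataFrame({
    "income_type": ["business", "business", "student", "student", "student"],
    "total_income": [300.0, np.nan, 10.0, 20.0, np.nan],
})

# Series of group means indexed by income_type; mean() skips NaN by default
income_mapper = df1.groupby("income_type")["total_income"].mean()

# Look up each row's group mean, then use it only where total_income is NaN.
# fillna aligns on the index, so existing values are kept as they are.
filled = df1["total_income"].fillna(df1["income_type"].map(income_mapper))

print(filled)
```

Here `filled` is a new Series and df1 is left unchanged; assign it back with `df1["total_income"] = filled` if you want to overwrite the column.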

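Wow! Thank you! I'm all for simpler. I upvoted your answer, but I don't have enough reputation for it to show. Thanks for the extra info, too. The only reason I created a second DataFrame in this case was that I thought keeping those values separate would make it easier to put them back into the first DataFrame (clearly, that isn't the case. :D)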