Try to fill with a group mean (grouped by two variables) and, if that is not possible, fall back to the column mean in Python


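I am trying to group by two variables and use the group mean to fill the missing values in a column. If that does not work, I want to group by one variable and use that group mean to fill the missing values of the same column. If that does not work either, I want to fill the missing values with the mean of the whole column (no grouping at all, since this is my last resort).
The dataset contains many companies and 5 different product types:
Laptops / Desktops / Monitors / Tables / Phones

For example, I want to group by company_name and pl_category and use the group mean of the pl_use_energy_demand_(year_tec) column to fill the value that is missing (NaN) for Apple tablets in that column. However, as you can see, when I group by Apple and Tablets_IPAD there is no data from which to compute a two-key group mean, so I want to fill the NaN value with Apple's mean instead. If there is no data for Apple as a whole, I want to fill Apple's NaN values with the mean of the whole column. The desired output is therefore the pl_use_energy_demand_(year_tec) column with its NaN values filled in the order explained above and shown in the code below: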

features_to_impute = [
    x for x in dat.columns
    if dat[x].dtypes != 'O' and dat[x].isnull().mean() > 0.3 and x.startswith('pl')
]

def impute_cols(df,var_to_group1,var_to_group2,var_to_impute):
    return df.groupby([var_to_group1,var_to_group2])[var_to_impute].apply(lambda x: np.mean(x))

def impute_cols_2(df,var_to_group_1,var_to_impute):
    return df.groupby([var_to_group_1])[var_to_impute].apply(lambda x: np.mean(x))

for v in dat[features_to_impute]:
    try:
        dat[v+'imp'] = impute_cols(dat,'company_name','pl_category',v)
    except: TypeError
    try:
        dat[v+'imp'] = impute_cols_2(dat,'company_name',v)
    except:
        dat[v+'_imp'] = dat[v].fillna(dat[v].mean())
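The code above raises no error, but the new "_imp" columns it returns are full of NaNs.
Any suggestions on how to get what I need? Thanks in advance.
I use

except: TypeError

because sometimes, when the DataFrame is grouped, a group has no data from which to compute a mean, so this is my way of saying: move on and try the next part of the code.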
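I think you are almost there. The way you create the new column does not work. Producing a list instead of a pd.Series, either in the function or in the for loop, should solve the problem: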

features_to_impute = [
    x for x in dat.columns
    if dat[x].dtypes != 'O' and dat[x].isnull().mean() > 0.3 and x.startswith('pl')
]

def impute_cols(df,var_to_group1,var_to_group2,var_to_impute):
    return df.groupby([var_to_group1,var_to_group2])[var_to_impute].apply(lambda x: np.mean(x))

def impute_cols_2(df,var_to_group_1,var_to_impute):
    return df.groupby([var_to_group_1])[var_to_impute].apply(lambda x: np.mean(x))

for v in dat[features_to_impute]:
    try:
        # create a list() here
        dat[v+'imp'] = list(impute_cols(dat,'company_name','pl_category',v))
    except: TypeError
    try:
        # and here
        dat[v+'imp'] = list(impute_cols_2(dat,'company_name',v))
    except:
        dat[v+'_imp'] = dat[v].fillna(dat[v].mean())
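Try this and let me know whether it works.
In the future, try to post some reproducible dummy data instead of a picture. That makes it much easier to help.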
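One possible explanation for the all-NaN columns described in the question, offered as a sketch rather than a confirmed diagnosis: an aggregated groupby result is indexed by the group keys, and pandas aligns on the index when a Series is assigned to a column. With a single key, none of the group labels match the frame's row labels, so every row ends up NaN. With two keys the result carries a MultiIndex, and depending on the pandas version the assignment may raise a TypeError instead. The toy frame and the column names `energy`, `bad`, and `aligned` below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the asker's dataset.
df = pd.DataFrame({
    "company_name": ["Apple", "Apple", "Dell", "Dell"],
    "energy": [10.0, np.nan, 30.0, 50.0],
})

# Aggregation returns one value per group, indexed by company name.
group_means = df.groupby("company_name")["energy"].mean()

# Column assignment aligns on the index: "Apple"/"Dell" never match the
# row labels 0..3, so the new column is entirely NaN.
df["bad"] = group_means

# transform("mean") broadcasts each group's mean back to the original rows,
# so the result shares the frame's index and can be used directly.
aligned = df.groupby("company_name")["energy"].transform("mean")

print(df["bad"].isna().all())
print(aligned.tolist())
```

Because `transform` keeps the original row index, its result can be passed straight to `fillna` without any merge or mapping step.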
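This is not the most efficient way to solve the problem, but because of time pressure I ended up doing something like the following, which does exactly what I wanted: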

import numpy as np
import pandas as pd

# dat and env are the author's DataFrames; env is not defined in this post.

# Per-company means (computed here but not used in the fill steps below)
dict_list_1 = []
for v in dat[features_to_impute]:
    comp_mean = env.groupby('company')[v].mean().to_frame()
    dict_list_1.append(comp_mean)
comp_means = pd.concat(dict_list_1, axis=1, ignore_index=False)
comp_means.reset_index(inplace=True)

# Build a single "company_producttype" key in both frames
def unique_id(df, col1, col2):
    return df[col1].astype(str) + "_" + df[col2].astype(str)

dat['company_ptype'] = unique_id(dat, 'company_name', 'pl_category')
env['company_ptype'] = unique_id(env, 'company', 'category')

# Means per company and product type
dict_list_2 = []
for x in dat[features_to_impute]:
    comp_ptype_mean = env.groupby(['company_ptype'])[x].mean().to_frame()
    dict_list_2.append(comp_ptype_mean)
comp_ptype_means = pd.concat(dict_list_2, axis=1, ignore_index=False)
comp_ptype_means.reset_index(inplace=True)

# Means per product type
dict_list_3 = []
for i in dat[features_to_impute]:
    prod_type_mean = env.groupby(['category'])[i].mean().to_frame()
    dict_list_3.append(prod_type_mean)
prod_type_means = pd.concat(dict_list_3, axis=1, ignore_index=False)
prod_type_means.reset_index(inplace=True)

for x in dat[features_to_impute]:
    # 1st step: company + product type mean
    dat[x] = np.where(dat[x].isnull(), dat['company_ptype'].map(comp_ptype_means.set_index('company_ptype')[x]), dat[x])
    # 2nd step: product type mean
    dat[x] = np.where(dat[x].isnull(), dat['pl_category'].map(prod_type_means.set_index('category')[x]), dat[x])
    # 3rd step: mean of the whole column
    dat[x] = dat[x].fillna(dat[x].mean())
@Tito, if you have any suggestions on how to make this more efficient, I would be happy to hear and use them.

Thanks.
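A more compact way to express the fallback order from the question (two-key group mean, then company mean, then whole-column mean) might be to chain `fillna` calls on `groupby(...).transform("mean")` results. This is only a minimal sketch: the helper `fill_with_fallback`, the toy frame, and the short column name `energy` are hypothetical, and it computes the means from the same frame rather than from a separate `env` frame. Note that it follows the order stated in the question, which differs from the product-type fallback used in the second step of the code above.

```python
import numpy as np
import pandas as pd

def fill_with_fallback(df, col, keys=("company_name", "pl_category")):
    """Fill NaN in `col` with the two-key group mean, then the mean of the
    first key alone, then the mean of the whole column."""
    # Each transform returns a Series aligned with df's rows; a group that
    # contains only NaN yields NaN, which lets the next level take over.
    by_both = df.groupby(list(keys))[col].transform("mean")
    by_first = df.groupby(keys[0])[col].transform("mean")
    return df[col].fillna(by_both).fillna(by_first).fillna(df[col].mean())

# Hypothetical toy data; not the asker's dataset.
toy = pd.DataFrame({
    "company_name": ["Apple", "Apple", "Apple", "Apple", "Apple", "Acme", "Dell"],
    "pl_category": ["Laptop", "Laptop", "Laptop", "Monitor", "Tablet", "Phone", "Phone"],
    "energy": [10.0, np.nan, 30.0, 50.0, np.nan, np.nan, 70.0],
})

toy["energy_imp"] = fill_with_fallback(toy, "energy")
print(toy)
```

In this toy frame the missing Apple laptop value is filled from the (Apple, Laptop) group, the Apple tablet value from Apple as a whole because that pair has no data, and the Acme value from the whole column because Acme has no data at all. Looping over `features_to_impute` and calling the helper once per column would avoid building the intermediate lookup frames.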
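Can you add some data and the desired output? That would make it easier to understand what you are trying to achieve.
If the information above is not enough, please let me know. Thanks.
I like the way you explained the problem, it is very helpful. As mentioned below, posting some reproducible data that illustrates the problem makes things easier.
Thanks for your reply. Unfortunately, it does not do what I need. The code basically always gives the mean of the whole column, so it does not follow the order I want. I believe the last part of the try clause always overwrites the previous output (if there is any).
I checked again, and I think the list() addition you suggested does not work when creating the column. So it is not that the third try clause overwrites the others; rather, the first two try clauses do not work.
I am not sure I fully understand your problem, sorry. Creating a list to insert the result into the column works for me. But I still do not understand your data structure.
No, unfortunately it does not. Only the last part of the try clause code produces a result. Try creating a new column for each try clause and you will probably see the same effect: a column is only created when this line is executed:
dat[v+'_imp'] = dat[v].fillna(dat[v].mean())
So I modified the code a lot and found another way to solve the problem. I am happy to share it with you if you like.
Sure! I would love to see how you solved it. Thanks.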
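One plausible reading of the behaviour discussed in these comments, stated as an assumption rather than a confirmed diagnosis: a list built from aggregated group means has one entry per group, not one per row, so assigning it to a column raises a ValueError whenever the two counts differ, and a bare `except:` hides that error. The sketch below uses hypothetical toy data and catches only the expected exception so the failure stays visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the asker's dataset.
df = pd.DataFrame({
    "company_name": ["Apple", "Apple", "Dell", "Dell"],
    "energy": [10.0, np.nan, 30.0, 50.0],
})

# One mean per group: two values for a frame with four rows.
means_list = list(df.groupby("company_name")["energy"].mean())

error_name = None
try:
    df["energy_imp"] = means_list
except ValueError as exc:
    # Catching a specific exception type keeps the real cause visible;
    # a bare `except:` would silently skip to the next branch.
    error_name = type(exc).__name__
    print(f"assignment failed: {exc}")

print(len(means_list), len(df), error_name)
```

Printing or logging the caught exception inside each `except` branch is a cheap way to check which step actually ran.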