Python: Generating frequencies (sum and count) based on GROUPBY
I am trying to replicate results similar to PROC SUMMARY in Python, using the function below, which is available on Stack Overflow:
import numpy as np
import pandas as pd

def wmean_grouped2(group, var_name_in, var_name_weight):
    # Weighted mean of var_name_in within one group, weighted by var_name_weight.
    d = group[var_name_in]
    w = group[var_name_weight]
    return (d * w).sum() / w.sum()

# Maps a function name to the aggregation applied to each group.
FUNCS = {"mean": np.mean,
         "sum": np.sum,
         "count": np.count_nonzero}

def my_summary2(
        data,
        var_names_in,
        var_names_out,
        var_functions,
        var_name_weight=None,
        var_names_group=None):
    result = pd.DataFrame()
    if var_names_group is None:
        # No grouping variables: treat the whole frame as a single group.
        grouped = data.groupby(lambda x: True)
    else:
        grouped = data.groupby(var_names_group)
    for var_name_in, var_name_out, var_function in \
            zip(var_names_in, var_names_out, var_functions):
        if var_function == "wsum":
            # Note: despite its name, "wsum" computes a weighted mean.
            func = lambda x: wmean_grouped2(x, var_name_in, var_name_weight)
            result[var_name_out] = pd.Series(grouped.apply(func))
        else:
            func = FUNCS[var_function]
            result[var_name_out] = grouped[var_name_in].apply(func)
    return result
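For reference, the weighted branch of the function above amounts to a per-group weighted mean. A minimal sketch of that calculation on its own, assuming a small hypothetical DataFrame (the question does not show its data) and using np.average in place of the hand-written formula:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original question does not show its DataFrame.
df = pd.DataFrame({
    "name":  ["Arik", "David", "David"],
    "sal":   [100, 120, 140],
    "val_1": [2, 1, 3],
})

# Weighted mean of "sal" per name, weighted by "val_1".
# Selecting only the needed columns keeps the grouping key out of each group.
wmean = (
    df.groupby("name")[["sal", "val_1"]]
      .apply(lambda g: np.average(g["sal"], weights=g["val_1"]))
)
print(wmean)
```

np.average(values, weights=w) computes sum(values * w) / sum(w), which is the same expression wmean_grouped2 evaluates.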
I called the function as follows:
print(my_summary2(
    data=df,
    var_names_in=["sal", "sal", "age"],
    var_names_out=["COUNT", "SAL", "age"],
    var_functions=["count", "sum", "sum"],
    var_name_weight="val_1",
    var_names_group=["name"]
))
and got the following output:
       COUNT  SAL  age
name
Arik       1  100   32
David      2  260   88
John       2  500   67
Peter      1  100   33
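For comparison, the unweighted part of this summary can also be sketched with pandas named aggregation. The rows below are hypothetical, chosen only so that the per-name results match the output shown above; the real df is not given in the question.

```python
import pandas as pd

# Hypothetical rows, picked so the grouped results match the output above.
df = pd.DataFrame({
    "name": ["Arik", "David", "David", "John", "John", "Peter"],
    "sal":  [100, 120, 140, 250, 250, 100],
    "age":  [32, 44, 44, 33, 34, 33],
})

# One output column per (input column, aggregation) pair.
summary = df.groupby("name").agg(
    COUNT=("sal", "count"),
    SAL=("sal", "sum"),
    age=("age", "sum"),
)
print(summary)
```

Note that "count" here counts non-null values, whereas np.count_nonzero in FUNCS counts non-zero values; the two agree only when the column has no zeros and no missing entries.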
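Could you please help me generate the following output:
(i) A new row after the "name" column
(ii) Hyphens (-) inserted, followed by the total sum of each variable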
I was able to generate the total of each column by adding the following line just before result is returned:
# np.number rather than pd.np.number: the pd.np alias was removed in pandas 2.0.
result.loc['Total'] = result.select_dtypes(np.number).sum()
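One possible way to get both the hyphen line and the totals is to treat the separator as a display concern: append the Total row to the DataFrame, render it as text, and insert a rule of hyphens before the last line. The sketch below assumes that reading of requirement (ii); format_with_total is a hypothetical helper name, not part of pandas, and the summary frame is rebuilt from the output shown earlier.

```python
import pandas as pd

# The grouped summary from the output shown earlier in the question.
summary = pd.DataFrame(
    {"COUNT": [1, 2, 2, 1], "SAL": [100, 260, 500, 100], "age": [32, 88, 67, 33]},
    index=pd.Index(["Arik", "David", "John", "Peter"], name="name"),
)

def format_with_total(summary, label="Total"):
    """Return the summary as text with a hyphen rule above a totals row."""
    # Column totals over numeric columns only.
    totals = summary.select_dtypes("number").sum()
    # Append the totals as one extra row labelled `label`.
    body = pd.concat([summary, totals.to_frame(label).T])
    body.index.name = summary.index.name
    lines = body.to_string().splitlines()
    # Insert a rule of hyphens just before the totals line.
    rule = "-" * max(len(line) for line in lines)
    lines.insert(len(lines) - 1, rule)
    return "\n".join(lines)

print(format_with_total(summary))
```

Keeping the hyphens out of the DataFrame itself means the numeric columns stay numeric, so the result of my_summary2 can still be used in further calculations.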