Cumulative percentage in a MultiIndex DataFrame in pandas

I want to compute a cumulative percentage in a pandas DataFrame with a MultiIndex, but I can't get it to work.

import pandas as pd

to_df = {'domain': {(12, 12): 2, (14, 14): 1, (15, 15): 2, (15, 17): 2, (17, 17): 1},
 'time': {(12, 12): 1, (14, 14): 1, (15, 15): 2, (15, 17): 1, (17, 17): 1},
 'weight': {(12, 12): 3,
  (14, 14): 4,
  (15, 15): 1,
  (15, 17): 2,
  (17, 17): 5}}

df = pd.DataFrame.from_dict(to_df)

       domain  time  weight
12 12       2     1       3
14 14       1     1       4
15 15       2     2       1
   17       2     1       2
17 17       1     1       5


df = df.groupby(['time', 'domain']).apply(
 pd.DataFrame.sort_values, 'weight', ascending=True)
cumsum() works as expected:

df["cum_sum_time_domain"] = df.groupby(['time', 'domain'])['weight'].cumsum()



               domain  time  weight  cum_sum_time_domain
time domain                                                 
1    1      14 14       1     1       4                    4
            17 17       1     1       5                    9
     2      15 17       2     1       2                    2
            12 12       2     1       3                    5
2    2      15 15       2     2       1                    1
Running these commands on their own works:

df.groupby(['time', 'domain']).weight.sum()
df.groupby(['time', 'domain'])['weight'].sum()
However, both of these assignments produce only NaN:

df["sum_time_domain"] = df.groupby(['time', 'domain']).weight.sum()
df
df["sum_time_domain"] = df.groupby(['time', 'domain'])['weight'].sum()
df
Combining the two raises an error: "merging with more than one level overlap on a multi-index is not implemented"

df["cum_perc_time_domain"] = 100 * df.groupby(['time', 'domain'])['weight'].cumsum() / df.groupby(
 ['time', 'domain'])['weight'].sum()
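I think you need transform with sum, which returns the group total for every original row instead of one row per group. Also, groupby is not needed for the sorting; just use: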

df = df.sort_values(['time','domain','weight'])

print (df.groupby(['time', 'domain']).weight.transform('sum'))
14  14    9
17  17    9
15  17    5
12  12    5
15  15    1
Name: weight, dtype: int64
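
The difference between the two calls can be sketched as follows. This is a minimal illustration, assuming the same five rows as in the question, already sorted; the column names sum_via_agg and sum_via_transform are made up for the example. sum() returns one value per group, indexed by the group keys, so assigning it to the frame aligns on labels that do not exist in df and yields NaN, whereas transform('sum') keeps the original index.

```python
import pandas as pd

# Same rows as in the question, already in sorted order (assumption for brevity).
df = pd.DataFrame(
    {"time": [1, 1, 1, 1, 2], "domain": [1, 1, 2, 2, 2], "weight": [4, 5, 2, 3, 1]},
    index=pd.MultiIndex.from_tuples([(14, 14), (17, 17), (15, 17), (12, 12), (15, 15)]),
)

grouped = df.groupby(["time", "domain"])["weight"]

# sum() aggregates: one row per group, indexed by (time, domain).
agg = grouped.sum()

# transform('sum') broadcasts each group's total back onto the original rows.
broadcast = grouped.transform("sum")

# Column assignment aligns on index labels. The group keys in agg do not
# occur in df.index, so every aligned value is missing.
df["sum_via_agg"] = agg              # hypothetical column name
df["sum_via_transform"] = broadcast  # hypothetical column name

print(agg)
print(df)
```

Because the transformed Series shares df's index, it can be divided into the cumulative sum row by row without any merge.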

# Parentheses let the expression continue onto the next line.
df["cum_perc_time_domain"] = (100 * df.groupby(['time', 'domain'])['weight'].cumsum() /
                              df.groupby(['time', 'domain'])['weight'].transform('sum'))
print(df)
       domain  time  weight  cum_perc_time_domain
14 14       1     1       4             44.444444
17 17       1     1       5            100.000000
15 17       2     1       2             40.000000
12 12       2     1       3            100.000000
15 15       2     2       1            100.000000
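
As a variation, the cumulative sum and the division can be done inside a single transform call. This is only a sketch and not the code from the answer: the helper name cumulative_percent is hypothetical, and the frame is rebuilt here with an explicit MultiIndex so the snippet runs on its own.

```python
import pandas as pd

# Data from the question, with the MultiIndex built explicitly.
index = pd.MultiIndex.from_tuples([(12, 12), (14, 14), (15, 15), (15, 17), (17, 17)])
df = pd.DataFrame(
    {"domain": [2, 1, 2, 2, 1], "time": [1, 1, 2, 1, 1], "weight": [3, 4, 1, 2, 5]},
    index=index,
)


def cumulative_percent(weights: pd.Series) -> pd.Series:
    """Running total of one group's weights as a percentage of the group total."""
    return 100 * weights.cumsum() / weights.sum()


# Sort first so the running total follows ascending weight within each group.
df = df.sort_values(["time", "domain", "weight"])

# transform calls the helper once per (time, domain) group and returns a
# Series aligned with df's original index.
df["cum_perc_time_domain"] = (
    df.groupby(["time", "domain"])["weight"].transform(cumulative_percent)
)
print(df)
```

The two-step version in the answer relies on built-in groupby operations and is likely faster on large frames; the single transform mainly keeps the per-group logic in one place.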