Find the proportion of one variable within each decile of another variable in Python

I have the following dataset:

HID     Score   Decile_Name Result
2089    62      4th decile  1
897     47      2nd decile  0
85      55      3rd decile  0
8       74      7th decile  1
23      31      1st decile  1
5657    77      8th decile  1
52      85      9th decile  0
781     63      6th decile  0
565     42      1st decile  0
456     62      4th decile  1
12      89      10th decile 1
56      85      9th decile  1

import pandas as pd

# Create a DataFrame (values taken from the table above, 12 rows per column)
data = {
    'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 62, 89, 85],
    'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                    '1st decile', '8th decile', '9th decile', '6th decile',
                    '1st decile', '4th decile', '10th decile', '9th decile'],
    'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
}

df1 = pd.DataFrame(data, columns=['HID', 'Score', 'Decile_Name', 'Result'])

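This captures each student's score in one subject, along with the decile that score falls in. It also captures whether the student passed the exam (Result).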

I want to calculate the proportion of Result = 1 within each decile (Result %) and overall (across the whole dataset). Expected output:

Attribute Level         Result %    num_of_stu  
Score - All Categories  0.583       12 # This captures the values for the whole DataFrame (df1): 7 of 12 rows have Result = 1.
Score - 1st Decile      0.5         2
Score - 2nd Decile      0           1
Score - 3rd Decile      0           1
...
Score - 9th Decile      0.5         2
Score - 10th Decile     1           1
Can someone help me with this?

# Mean of Result grouped by Decile_Name
result_df = df1[['Decile_Name', 'Result']].groupby(['Decile_Name']).mean()

# Count of students grouped by Decile_Name
students_df = df1[['Decile_Name', 'HID']].groupby(['Decile_Name']).count()

# Combine the two DataFrames column-wise
merged_df = pd.concat([result_df, students_df], axis=1)

# Add the totals for all students under the index label "All Students"
merged_df.loc["All Students"] = [df1['Result'].mean(), df1['HID'].count()]

# Print
print(merged_df)

Result:

                Result   HID
Decile_Name
10th decile   1.000000   1.0
1st decile    0.500000   2.0
2nd decile    0.000000   1.0
3rd decile    0.000000   1.0
4th decile    1.000000   2.0
6th decile    0.000000   1.0
7th decile    1.000000   1.0
8th decile    1.000000   1.0
9th decile    0.500000   2.0
All Students  0.583333  12.0
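
The same table can also be built with a single groupby call using named aggregation, which avoids computing the mean and the count separately. This is only a sketch under the assumption that the data matches the question's table; the names `summary` and `overall` are illustrative and not part of the original answer.

```python
import pandas as pd

# Data copied from the question's table
df1 = pd.DataFrame({
    'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 62, 89, 85],
    'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                    '1st decile', '8th decile', '9th decile', '6th decile',
                    '1st decile', '4th decile', '10th decile', '9th decile'],
    'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# One pass over the groups: mean of Result and number of students per decile
summary = df1.groupby('Decile_Name').agg(
    Result=('Result', 'mean'),
    HID=('HID', 'count'),
)

# Overall row built as its own one-row frame, so HID stays an integer column
overall = pd.DataFrame(
    {'Result': [df1['Result'].mean()], 'HID': [df1['HID'].count()]},
    index=['All Students'],
)

merged_df = pd.concat([summary, overall])
print(merged_df)
```

Building the overall row as a separate frame and concatenating it keeps the student counts as integers, whereas assigning a mixed float/int list to a new row may turn the count column into floats, as in the output above.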

If only the values 0 and 1 appear in the Result column, the solution is as follows:

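First aggregate per group with groupby and agg, then sort the index values by their integer part, create a new one-row summary DataFrame, and finally combine the two: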
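
A minimal sketch of those steps for the 0/1 case, where the mean of Result is already the proportion of passes. The data is taken from the question's table; the names `per_decile`, `order`, `overall` and `out` are illustrative. `pd.concat` is used to stack the rows, since `DataFrame.append` is not available in pandas 2.0 and later.

```python
import pandas as pd

# Data copied from the question's table
df = pd.DataFrame({
    'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 62, 89, 85],
    'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                    '1st decile', '8th decile', '9th decile', '6th decile',
                    '1st decile', '4th decile', '10th decile', '9th decile'],
    'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# Result holds only 0/1, so its mean is the share of rows equal to 1
per_decile = df.groupby('Decile_Name').agg({'Result': 'mean', 'HID': 'size'})

# Order rows by the number in the label ("1st" -> 1, ..., "10th" -> 10)
order = per_decile.index.str.extract(r'(\d+)', expand=False).astype(int).argsort()
per_decile = per_decile.iloc[order]

# One-row summary for the whole dataset
overall = pd.DataFrame({'Result': [df['Result'].mean()], 'HID': [len(df)]},
                       index=['All Categories'])

out = pd.concat([overall, per_decile]).rename(
    columns={'Result': 'Result %', 'HID': 'num_of_stu'})
print(out)
```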

General solution: create a boolean mask that is True only for the value 1:

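# df is the DataFrame from the question
df['Result1'] = df['Result'] == 1
df1 = df.groupby('Decile_Name').agg({'Result1': 'mean', 'HID': 'size'})
# Sort the deciles by the number in their label so that "10th" comes after "9th"
df1 = df1.iloc[df1.index.str.extract(r'(\d+)', expand=False).astype(int).argsort()]

df2 = pd.DataFrame({'Result1': [df['Result1'].mean()],
                    'HID': [len(df)]}, index=['All Categories'])

d = {'Result1': 'Result %', 'HID': 'num_of_stu'}
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
df1 = pd.concat([df2, df1]).rename(columns=d)
print(df1)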
                Result %  num_of_stu
All Categories  0.583333          12
1st decile      0.500000           2
2nd decile      0.000000           1
3rd decile      0.000000           1
4th decile      1.000000           2
6th decile      0.000000           1
7th decile      1.000000           1
8th decile      1.000000           1
9th decile      0.500000           2
10th decile     1.000000           1
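
To get closer to the layout requested in the question (an "Attribute Level" column with labels such as "Score - 1st Decile"), the steps above could be wrapped in a small function. This is a sketch only: `decile_summary` and its `prefix` parameter are hypothetical names, and the label format is inferred from the question's expected output.

```python
import pandas as pd

def decile_summary(df, prefix='Score'):
    """Hypothetical helper: share of Result == 1 and row count per decile."""
    passed = df['Result'].eq(1)  # boolean mask, True where Result is 1
    per_decile = (
        df.assign(passed=passed)
          .groupby('Decile_Name')
          .agg(**{'Result %': ('passed', 'mean'), 'num_of_stu': ('HID', 'size')})
    )
    # Order rows by the number in the label ("1st" -> 1, ..., "10th" -> 10)
    order = per_decile.index.str.extract(r'(\d+)', expand=False).astype(int).argsort()
    per_decile = per_decile.iloc[order]
    # Relabel, e.g. "1st decile" -> "Score - 1st Decile"
    per_decile.index = [f"{prefix} - {name.replace('decile', 'Decile')}"
                        for name in per_decile.index]
    overall = pd.DataFrame({'Result %': [passed.mean()], 'num_of_stu': [len(df)]},
                           index=[f'{prefix} - All Categories'])
    out = pd.concat([overall, per_decile])
    out.index.name = 'Attribute Level'
    return out.reset_index()

# Data copied from the question's table
df = pd.DataFrame({
    'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 62, 89, 85],
    'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                    '1st decile', '8th decile', '9th decile', '6th decile',
                    '1st decile', '4th decile', '10th decile', '9th decile'],
    'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

report = decile_summary(df)
print(report)
```

Deciles with no students (the 5th, in this data) do not appear in the result, because groupby only produces rows for labels that are present.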