Python: how to correct sampling bias by adding weights


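Suppose I have a dataset (a sample or survey) containing 400,000 individual IDs, together with the demographic categories each individual belongs to (age, race, and education level). The first 30 rows: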



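Using Python, how can I compute individual-level weights (one weight per person) that make the dataset unbiased? Within each category, the weights should sum to the figure given in the demographic ground-truth dataset.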


Demographic ground-truth dataset:

demographic category,number of individuals
18_24,11839159
25_34,16399632
35_44,15335704
45_54,16430762
55_64,15148777
65_74,9990412
75_84,5221430
0_4,7500407
5_9,7748669
10_14,7815759
15_17,4758751
85_120,2293226
< Than HS Diploma,12274025
Bachelor Degree,16305721
Graduate Degree,9343192
HS Diploma,25799018
Some College,28937146
asian,6145151
black,14626476
hispanic,21953456
islander,190389
white,73838168
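Please add the sample DataFrame as text rather than as a screenshot. The quantipy3 library and its algorithm may be what you need: Quantipy's RIM weighting algorithm.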
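The RIM weighting mentioned in the comment is a form of raking (iterative proportional fitting). A minimal pandas sketch of the idea follows. It does not use quantipy3 itself, and the column names `age` and `race`, the toy `sample` frame, and the `rake` helper are all hypothetical. Because the age, education, and race counts in the ground-truth table do not add up to the same total, the sketch rakes to each dimension's shares rather than to raw counts.

```python
import numpy as np
import pandas as pd


def rake(sample, targets, max_iter=100, tol=1e-8):
    """Raking sketch: returns one weight per row, normalised to mean 1.

    `targets` maps a column name to {category: population count}.
    Counts are converted to shares inside each dimension.
    """
    weights = pd.Series(1.0, index=sample.index)
    for _ in range(max_iter):
        max_change = 0.0
        for column, counts in targets.items():
            shares = pd.Series(counts, dtype=float)
            shares = shares / shares.sum()
            # Current weighted share of every category in this dimension.
            current = weights.groupby(sample[column]).sum() / weights.sum()
            # Per-row adjustment factor that moves the margin onto the target.
            factors = sample[column].map(shares / current)
            weights = weights * factors
            max_change = max(max_change, np.abs(factors - 1).max())
        if max_change < tol:
            break
    return weights * len(weights) / weights.sum()


# Hypothetical toy sample; the real survey has 400,000 rows.
sample = pd.DataFrame({
    "age": ["18_24", "18_24", "18_24", "18_24", "18_24", "25_34", "25_34"],
    "race": ["asian", "asian", "black", "black", "black", "asian", "black"],
})

# A subset of the ground-truth figures, grouped by dimension.
targets = {
    "age": {"18_24": 11839159, "25_34": 16399632},
    "race": {"asian": 6145151, "black": 14626476},
}

weights = rake(sample, targets)
print(weights.groupby(sample["age"]).sum() / weights.sum())
```

The returned weights have mean 1. Multiplying them by a chosen population total divided by the sample size rescales them to population counts.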
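import pandas

# `df` is assumed to be the survey DataFrame (one row per individual), already loaded.
answer = {'demographic category': [],
          'number of individuals': [],
          }

# Count the survey rows for each value of the 'demographic category' column.
for k in df['demographic category'].unique():
    answer['demographic category'].append(k)
    answer['number of individuals'].append(df[df['demographic category'] == k].shape[0])

# Do the same for each age bracket in the 'age' column.
for k in df.age.unique():
    answer['demographic category'].append(k)
    answer['number of individuals'].append(df[df.age == k].shape[0])

# Collect the per-category sample counts into a DataFrame.
answer = pandas.DataFrame(answer)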
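Once per-category sample counts like these are available, one weight per person for a single dimension can be obtained by dividing the ground-truth count by the sample count. A minimal sketch, assuming a hypothetical five-row `survey` frame with an `age` column and using two age figures from the ground-truth table:

```python
import pandas as pd

# Hypothetical miniature survey; the real one has 400,000 rows.
survey = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5],
    "age": ["18_24", "18_24", "25_34", "25_34", "25_34"],
})

# Ground-truth counts for the age brackets present in the toy survey.
age_targets = pd.Series({"18_24": 11839159, "25_34": 16399632})

# Number of sampled individuals in each age bracket.
sample_counts = survey["age"].value_counts()

# Each person's weight is the population count of their bracket
# divided by the number of sampled people in that bracket.
survey["weight"] = survey["age"].map(age_targets / sample_counts)

# Check: the weights in each bracket sum to the ground-truth count.
weight_sums = survey.groupby("age")["weight"].sum()
print(weight_sums)
```

This matches one dimension at a time. Matching age, race, and education simultaneously needs either joint cell counts or an iterative method such as raking.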