Python: Get the sum of people over overlapping age ranges in a DataFrame


python pandas


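I have the data frame shown below, with values in the age start and end columns. Some of the age ranges overlap. I need to fill in the people column correctly, based on the values that are already known in that column.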

    target_value        title    people     start end   twitter_map
0   AGE_13_TO_17      13 to 17       1        13  17  AGE_13_TO_17
1   AGE_13_TO_24      13 to 24     NaN        13  24           NaN
2   AGE_13_TO_34      13 to 34     NaN        13  34           NaN
3   AGE_13_TO_49      13 to 49     NaN        13  49           NaN
4   AGE_13_TO_54      13 to 54     NaN        13  54           NaN
5    AGE_OVER_13   Age Over 13     NaN        13   -           NaN
6   AGE_18_TO_24      18 to 24       7        18  24  AGE_18_TO_24
7   AGE_18_TO_54      18 to 54     NaN        18  54           NaN
8    AGE_OVER_18   Age Over 18     NaN        18   -           NaN
9   AGE_21_TO_34      21 to 34     NaN        21  34           NaN
10  AGE_21_TO_49      21 to 49     NaN        21  49           NaN
11  AGE_21_TO_54      21 to 54     NaN        21  54           NaN
12  AGE_25_TO_34      25 to 34      34        25  34  AGE_25_TO_34
13  AGE_25_TO_49      25 to 49     NaN        25  49           NaN
14   AGE_OVER_25   Age Over 25     NaN        25   -           NaN
15  AGE_35_TO_44      35 to 44      15        35  44  AGE_35_TO_44
16   AGE_OVER_35   Age Over 35     NaN        35   -           NaN
17  AGE_45_TO_54      45 to 54       1        45  54  AGE_45_TO_54
18   AGE_OVER_50   Age Over 50     NaN        50   -           NaN
19  AGE_55_TO_64      55 to 64       3        55  64  AGE_55_TO_64
20   AGE_OVER_65           65+       6        65   -   AGE_OVER_65
21          None      All Ages     NaN  All Ages   -           NaN

Expected output for the first two rows:

    target_value        title    people     start end   twitter_map
0   AGE_13_TO_17      13 to 17       1        13  17  AGE_13_TO_17
1   AGE_13_TO_24      13 to 24       8        13  24           NaN

I'll use a simplified example:

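people start end
     1    13  17
   NaN    13  24
   NaN    13  34
   NaN    13   -
     7    18  24
   NaN    18   -
    34    25  34

First replace "-" with infinity and convert everything to float:

import numpy as np
df = df.replace({'-': np.inf}).astype(float)

Then select the rows where people is given; these will be the input:

df_input = df.dropna()

Now define the following function:

def func(row):
    # Sum people over every known range that lies entirely within this row's range.
    return df_input.loc[
            (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']),
            'people'
        ].sum()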
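To see what the function selects, it may help to evaluate its mask by hand for a single range. The sketch below rebuilds only the three known rows of the simplified example and checks them against the 13 to 24 range; the variable names target_start, target_end and inside are mine, not part of the answer.

```python
import pandas as pd

# Known rows of the simplified example (the ones where people is given).
df_input = pd.DataFrame({
    "people": [1.0, 7.0, 34.0],
    "start":  [13.0, 18.0, 25.0],
    "end":    [17.0, 24.0, 34.0],
})

# Range to fill in: the "13 to 24" row.
target_start, target_end = 13.0, 24.0

# A known range counts only if it lies entirely inside the target range.
inside = (df_input["start"] >= target_start) & (df_input["end"] <= target_end)

print(inside.tolist())                        # [True, True, False]
print(df_input.loc[inside, "people"].sum())   # 8.0
```

The ranges 13 to 17 and 18 to 24 fall inside 13 to 24, so their counts 1 and 7 add up to the expected 8, while 25 to 34 is excluded.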


Apply it to each row:

In [36]: df.apply(func, axis=1)
Out[36]: 
0     1.0
1     8.0
2    42.0
3    42.0
4     7.0
5    41.0
6    34.0
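For larger frames, the same containment rule can probably be written without a row-wise apply. The following is only a sketch of an alternative using NumPy broadcasting, not the method from the answer; the names known, contains and totals are assumptions of mine, and the frame is the simplified example with the open-ended ranges already set to infinity.

```python
import numpy as np
import pandas as pd

# Simplified example, already numeric, with open-ended ranges as +inf.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start":  [13, 13, 13, 13, 18, 18, 25],
    "end":    [17, 24, 34, np.inf, 24, np.inf, 34],
}, dtype=float)

# Rows with a known count are the building blocks.
known = df.dropna(subset=["people"])

# contains[i, j] is True when known range j lies entirely inside range i.
contains = (
    (known["start"].to_numpy()[None, :] >= df["start"].to_numpy()[:, None])
    & (known["end"].to_numpy()[None, :] <= df["end"].to_numpy()[:, None])
)

# The matrix-vector product sums the selected known counts for each row.
totals = contains @ known["people"].to_numpy()

print(totals.tolist())
```

Both versions count a known range only when it fits completely inside the target range. In the full table this means a row such as 21 to 34 would pick up only the 25 to 34 count, because the 18 to 24 range is not contained in it, so such rows cannot be filled exactly from the known values alone.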