Python: summing people over overlapping age ranges in a DataFrame
I have the DataFrame shown below, with values in the start and end age columns. Some of the age ranges overlap. I need to fill in the people column correctly from the values already known in that column. I have also given the expected output for the first two rows.
    target_value  title        people  start     end  twitter_map
0   AGE_13_TO_17  13 to 17     1       13        17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24     NaN     13        24   NaN
2   AGE_13_TO_34  13 to 34     NaN     13        34   NaN
3   AGE_13_TO_49  13 to 49     NaN     13        49   NaN
4   AGE_13_TO_54  13 to 54     NaN     13        54   NaN
5   AGE_OVER_13   Age Over 13  NaN     13        -    NaN
6   AGE_18_TO_24  18 to 24     7       18        24   AGE_18_TO_24
7   AGE_18_TO_54  18 to 54     NaN     18        54   NaN
8   AGE_OVER_18   Age Over 18  NaN     18        -    NaN
9   AGE_21_TO_34  21 to 34     NaN     21        34   NaN
10  AGE_21_TO_49  21 to 49     NaN     21        49   NaN
11  AGE_21_TO_54  21 to 54     NaN     21        54   NaN
12  AGE_25_TO_34  25 to 34     34      25        34   AGE_25_TO_34
13  AGE_25_TO_49  25 to 49     NaN     25        49   NaN
14  AGE_OVER_25   Age Over 25  NaN     25        -    NaN
15  AGE_35_TO_44  35 to 44     15      35        44   AGE_35_TO_44
16  AGE_OVER_35   Age Over 35  NaN     35        -    NaN
17  AGE_45_TO_54  45 to 54     1       45        54   AGE_45_TO_54
18  AGE_OVER_50   Age Over 50  NaN     50        -    NaN
19  AGE_55_TO_64  55 to 64     3       55        64   AGE_55_TO_64
20  AGE_OVER_65   65+          6       65        -    AGE_OVER_65
21  None          All Ages     NaN     All Ages  -    NaN
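Expected output for the first two rows:
    target_value  title        people  start     end  twitter_map
0   AGE_13_TO_17  13 to 17     1       13        17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24     8       13        24   NaN

I will use a simplified example:
people  start  end
1       13     17
NaN     13     24
NaN     13     34
NaN     13     -
7       18     24
NaN     18     -
34      25     34
First replace - with infinity and convert everything to float:
import numpy as np
# '-' marks an open-ended range; np.inf lets it take part in comparisons
df = df.replace({'-': np.inf}).astype(float)
Then select the rows where people is known; these are the input:
df_input = df.dropna()
Now define the following function, which sums people over every known range that lies entirely inside the row's range, and apply it to each row:
def func(row):
    # known ranges that start no earlier and end no later than this row's range
    return df_input.loc[
        (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']),
        'people'
    ].sum()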
In [36]: df.apply(func, axis=1)
Out[36]:
0 1.0
1 8.0
2 42.0
3 42.0
4 7.0
5 41.0
6 34.0
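For anyone who wants to reproduce the result, the steps can be put together as the self-contained sketch below. It rebuilds the simplified data by hand; the names known, contained_sum, filled and filled_vec are illustrative and not part of the original answer. The NumPy broadcasting part at the end is only a cross-check of the same containment rule, not something the answer itself uses.

```python
import numpy as np
import pandas as pd

# Simplified data from the example; '-' marks an open-ended range.
# The 'end' column is kept as object dtype so it can hold the '-' marker.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": pd.Series(["17", "24", "34", "-", "24", "-", "34"], dtype=object),
})

# Open-ended ranges become +inf, then every column is converted to float.
df = df.replace({"-": np.inf}).astype(float)

# Rows with a known head count are the input.
known = df.dropna(subset=["people"])

def contained_sum(row):
    """Sum 'people' over known ranges lying entirely inside the row's range."""
    inside = (known["start"] >= row["start"]) & (known["end"] <= row["end"])
    return known.loc[inside, "people"].sum()

filled = df.apply(contained_sum, axis=1)

# Cross-check: the same containment test for all rows at once via broadcasting.
s = df["start"].to_numpy()[:, None]      # shape (n_rows, 1)
e = df["end"].to_numpy()[:, None]
ks = known["start"].to_numpy()[None, :]  # shape (1, n_known)
ke = known["end"].to_numpy()[None, :]
inside_all = (ks >= s) & (ke <= e)       # shape (n_rows, n_known)
filled_vec = (inside_all * known["people"].to_numpy()).sum(axis=1)

print(filled.tolist())
```

Two caveats apply when running this on the full table. The sum is only meaningful if the known ranges do not overlap one another, and a known range that only partly overlaps a row (for example 45 to 54 against Age Over 50) contributes nothing. The All Ages row also has a non-numeric start, so it would need separate handling before astype(float).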