Python & Pandas: How to split a DataFrame into groups


python pandas


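I want to split a DataFrame into age groups: under 20, 20 to 24, 25 to 30, and over 30. I can do this with an array of boundaries and a range iterator, but I would like to know whether there is a better way.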

gates = [0,20,25,30,50]
total = df.agepreg.isnull().sum()
print("INAPPLICABLE {0}".format(total))
for i in range(0, 4):
    t = df.agepreg[(df.agepreg>=gates[i]) & (df.agepreg<gates[i+1])].value_counts().sum()
    print("{0} to {1} {2}".format(gates[i], gates[i+1], t))
    total += t
print("Total {0}".format(total))

The data comes from a free book that has accompanying code and data.

From the "code" directory, you can run the following lines to load the DataFrame:

import nsfg
df = nsfg.ReadFemPreg()
df

You can groupby on the result of pd.cut:

pd.cut(df['agrpreg'], [20,24,25,30,pd.np.inf], right=False)

Create a DataFrame with 100 rows of random integer values from 20 up to (but not including) 35:

In [643]: df = pd.DataFrame(pd.np.random.randint(20, 35, 100), columns=['agrpreg'])

In [644]: df_cuts = (df
                     .groupby(pd.cut(df['agrpreg'], [20,24,25,30,pd.np.inf], right=False))
                     .sum())

In [645]: df_cuts
Out[645]:
           agrpreg
agrpreg
[20, 24)       532
[24, 25)       192
[25, 30)       878
[30, inf)     1093

Check that the two sums match:

In [646]: df_cuts.sum() == df['agrpreg'].sum()
Out[646]:
agrpreg    True
dtype: bool
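
Note that the `pd.np` alias used in the session above was deprecated in pandas 1.0 and removed in pandas 2.0, so on a recent install NumPy has to be imported directly. The following is a minimal sketch of the same groupby-on-`pd.cut` approach under that assumption. The seed and the use of `default_rng` are illustrative choices, not part of the original answer, so the numbers will differ from the output shown above.

```python
import numpy as np
import pandas as pd

# Assumed seed, only so the sketch is reproducible.
rng = np.random.default_rng(0)

# 100 random integer ages in [20, 35), same column name as the answer.
df = pd.DataFrame({"agrpreg": rng.integers(20, 35, size=100)})

# Same bin edges as the answer; right=False makes bins closed on the left.
bins = [20, 24, 25, 30, np.inf]
groups = pd.cut(df["agrpreg"], bins, right=False)

# observed=False keeps every bin in the result, even an empty one.
df_cuts = df.groupby(groups, observed=False).sum()
print(df_cuts)

# The per-bin sums should add up to the overall sum.
print(df_cuts["agrpreg"].sum() == df["agrpreg"].sum())
```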

To see what pd.cut does: it assigns each value to the bin (interval) that contains it.

In [647]: df[:5]
Out[647]:
   agrpreg
0       29
1       25
2       22
3       28
4       23

In [648]: pd.cut(df['agrpreg'], [20,24,25,30,pd.np.inf], right=False)[:5]
Out[648]:
0    [25, 30)
1    [25, 30)
2    [20, 24)
3    [25, 30)
4    [20, 24)
Name: agrpreg, dtype: category
Categories (4, object): [[20, 24) < [24, 25) < [25, 30) < [30, inf)]
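
The bin edges in the answer do not quite match the age groups in the question. As a hedged sketch, the question's own boundaries (`gates`) could be passed to `pd.cut` together with readable labels. The sample ages and the label strings below are made up for illustration; with `right=False` the group the question calls "25 to 30" covers 25 up to but not including 30, matching the `>=` / `<` comparisons in the question's loop.

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges.
ages = pd.Series([18, 20, 24, 25, 29, 30, 41], name="agepreg")

# Edges follow the question's gates, with an open-ended last bin.
bins = [0, 20, 25, 30, np.inf]
labels = ["<20", "20-24", "25-29", "30+"]  # illustrative label names

# right=False: each bin includes its left edge and excludes its right edge.
age_group = pd.cut(ages, bins=bins, right=False, labels=labels)

print(pd.DataFrame({"agepreg": ages, "age_group": age_group}))
```

Each value on a boundary (20, 25, 30) lands in the bin that starts at that boundary.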


Can you post your sample data?
Can you do a count instead of a sum? I would like to know the number of pregnancies in each age group.
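
One way to get counts rather than sums is to call `.size()` on the grouped object instead of `.sum()`. The sketch below assumes a small made-up `agepreg` column (it is not the NSFG data) and reuses the `gates` list from the question; missing ages are counted separately, as the question's loop does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df.agepreg; one value is missing.
df = pd.DataFrame({"agepreg": [18.5, 21.0, 23.9, 27.3, 33.1, np.nan, 19.0]})

gates = [0, 20, 25, 30, 50]  # boundaries from the question

# Assign each age to a left-closed bin; NaN stays NaN and is left ungrouped.
groups = pd.cut(df["agepreg"], gates, right=False)

# size() counts rows per bin; observed=False also reports empty bins.
counts = df.groupby(groups, observed=False).size()
missing = df["agepreg"].isnull().sum()

print("INAPPLICABLE {0}".format(missing))
print(counts)
print("Total {0}".format(counts.sum() + missing))
```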
