Python: iterating over the result of value_counts() on a groupby object

python pandas

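I have a DataFrame like this:

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5], 'Col1': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N']})

What I want to do is group by the 'ID' column and then get statistics for three cases: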

How many groups have only 'Y's

How many groups have at least 1 'Y' and at least 1 'N'

How many groups have only 'N's

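groups = df.groupby('ID')
groups.Col1.value_counts()

gives me a visual representation of what I am looking for, but how can I iterate over the result of the value_counts() method to check these conditions?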

groups = df.groupby('ID')
answers = groups.Col1.value_counts()

# Series.iteritems() was removed in pandas 2.0; items() yields ((ID, value), count) pairs
for item in answers.items():
    print(item)

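What you are producing is a Series from value_counts(), and you can iterate over it. Note that this is not quite what you want: you still have to inspect each of these items to run the tests you are looking for.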
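To make the "inspect each item" step concrete, here is a minimal sketch of one way it could be done. It assumes the sample data from the question (the Col1 values are an assumption, chosen to match the per-ID counts shown in the output tables further down), and the names `seen`, `only_y`, `only_n` and `both` are purely illustrative.

```python
import pandas as pd

# Sample data from the question (Col1 values assumed; see lead-in).
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    'Col1': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N'],
})

answers = df.groupby('ID')['Col1'].value_counts()

# Each item is ((ID, value), count); record which values occur in each group.
seen = {}
for (group_id, value), count in answers.items():
    if count > 0:
        seen.setdefault(group_id, set()).add(value)

only_y = sum(values == {'Y'} for values in seen.values())
only_n = sum(values == {'N'} for values in seen.values())
both = sum(values == {'Y', 'N'} for values in seen.values())

print('Y only:', only_y)
print('Y and N:', both)
print('N only:', only_n)
```

This works, but the bookkeeping is manual, which is why a tabular reshaping of the counts tends to be more convenient.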

I think pd.crosstab() may be better suited to your use case.
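As a rough sketch of what a crosstab-based version might look like (the call below is an assumption about the intended usage, and the Col1 values are assumed so that they match the output tables shown later in this answer):

```python
import pandas as pd

# Sample data from the question (Col1 values assumed; see lead-in).
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    'Col1': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N'],
})

# One row per ID, one column per distinct Col1 value, cells hold occurrence counts.
df_crosstab = pd.crosstab(df['ID'], df['Col1'])
print(df_crosstab)
```

pd.crosstab fills missing combinations with 0, so no separate fillna step is needed.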

Groupby can also do the job, but it is much more tedious:

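df_crosstab = (
    df.groupby('ID')["Col1"]
    .value_counts()      # counts per (ID, Col1) pair
    .rename("count")     # name the counts so they survive reset_index()
    .reset_index()
    .pivot(index="ID", columns="Col1", values="count")
    .fillna(0)           # combinations that never occur become 0
    .astype(int)         # fillna leaves floats; cast back to integer counts
)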

Filtering groups

Once df_crosstab has been built, it is easy to construct filters for the 3 questions:

# 1. How many groups have only 'Y's
df_crosstab[df_crosstab['N'] == 0]

Col1  N  Y
ID        
1     0  2
4     0  2

# 2. How many groups have at least 1 'Y' and at least 1 'N'
df_crosstab[(df_crosstab['N'] > 0) & (df_crosstab['Y'] > 0)]

Col1  N  Y
ID        
2     1  1
5     2  1

# 3. How many groups have only 'N's
df_crosstab[df_crosstab['Y'] == 0]

Col1  N  Y
ID        
3     2  0

If you only need the number of groups, just take the length of the filtered crosstab DataFrame. I believe this also makes automation easier.
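A minimal sketch of the counting step, assuming df_crosstab is built with pd.crosstab (the groupby/pivot version above should give the same counts) and assuming the sample data from the question; the variable names are illustrative only:

```python
import pandas as pd

# Sample data from the question (Col1 values assumed to match the tables above).
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    'Col1': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N'],
})
df_crosstab = pd.crosstab(df['ID'], df['Col1'])

# len() of each filtered frame is the number of groups matching the condition.
n_only_y = len(df_crosstab[df_crosstab['N'] == 0])
n_both = len(df_crosstab[(df_crosstab['N'] > 0) & (df_crosstab['Y'] > 0)])
n_only_n = len(df_crosstab[df_crosstab['Y'] == 0])

print(n_only_y, n_both, n_only_n)
```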

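If you group by 'ID' and use the 'sum' function, all the letters of each group are concatenated into a single string in one row. You can then check your conditions by counting characters in those strings, and sum the results to get the exact number for all groups: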

strings = df.groupby(['ID']).sum()

only_y = sum(strings['Col1'].str.count('N') == 0)
only_n = sum(strings['Col1'].str.count('Y') == 0)
both = sum((strings['Col1'].str.count('Y') > 0) & (strings['Col1'].str.count('N') > 0))

print('Number of groups with Y only: ' + str(only_y), 
      'Number of groups with N only: ' + str(only_n), 
      'Number of groups with at least one Y and one N: ' + str(both), 
      sep='\n')
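To see why the character counting works, it may help to look at the intermediate result. The sketch below assumes the sample data from the question (Col1 values, and their order within each group, are an assumption) and simply prints the concatenated strings together with their per-letter counts:

```python
import pandas as pd

# Sample data from the question (Col1 values and their order are assumed).
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    'Col1': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N'],
})

# Summing strings concatenates them, so each ID ends up with one string of letters.
strings = df.groupby(['ID']).sum()

# Per-group counts of each letter, placed next to the concatenated string.
strings['n_count'] = strings['Col1'].str.count('N')
strings['y_count'] = strings['Col1'].str.count('Y')
print(strings)
```

Note that this approach relies on every value being a single character; longer labels would need a different test.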

Thanks for sharing a good answer with a nice explanation.

You're welcome!
