Python: group by and count, with a condition on what gets counted (pandas, DataFrame)

Suppose I have a df that looks like this:

+-------+-------+-------+-------+-------+
| Data1 | Data2 | Data3 | State | Count |
+-------+-------+-------+-------+-------+
| A     |   100 |     1 | On    |     2 |
| A     |   100 |     2 | On    |     2 |
| A     |   200 |     3 | Off   |     0 |
| B     |   100 |     1 | Off   |     1 |
| B     |   100 |     1 | On    |     1 |
| B     |   100 |     1 | On    |     1 |
+-------+-------+-------+-------+-------+

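import pandas as pd

df = pd.DataFrame({'Data1': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Data2': [100, 100, 200, 100, 100, 100],
                   'Data3': [1, 2, 3, 1, 1, 1],
                   'State': ['On', 'On', 'Off', 'Off', 'On', 'On']})

I want to group by Data1 and Data2, then take the nunique count of Data3, but counting only the rows whose State value is 'On'.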

So my result would look like this:

+-------+-------+-------+-------+-------+
| Data1 | Data2 | Data3 | State | Count |
+-------+-------+-------+-------+-------+
| A     |   100 |     1 | On    |     2 |
| A     |   100 |     2 | On    |     2 |
| A     |   200 |     3 | Off   |     0 |
| B     |   100 |     1 | Off   |     1 |
| B     |   100 |     1 | On    |     1 |
| B     |   100 |     1 | On    |     1 |
+-------+-------+-------+-------+-------+
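I know the following is wrong because it also groups by State, but I can't figure out how to group by only Data1 and Data2 while counting only the rows where State == 'On':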

df['Count'] = df.groupby(['Data1', 'Data2', 'State'])['Data3'].transform('nunique')

Thanks for any help.
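To make the problem with the attempt above concrete, here is a small sketch that runs it on the sample data. The variable name `attempt` is only illustrative. Because State is part of the grouping key, the all-'Off' group (A, 200) still counts its own Data3 value instead of getting 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Data1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data2': [100, 100, 200, 100, 100, 100],
    'Data3': [1, 2, 3, 1, 1, 1],
    'State': ['On', 'On', 'Off', 'Off', 'On', 'On'],
})

# Grouping by State as well counts unique Data3 values inside each
# (Data1, Data2, State) group, so 'Off' rows are counted among themselves.
attempt = df.groupby(['Data1', 'Data2', 'State'])['Data3'].transform('nunique')

print(attempt.tolist())  # [2, 2, 1, 1, 1, 1]; the desired result is [2, 2, 0, 1, 1, 1]
```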

Let's try reindex:

df['Count'] = (df[df['State'].eq('On')]
               .groupby(['Data1', 'Data2'])['Data3'].nunique()
               .reindex(pd.MultiIndex.from_frame(df[['Data1', 'Data2']]), fill_value=0)
               .values)
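The reindex idea can be broken into two steps, which may make it easier to follow. This is a sketch on the question's sample data, not necessarily the exact code the answer intended; the names `keys`, `on_counts`, and `row_keys` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Data1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data2': [100, 100, 200, 100, 100, 100],
    'Data3': [1, 2, 3, 1, 1, 1],
    'State': ['On', 'On', 'Off', 'Off', 'On', 'On'],
})

keys = ['Data1', 'Data2']

# Step 1: unique Data3 values per (Data1, Data2), using only the 'On' rows.
# Groups with no 'On' row, such as (A, 200), are absent from this Series.
on_counts = df[df['State'].eq('On')].groupby(keys)['Data3'].nunique()

# Step 2: build one (Data1, Data2) label per original row and look each one up.
# Labels missing from on_counts are filled with 0.
row_keys = pd.MultiIndex.from_frame(df[keys])
df['Count'] = on_counts.reindex(row_keys, fill_value=0).to_numpy()

print(df['Count'].tolist())  # [2, 2, 0, 1, 1, 1]
```

Converting to a plain array at the end matters: the reindexed Series carries the (Data1, Data2) index, which would not align with the row index of df on assignment.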

You can also apply a boolean mask, run groupby.nunique on the filtered rows, and then do a left merge:

cols = ['Data1','Data2']
m = df[df['State'].eq("On")].groupby(cols)['Data3'].nunique()
out = (df.merge(m,left_on=cols,right_index=True,how='left',suffixes=('','_counts'))
       .fillna({"Data3_counts":0}))
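For comparison, a different technique from the two answers (not taken from the thread) is to blank out Data3 on the non-'On' rows and let a group-wise transform do the counting. This relies on nunique skipping missing values by default. The name `masked` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Data1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data2': [100, 100, 200, 100, 100, 100],
    'Data3': [1, 2, 3, 1, 1, 1],
    'State': ['On', 'On', 'Off', 'Off', 'On', 'On'],
})

# Keep Data3 only where State is 'On'; every other row becomes NaN.
masked = df['Data3'].where(df['State'].eq('On'))

# Group the masked values by (Data1, Data2) and broadcast the unique count
# back to every row. NaN is ignored, so an all-'Off' group yields 0.
df['Count'] = masked.groupby([df['Data1'], df['Data2']]).transform('nunique')

print(df['Count'].tolist())  # [2, 2, 0, 1, 1, 1]
```

Since transform returns a result aligned with the original rows, no reindex or merge step is needed here.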


Comments:

df[df['State']=='On'].groupby.

Doesn't this count everything? I only want to count the unique values of Data3.

@mike_gundy123 Then let's try doing the agg function first and then reindexing back.

I seem to get an error in the reindex part: Argument 'tuples' has incorrect type (expected numpy.ndarray, got Categorical).

You're awesome. Thanks a lot.