Python: How can I identify incomplete details for each id?
Tags: python, pandas, dataframe, pandas-groupby

I have the following DataFrame:

emp_id  male    female   Month_Year
423       0       0      March-2016
423       0       0      April-2016 
423       0       1      May-2016
423       0       1      June-2016

789       1       0      June-2017
789       1       0      July-2017
789       1       0      August-2017
789       0       0      September-2017

856       1       0      March-2018
856       1       0      April-2018

987       0       1      June-2019
987       0       1      July-2019
987       0       1      August-2019

Note that the values in the male and female columns have the following meaning:

1 - "Yes"
0 - "No"

I need something like this.

Expected DataFrame:

emp_id  male    female   Month_Year        Var
423       0       0      March-2016         0
423       0       0      April-2016         0
423       0       1      May-2016           0
423       0       1      June-2016          0

789       1       0      June-2017          0
789       1       0      July-2017          0
789       1       0      August-2017        0
789       0       0      September-2017     0

856       1       0      March-2018         1
856       1       0      April-2018         1

987       0       1      June-2019          1
987       0       1      July-2019          1
987       0       1      August-2019        1

Note that the values in the Var column have the following meaning:

1 - "the gender details are not missing"
0 - "the gender details are missing"

Also note that an emp_id can be either male or female, but not both.
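
The rule that an emp_id is never both male and female can be checked before any flag is computed. The following is only a minimal sketch: it assumes the data sits in a pandas DataFrame named df with the columns shown in the question, and the small frame built here is a shortened, hypothetical subset of that data.

```python
import pandas as pd

# Shortened sample in the same layout as the question's data (assumed).
df = pd.DataFrame({
    "emp_id": [423, 423, 789, 856],
    "male":   [0, 0, 1, 1],
    "female": [0, 1, 0, 0],
})

# A row breaks the rule when both flags are set at once.
both_set = df["male"].eq(1) & df["female"].eq(1)

# An id breaks the rule when it is male in one row and female in another.
per_id = df.groupby("emp_id")[["male", "female"]].max()
conflicting_ids = per_id.index[per_id.sum(axis=1) > 1].tolist()

print(both_set.any())
print(conflicting_ids)
```

If either check reports a conflict, adding the two columns no longer yields a clean 0/1 value per row, so the data would need cleaning first.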

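Looking at the male and female columns:

For emp_id 423, the first two rows have no gender details, so I set Var to 0.

For emp_id 789, the last row is missing the gender details, so I set Var to 0.

For emp_id 856 and 987, no gender details are missing in the given period, so I set Var to 1.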

I used the following code:

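# Keep the emp_ids whose 'male' column contains both 1s and 0s.
mask = (df.assign(zeros=df['male'].eq(0))
          .groupby('emp_id')[['male', 'zeros']]
          .transform('sum')   # per-id count of 1s and of 0s, repeated on every row
          .all(axis=1))       # True only when both counts are non-zero
df1 = df[mask]
print(df1)

# Same check for the 'female' column.
mask = (df.assign(zeros=df['female'].eq(0))
          .groupby('emp_id')[['female', 'zeros']]
          .transform('sum')
          .all(axis=1))
df2 = df[mask]
print(df2)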

Output of the above code:

emp_id  male    female   Month_Year
789       1       0      June-2017
789       1       0      July-2017
789       1       0      August-2017
789       0       0      September-2017

emp_id  male    female   Month_Year
423       0       0      March-2016
423       0       0      April-2016
423       0       1      May-2016
423       0       1      June-2016

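With the above code I can capture the emp_ids that are missing details in the male column and in the female column separately.

Is there another way to compare both columns (male and female) at once and flag the missing details in a Var column?

Please let me know the solution.

Thanks in advance.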

Try this:

df['var'] = (df.male + df.female).groupby(df.emp_id).transform('min')

In [39]: df
Out[39]:
    emp_id  male  female      Month_Year  var
0      423     0       0      March-2016    0
1      423     0       0      April-2016    0
2      423     0       1        May-2016    0
3      423     0       1       June-2016    0
4      789     1       0       June-2017    0
5      789     1       0       July-2017    0
6      789     1       0     August-2017    0
7      789     0       0  September-2017    0
8      856     1       0      March-2018    1
9      856     1       0      April-2018    1
10     987     0       1       June-2019    1
11     987     0       1       July-2019    1
12     987     0       1     August-2019    1
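
The one-liner works because male + female is 1 in a row where a gender is recorded and 0 where both flags are 0, so the minimum per emp_id drops to 0 as soon as a single row lacks details. Below is a self-contained sketch of the same idea. The DataFrame construction is an assumption about how the question's sample data is loaded, and the names row_has_gender and incomplete_ids are introduced here only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is assumed).
df = pd.DataFrame({
    "emp_id": [423] * 4 + [789] * 4 + [856] * 2 + [987] * 3,
    "male":   [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0],
    "female": [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "Month_Year": [
        "March-2016", "April-2016", "May-2016", "June-2016",
        "June-2017", "July-2017", "August-2017", "September-2017",
        "March-2018", "April-2018",
        "June-2019", "July-2019", "August-2019",
    ],
})

# 1 where a gender is recorded in the row, 0 where both flags are 0.
row_has_gender = df["male"] + df["female"]

# The group minimum is 0 as soon as one row of the emp_id lacks details.
df["var"] = row_has_gender.groupby(df["emp_id"]).transform("min")

# List the ids that have at least one incomplete row.
incomplete_ids = df.loc[df["var"].eq(0), "emp_id"].unique().tolist()
print(incomplete_ids)  # [423, 789]
```

Because transform returns a result aligned with the original rows, the flag can be assigned straight back to the DataFrame without a merge.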

Thank you very much! This is exactly what I was looking for.
@Shashidhar: You're welcome. Glad I could help :)