Python: select an entire subgroup in a MultiIndex DataFrame if one row meets a certain condition (python, pandas, dataframe)


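I want to select a subgroup in a MultiIndex DataFrame if any row in that subgroup meets a condition. Here is a simple DataFrame that illustrates my problem: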

import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']

d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2}
df = pd.DataFrame(data=d)

# Each (Unit, Year) pair is unique, so the sum only moves both columns into the index.
new_df = df.groupby(['Unit', 'Year']).sum()

# Flag column: one T/F value per (Unit, Year) row, in the same order as the grouped rows.
new_df['col3'] = col3

           col1  col2      col3
Unit Year                      
1    2014     0     0       T
     2015     0     0       F
     2016     0     0       F
     2017     0     0       F
2    2015     2     4       F
     2016     4     6       F
     2017     6     8       F
3    2017     0     0       T
4    2014     0     0       F
5    2015     0     0       F
6    2014   100   200       F
     2015   200   900       F
     2016   300   400       F
     2017   400   500       T


So I want to select every subgroup that has a T in col3.

My expected output therefore looks like this:

           col1  col2      col3
Unit Year                      
1    2014     0     0       T
     2015     0     0       F
     2016     0     0       F
     2017     0     0       F
3    2017     0     0       T
6    2014   100   200       F
     2015   200   900       F
     2016   300   400       F
     2017   400   500       T

Thanks in advance,

Jen

Use:

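import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']

d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2, 'col3': col3}
df = pd.DataFrame(data=d)

# Move Unit and Year into a two-level MultiIndex.
df = df.set_index(['Unit', 'Year'])

# Keep every Unit whose rows contain exactly one 'T' in col3.
# The result gets its own name so that df stays intact for the steps under Details.
out = df[df['col3'].eq('T').astype(int).groupby(level=0).transform('sum').eq(1)]
print(out)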
           col1  col2 col3
Unit Year                 
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
3    2017     0     0    T
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T

Details

Compare the column with 'T' for equality using eq, then convert the boolean result to integers:

print (df['col3'].eq('T').astype(int))
Unit  Year
1     2014    1
      2015    0
      2016    0
      2017    0
2     2015    0
      2016    0
      2017    0
3     2017    1
4     2014    0
5     2015    0
6     2014    0
      2015    0
      2016    0
      2017    1
Name: col3, dtype: int32
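
Then use groupby(level=0).transform('sum') to get a Series of the same size as the original, in which every row holds the number of T values in its first-level group (Unit):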

print (df['col3'].eq('T').astype(int).groupby(level=0).transform('sum'))
Unit  Year
1     2014    1
      2015    1
      2016    1
      2017    1
2     2015    0
      2016    0
      2017    0
3     2017    1
4     2014    0
5     2015    0
6     2014    1
      2015    1
      2016    1
      2017    1
Name: col3, dtype: int32
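
Why transform rather than a plain aggregation: the filter needs one True/False value per row of the DataFrame. The following is a minimal sketch on the same sample data (the names is_t, per_unit and per_row are illustrative and not part of the original answer) that contrasts the two calls:

```python
import pandas as pd

# Same Unit/Year/col3 values as in the answer; col1 and col2 are left out for brevity.
df = pd.DataFrame({
    'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
    'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017,
             2017, 2014, 2015, 2014, 2015, 2016, 2017],
    'col3': ['T', 'F', 'F', 'F', 'F', 'F', 'F',
             'T', 'F', 'F', 'F', 'F', 'F', 'T'],
}).set_index(['Unit', 'Year'])

is_t = df['col3'].eq('T').astype(int)

# Aggregation: one count per Unit, indexed by Unit only.
per_unit = is_t.groupby(level=0).sum()

# Transform: the per-Unit count repeated on every row, indexed like df.
per_row = is_t.groupby(level=0).transform('sum')

print(len(per_unit), len(per_row))
print(per_row.index.equals(df.index))
```

Because per_row shares its index with df, comparing it with 1 gives a boolean mask that can be passed straight to df[...].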

Compare the result with 1 using eq, and finally filter by boolean indexing:

print (df[df['col3'].eq('T').astype(int).groupby(level=0).transform('sum').eq(1)])
           col1  col2 col3
Unit Year                 
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
3    2017     0     0    T
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T
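
The comparison with 1 keeps only the groups that contain exactly one T. If a group may contain several T rows and should still be selected, one possible variation (a sketch, not part of the original answer; the names mask, at_least_one, units_with_t and by_label are illustrative) is to ask whether any row of the group is T:

```python
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']

df = pd.DataFrame({
    'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
    'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017,
             2017, 2014, 2015, 2014, 2015, 2016, 2017],
    'col1': col1, 'col2': col2, 'col3': col3,
}).set_index(['Unit', 'Year'])

# True on every row of a Unit that has at least one 'T' in col3.
mask = df['col3'].eq('T').groupby(level=0).transform('any')
at_least_one = df[mask]

# The same selection expressed through index labels:
# collect the Units that have a 'T', then keep all rows of those Units.
units_with_t = df[df['col3'].eq('T')].index.get_level_values('Unit').unique()
by_label = df[df.index.get_level_values('Unit').isin(units_with_t)]

print(at_least_one)
```

On this sample data every selected Unit has exactly one T, so both variants return the same rows as the eq(1) filter; they differ only when a group holds two or more T values.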