Python: select an entire subgroup in a MultiIndex DataFrame if one row meets a certain condition
I want to select a subgroup in a MultiIndex DataFrame if one row in that subgroup meets a condition. Here is a simple DataFrame to explain my problem:
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2}
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()
new_df['col3'] = (new_df.groupby(level=0, group_keys=False)
                        .apply(lambda x: x.col1/x.col2.shift())
                 )
           col1  col2 col3
Unit Year
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
2    2015     2     4    F
     2016     4     6    F
     2017     6     8    F
3    2017     0     0    T
4    2014     0     0    F
5    2015     0     0    F
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T
So I want to select every subgroup (every Unit) that has a 'T' in col3.
My output would therefore look like this:
           col1  col2 col3
Unit Year
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
3    2017     0     0    T
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T
Thanks in advance,
Jen

Use:
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2, 'col3': col3}
df = pd.DataFrame(data=d)
df = df.set_index(['Unit', 'Year'])
# Keep the result under its own name so that df still holds the full data for the steps in the details
out = df[df['col3'].eq('T').astype(int).groupby(level=0).transform('sum').eq(1)]
print(out)
           col1  col2 col3
Unit Year
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
3    2017     0     0    T
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T
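
One thing to watch: comparing the per-group sum with 1 keeps only the groups that contain exactly one 'T'. If a group may contain several 'T' rows and should still be kept, a possible variant is to reduce the boolean column with transform('any') instead. This is only a sketch and not part of the original answer; the small DataFrame below is made-up data, chosen so that Unit 1 has two 'T' rows.

```python
import pandas as pd

# Hypothetical data: Unit 1 has two 'T' rows, Unit 2 has none, Unit 3 has one
df = pd.DataFrame({
    'Unit': [1, 1, 1, 2, 2, 3],
    'Year': [2014, 2015, 2016, 2015, 2016, 2017],
    'col3': ['T', 'T', 'F', 'F', 'F', 'T'],
}).set_index(['Unit', 'Year'])

is_t = df['col3'].eq('T')

# True for every row whose first-level group contains at least one 'T'
mask_any = is_t.groupby(level=0).transform('any')

# The "sum equals 1" test, shown for comparison: it drops Unit 1
mask_one = is_t.astype(int).groupby(level=0).transform('sum').eq(1)

kept_any = df[mask_any]
kept_one = df[mask_one]
print(kept_any)
print(kept_one)
```

With the data from the question both masks select the same rows, because every matching Unit there has exactly one 'T'.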
Details:
First compare the column with eq for equality and convert the result to integers:
print(df['col3'].eq('T').astype(int))
Unit  Year
1     2014    1
      2015    0
      2016    0
      2017    0
2     2015    0
      2016    0
      2017    0
3     2017    1
4     2014    0
5     2015    0
6     2014    0
      2015    0
      2016    0
      2017    1
Name: col3, dtype: int32
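Then use GroupBy.transform with 'sum' to get a Series of the same size as the original, holding the count of 'T' values for each first-level group:
print(df['col3'].eq('T').astype(int).groupby(level=0).transform('sum'))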
Unit  Year
1     2014    1
      2015    1
      2016    1
      2017    1
2     2015    0
      2016    0
      2017    0
3     2017    1
4     2014    0
5     2015    0
6     2014    1
      2015    1
      2016    1
      2017    1
Name: col3, dtype: int32
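
The reason transform is used here rather than a plain aggregation may be easier to see side by side. The sketch below uses a small made-up Series (not the data from the question): sum() returns one value per Unit, while transform('sum') broadcasts that value back to every original row, so the result shares the original index and can be used directly as a row mask.

```python
import pandas as pd

# Hypothetical 0/1 flags indexed by (Unit, Year)
s = pd.Series(
    [1, 0, 0, 1],
    index=pd.MultiIndex.from_tuples(
        [(1, 2014), (1, 2015), (2, 2015), (3, 2017)],
        names=['Unit', 'Year'],
    ),
    name='col3',
)

per_group = s.groupby(level=0).sum()           # one value per Unit
per_row = s.groupby(level=0).transform('sum')  # one value per original row

print(per_group)
print(per_row)
```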
Compare with 1 and finally filter by boolean indexing:
print(df[df['col3'].eq('T').astype(int).groupby(level=0).transform('sum').eq(1)])
           col1  col2 col3
Unit Year
1    2014     0     0    T
     2015     0     0    F
     2016     0     0    F
     2017     0     0    F
3    2017     0     0    T
6    2014   100   200    F
     2015   200   900    F
     2016   300   400    F
     2017   400   500    T
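
For comparison, here are two other ways the same selection could be written. They are sketches rather than part of the answer above, and the names units_with_t, by_isin and by_filter are just illustrative. The first collects the Unit labels that have a 'T' and keeps every row whose first index level is among them; the second uses GroupBy.filter with a per-group test. Both keep a Unit if it has at least one 'T'.

```python
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = ['T', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F', 'F', 'F', 'T']
df = pd.DataFrame({
    'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
    'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017,
             2017, 2014, 2015, 2014, 2015, 2016, 2017],
    'col1': col1, 'col2': col2, 'col3': col3,
}).set_index(['Unit', 'Year'])

# Variant 1: find the Units that contain a 'T', then select by the first index level
units_with_t = df[df['col3'].eq('T')].index.get_level_values('Unit').unique()
by_isin = df[df.index.get_level_values('Unit').isin(units_with_t)]

# Variant 2: let GroupBy.filter keep whole groups that pass the test
by_filter = df.groupby(level=0).filter(lambda g: g['col3'].eq('T').any())

print(by_isin)
print(by_filter)
```

GroupBy.filter calls the lambda once per group, so on large frames the transform or isin versions are likely to be faster.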