Python: Pandas groupby with a conditional formula


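Given the DataFrame below, is there an elegant way to groupby with a condition?

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

I want to split the data into two groups based on the following conditions:

(df['SibSp'] > 0) | (df['Parch'] > 0)   = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"

I then want to take the mean of each group and end up with output like this:

               SurvivedMean
 Has Family    Mean
 No Family     Mean

Can this be done with groupby, or do I have to append a new column using the conditional statements above?

If the values in the SibSp and Parch columns are never less than 0, use only one condition:

import numpy as np

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)

# Group by an array of labels built from the mask; no new column is needed
out = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(out)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64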
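
To see why a single condition is enough for non-negative counts, here is a self-contained sketch that rebuilds the sample data from the question and checks that the two masks are complementary. The variable names `masks_are_complementary`, `labels`, and `survived_mean` are illustrative and not part of the original posts.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)    # "Has Family"
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)  # "No Family"

# With non-negative counts, every row satisfies exactly one of the two masks,
# so m2 carries no extra information beyond ~m1
masks_are_complementary = bool((m1 ^ m2).all())

# groupby accepts an array of labels with the same length as the DataFrame
labels = np.where(m1, "Has Family", "No Family")
survived_mean = df.groupby(labels)["Survived"].mean()

print(masks_are_complementary)
print(survived_mean)
```

If a column could contain negative values, the two masks would no longer cover every row, which is the case the two-condition variant handles.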
If that cannot be guaranteed, use both conditions:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
# Rows that match neither condition are labeled 'Not'
a = np.where(m1, 'Has Family',
    np.where(m2, 'No Family', 'Not'))

out = df.groupby(a)['Survived'].mean()
print(out)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

An easy way to group is to use the sum of the two columns. If either one is positive (and neither is negative), the sum will be at least 1. groupby accepts an arbitrary array as long as its length matches the length of the DataFrame, so you do not need to add a new column:

family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out:
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

You can define your conditions in a list and use a group-by-condition function to create a filtered list for each condition. After that, you can use pattern matching to select the result items.
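
The code for the list-of-conditions approach is not shown in this thread, so the following is only a rough sketch of what such a helper might look like. The name `group_by_condition`, its signature, and the reading of "pattern matching" as positional unpacking are all assumptions, not the original author's code.

```python
import pandas as pd

def group_by_condition(frame, conditions):
    """Hypothetical helper: return one filtered DataFrame per boolean mask."""
    return [frame[mask] for mask in conditions]

# Sample data copied from the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

conditions = [
    (df["SibSp"] > 0) | (df["Parch"] > 0),    # "Has Family"
    (df["SibSp"] == 0) & (df["Parch"] == 0),  # "No Family"
]

# Unpack the filtered frames positionally, one name per condition
has_family, no_family = group_by_condition(df, conditions)

means = pd.Series(
    {
        "Has Family": has_family["Survived"].mean(),
        "No Family": no_family["Survived"].mean(),
    },
    name="Survived",
)
print(means)
```

Unlike the np.where variants, this filters the frame once per condition, so rows can land in several groups or in none if the conditions overlap or leave gaps.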

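This looks like a job for boolean indexing. Is your df binary encoded? If so, you can use pandas methods to get {}. Otherwise, yes, I would suggest creating a new column (I think you only need one) to perform the groupby on. If I had a better idea of what you are doing, I could help you write some code! Also, given your desired output, it looks like you will need to pivot the db as well!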
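
For comparison, the new-column route suggested in this comment could look roughly like the sketch below. The column name `Family` and the variable names are illustrative assumptions; the `SurvivedMean` header is taken from the desired output in the question, and the sum-based mask assumes the counts are never negative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Assumes SibSp and Parch are never negative
has_family = (df["SibSp"] + df["Parch"]) > 0

# assign returns a copy, so the original df is left unchanged
df_labeled = df.assign(Family=np.where(has_family, "Has Family", "No Family"))

# Named aggregation gives the result column the requested header
result = df_labeled.groupby("Family").agg(SurvivedMean=("Survived", "mean"))
print(result)
```

The result is a DataFrame indexed by the group label, so no separate pivot step is needed for this particular output shape.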