Python: expressing a subset with pipe

python pandas


I have a DataFrame like this, which I subset and then group:

   a  b   x  y
0  1  2   3 -1
1  2  4   6 -2
2  3  6   6 -3
3  4  8   3 -4

df = df[(df.a >= 2) & (df.b <= 8)]
df = df.groupby(df.x).mean()

How can I express the filter and the groupby as a single chain using pipe? My attempt ends like this:

    … 2) & (x.b < 6)
    .groupby(df.x)
    .apply(lambda x: x.mean())

You can try this, but I think it is more complicated:

print(df[(df.a >= 2) & (df.b <= 8)].groupby(df.x).mean())
     a  b  x    y
x                
3  4.0  8  3 -4.0
6  2.5  5  6 -2.5


def masker(df, mask):
    return df[mask]

mask1 = (df.a >= 2)
mask2 = (df.b <= 8)     

print(df.pipe(masker, mask1).pipe(masker, mask2).groupby(df.x).mean())
     a  b  x    y
x                
3  4.0  8  3 -4.0
6  2.5  5  6 -2.5
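For anyone who wants to run the approach above end to end, here is a minimal sketch. It assumes the example frame from the question, and it changes one thing: this `masker` takes a function that builds the mask (my assumption, not part of the answer), so each mask is computed on the frame that actually reaches that step rather than on the original `df`.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

def masker(frame, make_mask):
    # make_mask is called on the frame arriving at this step,
    # so the boolean mask always shares that frame's index.
    return frame[make_mask(frame)]

result = (
    df
    .pipe(masker, lambda d: d.a >= 2)
    .pipe(masker, lambda d: d.b <= 8)
    .groupby("x")
    .mean()
)
print(result)
```

Grouping by the column name `"x"` instead of the Series `df.x` keeps `x` out of the averaged columns.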

I believe this method is clear about your filtering steps and the subsequent operations. Using loc[(mask1) & (mask2)] is probably more performant, however.
>>> (df
     .pipe(lambda x: x.loc[x.a >= 2])
     .pipe(lambda x: x.loc[x.b <= 8])
     .pipe(pd.DataFrame.groupby, 'x')
     .mean()
     )

     a  b    y
x             
3  4.0  8 -4.0
6  2.5  5 -2.5
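A sketch of the single-mask variant mentioned in this answer, assuming the question's example frame. It also calls `.groupby` directly on the piped result instead of going through `pipe`, and checks that both spellings give the same frame. The names `combined` and `stepwise` are only for illustration.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# One filtering step with both conditions combined in a single .loc call,
# followed by a plain .groupby on the result of pipe.
combined = (
    df
    .pipe(lambda d: d.loc[(d.a >= 2) & (d.b <= 8)])
    .groupby("x")
    .mean()
)

# The step-by-step chain from the answer, for comparison.
stepwise = (
    df
    .pipe(lambda d: d.loc[d.a >= 2])
    .pipe(lambda d: d.loc[d.b <= 8])
    .pipe(pd.DataFrame.groupby, "x")
    .mean()
)

print(combined.equals(stepwise))
```

Since `pipe` returns whatever the function returns, any DataFrame method can follow it directly.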

As long as you can categorize a step as something that takes a DataFrame (possibly with more arguments) and returns a DataFrame, you can use pipe. Whether there is an advantage in doing so is another question.

Here, for example, you can use:
df\
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)\
    .pipe(lambda df_: df_.groupby(df_.x))\
    .mean()

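# Variant: .groupby is called directly instead of through pipe.
# Note that the lambda body reads the outer df, not the piped df_;
# the two coincide here only because df is the frame being piped.
df\
    .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)\
    .groupby('x')\
    .mean()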
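If the lambda gets hard to read, the same step can be a named function. The sketch below assumes the question's example frame; `filter_ab`, `a_min` and `b_max` are hypothetical names chosen for illustration. `pipe` forwards extra positional and keyword arguments to the function it is given.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

def filter_ab(frame, a_min, b_max):
    # Hypothetical helper: keep rows with a >= a_min and b <= b_max.
    # It only uses its own argument, never the outer df.
    return frame[(frame.a >= a_min) & (frame.b <= b_max)]

result = (
    df
    .pipe(filter_ab, a_min=2, b_max=8)  # keyword arguments are passed through
    .groupby("x")
    .mean()
)
print(result)
```

Because `filter_ab` works only on the frame it receives, it stays correct when other steps are placed in front of it in the chain.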

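I don't see why using pipe here would be useful. df[(df.a >= 2) & (df.b …

That's true @jme, this is a toy example; in my larger code I have more steps. Plus, the operator makes everything look neater.

This is basically déjà vu :-), "how to put a filter step into a pipe?"

Thanks @Ami. You are using df in the lambda; does df work as well? Also, can I just use .groupby rather than .pipe(lambda ... .groupby)?

Thanks @Alexander. Can I also use .groupby rather than .pipe(pd.DataFrame.groupby, 'x'), i.e. use .groupby the same way as in my code above?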
The filter step with the thresholds written inline:

    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])\