Python: How to flag names that don't meet specific criteria in a separate column?

I have a df that looks like this:

Name  Letter  Period  Amount
123   H       PRE     11
123   H       PRE     14
123   H       PRE     12 
123   H       DURING  5
123   H       POST    100
456   H       PRE     9
456   H       DURING  50
456   H       POST    600
789   J       PRE     8
789   J       PRE     17
789   J       PRE     11
789   J       DURING  9
789   J       POST    201
789   J       POST    202
789   J       POST    200
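I need to remove every Name whose PRE count is not >= 3 or whose POST count is not >= 3. After applying this logic to the df above, only Name 789 should remain. 123 has 3 PRE rows but only 1 POST row, so it is excluded.

Expected output: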

Name  Letter  Period  Amount
789   J       PRE     8
789   J       PRE     17
789   J       PRE     11
789   J       DURING  9
789   J       POST    201
789   J       POST    202
789   J       POST    200
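
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and counts rows per Name and Period. It assumes Name and Amount are integers and Letter and Period are strings; the question does not state the dtypes. The `counts` table is only an illustration of which names qualify and is not part of the original post.

```python
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame({
    "Name": [123] * 5 + [456] * 3 + [789] * 7,
    "Letter": ["H"] * 8 + ["J"] * 7,
    "Period": [
        "PRE", "PRE", "PRE", "DURING", "POST",                   # 123
        "PRE", "DURING", "POST",                                 # 456
        "PRE", "PRE", "PRE", "DURING", "POST", "POST", "POST",   # 789
    ],
    "Amount": [11, 14, 12, 5, 100, 9, 50, 600, 8, 17, 11, 9, 201, 202, 200],
})

# Rows per (Name, Period): one row per Name, one column per Period value.
counts = pd.crosstab(df["Name"], df["Period"])
print(counts)
```

Only a Name with at least 3 in both the PRE and POST columns of `counts` should survive the filtering.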
Try groupby with filter:

out = df.groupby('Name').filter(lambda x : (x['Period'].eq('PRE').sum()>=3) &
                                           (x['Period'].eq('POST').sum()>=3))
    Name Letter  Period  Amount
8    789      J     PRE       8
9    789      J     PRE      17
10   789      J     PRE      11
11   789      J  DURING       9
12   789      J    POST     201
13   789      J    POST     202
14   789      J    POST     200
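Another approach, which may be a bit faster: compute the conditions that the "PRE" and "POST" counts per Name are each greater than or equal to 3, and use the resulting boolean mask to filter the DataFrame: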

cond1 = df.Period.eq("PRE").groupby(df.Name).transform("sum").ge(3)
cond2 = df.Period.eq("POST").groupby(df.Name).transform("sum").ge(3)
df.loc[cond1 & cond2]

    Name    Letter  Period  Amount
8   789     J       PRE     8
9   789     J       PRE     17
10  789     J       PRE     11
11  789     J       DURING  9
12  789     J       POST    201
13  789     J       POST    202
14  789     J       POST    200
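
The mask approach above can be wrapped in a small helper so the periods and the threshold are parameters. This is only a sketch: the function name `filter_by_period_counts` and its arguments are hypothetical, and the cast to int64 is a defensive assumption so the per-group sums are plain integer counts regardless of pandas version.

```python
import pandas as pd

def filter_by_period_counts(df, periods=("PRE", "POST"), min_count=3):
    """Keep rows whose Name has at least `min_count` rows for every period in `periods`."""
    mask = pd.Series(True, index=df.index)
    for period in periods:
        # 1 where the row belongs to this period, 0 otherwise.
        is_period = df["Period"].eq(period).astype("int64")
        # Broadcast the per-Name count back onto every row of that Name.
        per_name_count = is_period.groupby(df["Name"]).transform("sum")
        mask = mask & per_name_count.ge(min_count)
    return df.loc[mask]

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame({
    "Name": [123] * 5 + [456] * 3 + [789] * 7,
    "Letter": ["H"] * 8 + ["J"] * 7,
    "Period": ["PRE", "PRE", "PRE", "DURING", "POST",
               "PRE", "DURING", "POST",
               "PRE", "PRE", "PRE", "DURING", "POST", "POST", "POST"],
    "Amount": [11, 14, 12, 5, 100, 9, 50, 600, 8, 17, 11, 9, 201, 202, 200],
})

out = filter_by_period_counts(df)
print(out)
```

Because the mask is aligned to the original index, the surviving rows keep their original labels and order.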

This works, thanks! This one is 100% the speed improvement; groupby + filter is usually slower.
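
To check the speed claim on your own data, a rough timing sketch along these lines may help. The synthetic frame, its size, and the function names `with_filter` and `with_transform` are assumptions made for illustration; actual timings depend on the number of groups, the machine, and the pandas version, so no particular result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): many small groups, similar in shape to the question.
rng = np.random.default_rng(0)
n_rows = 20_000
big = pd.DataFrame({
    "Name": rng.integers(0, 2_000, size=n_rows),
    "Period": rng.choice(["PRE", "DURING", "POST"], size=n_rows),
    "Amount": rng.integers(1, 1_000, size=n_rows),
})

def with_filter(df):
    # Calls the Python lambda once per group.
    return df.groupby("Name").filter(
        lambda g: bool(g["Period"].eq("PRE").sum() >= 3
                       and g["Period"].eq("POST").sum() >= 3)
    )

def with_transform(df):
    # Two grouped sums, then a single boolean-mask selection.
    pre = df["Period"].eq("PRE").astype("int64").groupby(df["Name"]).transform("sum")
    post = df["Period"].eq("POST").astype("int64").groupby(df["Name"]).transform("sum")
    return df.loc[pre.ge(3) & post.ge(3)]

t_filter = timeit.timeit(lambda: with_filter(big), number=3)
t_transform = timeit.timeit(lambda: with_transform(big), number=3)
print(f"groupby + filter:    {t_filter:.3f} s for 3 runs")
print(f"groupby + transform: {t_transform:.3f} s for 3 runs")
```

Both functions should return the same rows, so the comparison is only about running time.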