Filter out subsequent rows in a group after a condition is met in R


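For the sample dataset below, I need to remove all of a customer's (CustomerID) rows that come after their first purchase (CustomerStatus = Purchased). Some customers never purchase the product, and I still want to keep every observation for those customers. The Date variable must be retained.

I am having trouble removing rows within groups. The original data is not grouped as neatly as this; I have simplified it to illustrate the problem. Thanks for your help.

Here is a sample dataset: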

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Amanda       2001       3/17/2017   Meeting
Amanda       2001       3/20/2017   Email
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased
Sharon       2006       1/30/2017   Conference
Sharon       2006       2/10/2017   Purchased
The output should look like this:

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased

We can group by SalesPerson and CustomerID and build a logical index to filter on:

library(dplyr)
df1 %>%
     group_by(SalesPerson, CustomerID) %>% 
     filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE))<1)
# A tibble: 15 x 4
# Groups:   SalesPerson, CustomerID [4]
#   SalesPerson CustomerID      Date CustomerStatus
#         <chr>      <int>     <chr>          <chr>
# 1      Amanda       2000  1/5/2017          Intro
# 2      Amanda       2000  1/6/2017          Email
# 3      Amanda       2000 1/15/2017      PhoneCall
# 4      Amanda       2000 2/15/2017      Purchased
# 5      Amanda       2001  1/3/2017          Intro
# 6      Amanda       2001  1/4/2017          Email
# 7      Amanda       2001 1/12/2017      PhoneCall
# 8      Amanda       2001 1/15/2017     Conference
# 9      Amanda       2001  2/4/2017      Purchased
#10        Kyle       2002 1/19/2017          Intro
#11        Kyle       2002 1/20/2017          Email
#12        Kyle       2002 1/21/2017      PhoneCall
#13      Sharon       2006  1/8/2017          Intro
#14      Sharon       2006 1/10/2017        Meeting
#15      Sharon       2006 1/19/2017      Purchased
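The answer uses df1 without defining it. A minimal sketch for reproducing the result follows; the data frame is rebuilt by hand from the question's table, the name result is only illustrative, and Date is kept as a character column as in the printed output. It assumes each customer's rows are already in chronological order. If they were not, one option (an assumption, not part of the original answer) would be to sort first with arrange(as.Date(Date, format = "%m/%d/%Y")).

```r
library(dplyr)

# Sample data from the question, rebuilt by hand (Date kept as character).
df1 <- data.frame(
  SalesPerson = rep(c("Amanda", "Kyle", "Sharon"), times = c(11, 3, 5)),
  CustomerID = rep(c(2000L, 2001L, 2002L, 2006L), times = c(4, 7, 3, 5)),
  Date = c("1/5/2017", "1/6/2017", "1/15/2017", "2/15/2017",
           "1/3/2017", "1/4/2017", "1/12/2017", "1/15/2017",
           "2/4/2017", "3/17/2017", "3/20/2017",
           "1/19/2017", "1/20/2017", "1/21/2017",
           "1/8/2017", "1/10/2017", "1/19/2017", "1/30/2017", "2/10/2017"),
  CustomerStatus = c("Intro", "Email", "PhoneCall", "Purchased",
                     "Intro", "Email", "PhoneCall", "Conference",
                     "Purchased", "Meeting", "Email",
                     "Intro", "Email", "PhoneCall",
                     "Intro", "Meeting", "Purchased", "Conference", "Purchased"),
  stringsAsFactors = FALSE
)

# Keep each customer's rows up to and including the first "Purchased" row.
result <- df1 %>%
  group_by(SalesPerson, CustomerID) %>%
  filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE)) < 1) %>%
  ungroup()

print(result)
```

Customers who never purchase (such as 2002) keep all of their rows, because the lagged flag is never TRUE for them and the running sum stays at 0.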
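Thank you. This solution works well. I was not familiar with the lag and cumsum functions, but I now understand how they work. Can you tell me how you approached this problem? I had not even considered logical indexing.

@user8215919 After the grouping step, a logical index is created with CustomerStatus == "Purchased". We then take its lag, so that each TRUE value moves to the next row. Next we take the cumulative sum (cumsum) and filter for values less than 1. If you have trouble following it, split the code into pieces and inspect the intermediate result with mutate(id = lag(CustomerStatus == "Purchased", default = FALSE)).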
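Following that suggestion, here is a small sketch that splits the expression into its parts for a single customer (the statuses of customer 2006 from the question). The column names is_purchase, shifted, running and keep are made up for illustration and do not appear in the original answer.

```r
library(dplyr)

# Statuses for one customer, in chronological order (customer 2006 in the question).
status <- c("Intro", "Meeting", "Purchased", "Conference", "Purchased")

steps <- data.frame(CustomerStatus = status, stringsAsFactors = FALSE) %>%
  mutate(
    is_purchase = CustomerStatus == "Purchased",    # TRUE on purchase rows
    shifted     = lag(is_purchase, default = FALSE), # TRUE on the row after a purchase
    running     = cumsum(shifted),                   # purchases seen before this row
    keep        = running < 1                        # TRUE up to and including the first purchase
  )

print(steps)
```

Shifting by one row is what keeps the first Purchased row itself: its own running count is still 0, and only the rows after it reach 1 or more.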