Filtering out subsequent rows in a group after a condition is met in R

r filter

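For the sample dataset below, I need to delete all of a customer's (CustomerID) rows that come after their first purchase (CustomerStatus = Purchased). Some customers never purchase the product, and I still want to keep every observation for those customers. The Date variable must be retained.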

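I am having trouble deleting rows within groups. The original data is not as neatly grouped as this; I have tried to simplify the problem I am facing. Thanks for your help.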

Here is a sample dataset:

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Amanda       2001       3/17/2017   Meeting
Amanda       2001       3/20/2017   Email
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased
Sharon       2006       1/30/2017   Conference
Sharon       2006       2/10/2017   Purchased

The output should look like this:

SalesPerson  CustomerID  Date       CustomerStatus
Amanda       2000       1/5/2017    Intro
Amanda       2000       1/6/2017    Email
Amanda       2000       1/15/2017   PhoneCall
Amanda       2000       2/15/2017   Purchased
Amanda       2001       1/3/2017    Intro
Amanda       2001       1/4/2017    Email
Amanda       2001       1/12/2017   PhoneCall
Amanda       2001       1/15/2017   Conference
Amanda       2001       2/4/2017    Purchased
Kyle         2002       1/19/2017   Intro
Kyle         2002       1/20/2017   Email
Kyle         2002       1/21/2017   PhoneCall
Sharon       2006       1/8/2017    Intro
Sharon       2006       1/10/2017   Meeting
Sharon       2006       1/19/2017   Purchased

We can group by 'SalesPerson' and 'CustomerID' and create a logical index to filter on:


library(dplyr)
df1 %>%
     group_by(SalesPerson, CustomerID) %>% 
     filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE))<1)
# A tibble: 15 x 4
# Groups:   SalesPerson, CustomerID [4]
#   SalesPerson CustomerID      Date CustomerStatus
#         <chr>      <int>     <chr>          <chr>
# 1      Amanda       2000  1/5/2017          Intro
# 2      Amanda       2000  1/6/2017          Email
# 3      Amanda       2000 1/15/2017      PhoneCall
# 4      Amanda       2000 2/15/2017      Purchased
# 5      Amanda       2001  1/3/2017          Intro
# 6      Amanda       2001  1/4/2017          Email
# 7      Amanda       2001 1/12/2017      PhoneCall
# 8      Amanda       2001 1/15/2017     Conference
# 9      Amanda       2001  2/4/2017      Purchased
#10        Kyle       2002 1/19/2017          Intro
#11        Kyle       2002 1/20/2017          Email
#12        Kyle       2002 1/21/2017      PhoneCall
#13      Sharon       2006  1/8/2017          Intro
#14      Sharon       2006 1/10/2017        Meeting
#15      Sharon       2006 1/19/2017      Purchased
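
Since the question mentions that the real data is not as neatly ordered as the sample, here is a minimal sketch of the same filter applied after sorting each group by date. The lag/cumsum approach depends on row order, so this assumes the dates are in month/day/year format and should be sorted chronologically within each customer. The small shuffled `df1` below is a hypothetical subset of the sample data, and `DateParsed` is a helper column introduced only for this illustration.

```r
library(dplyr)

# Hypothetical, deliberately shuffled subset of the sample data
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
SalesPerson CustomerID Date CustomerStatus
Amanda 2001 3/17/2017 Meeting
Amanda 2001 1/3/2017 Intro
Amanda 2001 2/4/2017 Purchased
Kyle 2002 1/20/2017 Email
Kyle 2002 1/19/2017 Intro
Sharon 2006 2/10/2017 Purchased
Sharon 2006 1/19/2017 Purchased
Sharon 2006 1/30/2017 Conference
")

result <- df1 %>%
  # Parse the text dates so that sorting is chronological, not alphabetical
  mutate(DateParsed = as.Date(Date, format = "%m/%d/%Y")) %>%
  arrange(SalesPerson, CustomerID, DateParsed) %>%
  group_by(SalesPerson, CustomerID) %>%
  # Keep rows up to and including the first "Purchased" in each group
  filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE)) < 1) %>%
  ungroup() %>%
  # Drop the helper column; the original Date column is retained as-is
  select(-DateParsed)

result
```

Customers with no purchase (Kyle here) keep all their rows, because the cumulative sum never reaches 1 for them.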

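Thanks. This solution works well. I wasn't familiar with the lag and cumsum functions, but I now understand how they work. Can you tell me how you approached this problem? I hadn't even considered logical indexing.

@user8215919 After the grouping step, a logical index CustomerStatus == "Purchased" is created. Taking its lag shifts each TRUE value to the next position. The cumulative sum is then taken, and only rows where it is less than 1 are kept. If you have trouble following it, split the code into parts and inspect each one, e.g. with mutate(id = lag(CustomerStatus == "Purchased", default = FALSE)).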
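
To make that breakdown concrete, the intermediate steps can be stored as columns and inspected side by side. This is only a rough sketch on a small hypothetical data frame (`toy`), and the column names `purchased`, `after_purchase` and `n_before` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: one customer who purchases and is contacted afterwards
toy <- data.frame(
  CustomerID     = c(2001, 2001, 2001, 2001),
  CustomerStatus = c("Intro", "Purchased", "Meeting", "Email"),
  stringsAsFactors = FALSE
)

steps <- toy %>%
  group_by(CustomerID) %>%
  mutate(
    purchased      = CustomerStatus == "Purchased",   # TRUE on the purchase row
    after_purchase = lag(purchased, default = FALSE), # the TRUE moves one row down
    n_before       = cumsum(after_purchase)           # 0 until the row after a purchase
  ) %>%
  ungroup()

steps

# The rows the original filter() keeps are those where n_before is still 0
kept <- filter(steps, n_before < 1)
kept
```

Because of the lag, the purchase row itself still has a cumulative sum of 0, which is why it survives the filter while everything after it is dropped.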