Filter out subsequent rows in a group after a condition is met in R
For the following sample dataset, I need to remove all rows for a customer (CustomerID) that come after their first purchase (CustomerStatus = Purchased). Some customers never purchase the product, and I still want to keep every observation for those customers. The Date variable must be retained.

I'm having trouble removing rows within groups. The original data is not as nicely grouped as this; I have simplified the problem I'm facing. Thanks for your help.

I've provided a sample dataset:
SalesPerson CustomerID Date CustomerStatus
Amanda 2000 1/5/2017 Intro
Amanda 2000 1/6/2017 Email
Amanda 2000 1/15/2017 PhoneCall
Amanda 2000 2/15/2017 Purchased
Amanda 2001 1/3/2017 Intro
Amanda 2001 1/4/2017 Email
Amanda 2001 1/12/2017 PhoneCall
Amanda 2001 1/15/2017 Conference
Amanda 2001 2/4/2017 Purchased
Amanda 2001 3/17/2017 Meeting
Amanda 2001 3/20/2017 Email
Kyle 2002 1/19/2017 Intro
Kyle 2002 1/20/2017 Email
Kyle 2002 1/21/2017 PhoneCall
Sharon 2006 1/8/2017 Intro
Sharon 2006 1/10/2017 Meeting
Sharon 2006 1/19/2017 Purchased
Sharon 2006 1/30/2017 Conference
Sharon 2006 2/10/2017 Purchased
The output should look like this:
SalesPerson CustomerID Date CustomerStatus
Amanda 2000 1/5/2017 Intro
Amanda 2000 1/6/2017 Email
Amanda 2000 1/15/2017 PhoneCall
Amanda 2000 2/15/2017 Purchased
Amanda 2001 1/3/2017 Intro
Amanda 2001 1/4/2017 Email
Amanda 2001 1/12/2017 PhoneCall
Amanda 2001 1/15/2017 Conference
Amanda 2001 2/4/2017 Purchased
Kyle 2002 1/19/2017 Intro
Kyle 2002 1/20/2017 Email
Kyle 2002 1/21/2017 PhoneCall
Sharon 2006 1/8/2017 Intro
Sharon 2006 1/10/2017 Meeting
Sharon 2006 1/19/2017 Purchased
We can group by "SalesPerson" and "CustomerID" and create a logical index to filter on:
library(dplyr)
df1 %>%
group_by(SalesPerson, CustomerID) %>%
filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE))<1)
# A tibble: 15 x 4
# Groups: SalesPerson, CustomerID [4]
# SalesPerson CustomerID Date CustomerStatus
# <chr> <int> <chr> <chr>
# 1 Amanda 2000 1/5/2017 Intro
# 2 Amanda 2000 1/6/2017 Email
# 3 Amanda 2000 1/15/2017 PhoneCall
# 4 Amanda 2000 2/15/2017 Purchased
# 5 Amanda 2001 1/3/2017 Intro
# 6 Amanda 2001 1/4/2017 Email
# 7 Amanda 2001 1/12/2017 PhoneCall
# 8 Amanda 2001 1/15/2017 Conference
# 9 Amanda 2001 2/4/2017 Purchased
#10 Kyle 2002 1/19/2017 Intro
#11 Kyle 2002 1/20/2017 Email
#12 Kyle 2002 1/21/2017 PhoneCall
#13 Sharon 2006 1/8/2017 Intro
#14 Sharon 2006 1/10/2017 Meeting
#15 Sharon 2006 1/19/2017 Purchased
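The question mentions that the real data is not as neatly ordered as the sample. Because lag() and cumsum() work on row order within each group, it may be safer to sort by date first. The following is only a sketch under some assumptions: the data frame is rebuilt by hand from the sample above, the dates are assumed to be month/day/year, the shuffle is a hypothetical stand-in for unordered input, and ParsedDate is a helper column introduced here for sorting.

```r
library(dplyr)

# Sample data retyped from the question; the name df1 follows the answer.
df1 <- data.frame(
  SalesPerson = c(rep("Amanda", 11), rep("Kyle", 3), rep("Sharon", 5)),
  CustomerID = c(rep(2000L, 4), rep(2001L, 7), rep(2002L, 3), rep(2006L, 5)),
  Date = c("1/5/2017", "1/6/2017", "1/15/2017", "2/15/2017",
           "1/3/2017", "1/4/2017", "1/12/2017", "1/15/2017",
           "2/4/2017", "3/17/2017", "3/20/2017",
           "1/19/2017", "1/20/2017", "1/21/2017",
           "1/8/2017", "1/10/2017", "1/19/2017", "1/30/2017", "2/10/2017"),
  CustomerStatus = c("Intro", "Email", "PhoneCall", "Purchased",
                     "Intro", "Email", "PhoneCall", "Conference",
                     "Purchased", "Meeting", "Email",
                     "Intro", "Email", "PhoneCall",
                     "Intro", "Meeting", "Purchased", "Conference", "Purchased"),
  stringsAsFactors = FALSE
)

# Hypothetical scenario: the rows arrive in arbitrary order.
set.seed(1)
shuffled <- df1[sample(nrow(df1)), ]

result <- shuffled %>%
  # Helper column (an assumption): parse the text dates as month/day/year.
  mutate(ParsedDate = as.Date(Date, format = "%m/%d/%Y")) %>%
  # Restore chronological order within each customer before filtering.
  arrange(SalesPerson, CustomerID, ParsedDate) %>%
  group_by(SalesPerson, CustomerID) %>%
  # Same filter as in the answer: keep rows up to and including the first purchase.
  filter(cumsum(lag(CustomerStatus == "Purchased", default = FALSE)) < 1) %>%
  ungroup() %>%
  # Drop the helper; the original Date column is kept unchanged.
  select(-ParsedDate)

print(result)
```

Sorting on a parsed date rather than on the text column matters here, since "1/15/2017" sorts before "1/5/2017" as a string.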
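Thanks. This solution works well. I wasn't familiar with the lag and cumsum functions, but I now understand how they work. Can you tell me how you approached the problem? I hadn't even considered logical indexing.

@user8215919 After the grouping step, a logical index CustomerStatus == "Purchased" is created, and then its lag is taken so that each TRUE value moves to the next position. Then we take the cumulative sum and keep the rows where it is less than 1. If you have trouble following it, split the code into parts and inspect it with mutate(id = lag(CustomerStatus == "Purchased", default = FALSE)).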
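To make those steps concrete, here is a small sketch that traces them on the statuses of a single customer (CustomerID 2006 from the sample data). The variable names is_purchase, shifted, running and keep are illustrative only and do not appear in the original answer.

```r
library(dplyr)

# Statuses for one customer, in date order (CustomerID 2006 in the sample).
status <- c("Intro", "Meeting", "Purchased", "Conference", "Purchased")

# Step 1: logical index marking purchase rows.
is_purchase <- status == "Purchased"
# FALSE FALSE  TRUE FALSE  TRUE

# Step 2: shift the index down one row, so TRUE lands on the row AFTER a purchase.
shifted <- dplyr::lag(is_purchase, default = FALSE)
# FALSE FALSE FALSE  TRUE FALSE

# Step 3: running count of "a purchase has already happened before this row".
running <- cumsum(shifted)
# 0 0 0 1 1

# Step 4: keep rows where no earlier purchase has been seen.
keep <- running < 1
#  TRUE  TRUE  TRUE FALSE FALSE

status[keep]
# "Intro" "Meeting" "Purchased"
```

Without the lag step, the cumulative sum would already be 1 on the purchase row itself, so the first purchase would be dropped along with everything after it.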