Creating new variables with the latest date by group in dplyr


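I have a data frame and want to create 2 new variables by Id. First, I need to group by Id and get the latest date from CreatedDate; then I need to get the Lead_DataSource__c value that corresponds to that latest date.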

Here is the tail of my data frame:

tail(df)


Id                  CreatedDate Lead_DataSource__c   StageName
0011000001XW3YZAA1  2020-07-17            Walk in   Quotation
0011000001XW3Z8AAL  2020-07-17            Walk in   Quotation
0011000001XW3zHAAT  2020-07-17            Walk in    Assigned
0011000001XW3zlAAD  2020-07-17            Walk in   Quotation
0011000001XW3zvAAD  2020-07-17            Walk in Closed Lost
0011000001XW3zvAAD  2020-07-17            Website Closed Lost

Here is my code:

df_new <- df %>%
  group_by(Id) %>%
  mutate(numberoflead = length(Id)) %>%                       # number of leads
  mutate(lastcreateddateoflead = max(CreatedDate)) %>%        # last date of lead
  mutate(lasttouch = max(CreatedDate)[Lead_DataSource__c])    # last touch (does not work as intended)

When I run this code I get no errors, and it seems to work for numberoflead and lastcreateddateoflead, but it does not seem to work for lasttouch.

Can anyone explain what I am missing here?

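Your problem is that you are using mutate when you should be using summarize. You then need to join back to the original df to get lasttouch. If you add a select inside the join, you pull in only the lasttouch column and do not need to rename or select anything afterwards.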

library(dplyr)

df %>%
  group_by(Id) %>%
  summarize(numberoflead = n(),
            lastcreateddateoflead=max(CreatedDate)) %>%
  inner_join(df %>% 
               select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
             by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate"))
            
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
  Id                 numberoflead lastcreateddateoflead lasttouch
  <chr>                     <int> <date>                <chr>    
1 0011000001XW3YZAA1            1 2020-07-17            Walk in  
2 0011000001XW3Z8AAL            1 2020-07-17            Walk in  
3 0011000001XW3zHAAT            1 2020-07-17            Walk in  
4 0011000001XW3zlAAD            1 2020-07-17            Walk in  
5 0011000001XW3zvAAD            2 2020-07-17            Walk in  
6 0011000001XW3zvAAD            2 2020-07-17            Website  
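
As for why the original lasttouch line ran without an error yet returned nothing useful: max(CreatedDate) is a single unnamed Date, and indexing it with the character column Lead_DataSource__c looks elements up by name, which yields NA for every row. A minimal sketch of that behaviour, using a hypothetical date value (latest and sources are illustrative names, not from the question):

```r
# Hypothetical stand-in for max(CreatedDate) within one group
latest <- as.Date("2020-07-17")

# Hypothetical stand-in for the Lead_DataSource__c values of that group
sources <- c("Walk in", "Website")

# Character indexing searches for names; `latest` has none,
# so every lookup comes back as a missing Date
looked_up <- latest[sources]
print(looked_up)
```

Because the result has one element per row of the group, mutate accepts it silently, which would explain the absence of an error.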
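
If you want to keep all rows (rather than one summary row per Id), use mutate instead of summarize. Note that when several rows of an Id share the latest date, the join matches each of them, so those rows are repeated in the result: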

df %>%
  group_by(Id) %>%
  mutate(numberoflead = n(),
            lastcreateddateoflead=max(CreatedDate)) %>%
  inner_join(df %>% 
               select(Id, CreatedDate, lasttouch = Lead_DataSource__c),
             by = c("Id" = "Id", "lastcreateddateoflead" = "CreatedDate"))

# A tibble: 8 x 7
# Groups:   Id [5]
  Id             CreatedDate Lead_DataSource_~ StageName  numberoflead lastcreateddateofl~ lasttouch
  <chr>          <date>      <chr>             <chr>             <int> <date>              <chr>    
1 0011000001XW3~ 2020-07-17  Walk in           Quotation             1 2020-07-17          Walk in  
2 0011000001XW3~ 2020-07-17  Walk in           Quotation             1 2020-07-17          Walk in  
3 0011000001XW3~ 2020-07-17  Walk in           Assigned              1 2020-07-17          Walk in  
4 0011000001XW3~ 2020-07-17  Walk in           Quotation             1 2020-07-17          Walk in  
5 0011000001XW3~ 2020-07-17  Walk in           Closed Lo~            2 2020-07-17          Walk in  
6 0011000001XW3~ 2020-07-17  Walk in           Closed Lo~            2 2020-07-17          Website  
7 0011000001XW3~ 2020-07-17  Website           Closed Lo~            2 2020-07-17          Walk in  
8 0011000001XW3~ 2020-07-17  Website           Closed Lo~            2 2020-07-17          Website 
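
If the repeated rows are not wanted, one possible alternative is to skip the join and pick the data source at the position of the latest date with which.max. This is a sketch under assumptions, not the approach given above: the sample data is hypothetical, and which.max returns the first position on ties, so only one source is kept when several rows share the latest date.

```r
library(dplyr)

# Hypothetical sample data; Id "A" has two leads on different dates
df <- tibble(
  Id = c("A", "A", "B"),
  CreatedDate = as.Date(c("2020-07-01", "2020-07-17", "2020-07-10")),
  Lead_DataSource__c = c("Walk in", "Website", "Walk in")
)

df_new <- df %>%
  group_by(Id) %>%
  mutate(
    numberoflead = n(),                                      # leads per Id
    lastcreateddateoflead = max(CreatedDate),                # latest date per Id
    lasttouch = Lead_DataSource__c[which.max(CreatedDate)]   # source on that date
  ) %>%
  ungroup()

print(df_new)
```

This keeps every original row exactly once, since no join is involved.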