Conditionally rename elements in a group based on previous observations in tidyverse

My dataframe contains the following elements:
1) Each user_id can have several order_id values.
2) Each order_id has one of two plan types: monthly orders run for 1 month (30 days) and 3-months orders run for 3 months (90 days). A user can switch from one plan type to the other several times over their lifetime.
library(tidyverse)
df_input <- tibble::tribble(
~user_id, ~order_id, ~date, ~plan_type,
1, 123, "01-01-2020", "monthly",
1, 124, "01-31-2020", "monthly",
1, 125, "03-01-2020", "3-months",
2, 126, "01-11-2019", "3-months",
2, 127, "10-13-2018", "monthly",
2, 128, "11-12-2018", "monthly",
3, 129, "01-10-2019", "3-months",
3, 130, "04-10-2019", "3-months",
3, 131, "07-09-2019", "3-months",
4, 132, "01-02-2020", "monthly",
4, 133, "02-01-2020", "monthly"
)
> df_input
# A tibble: 11 x 4
   user_id order_id date       plan_type
     <dbl>    <dbl> <chr>      <chr>
 1       1      123 01-01-2020 monthly
 2       1      124 01-31-2020 monthly
 3       1      125 03-01-2020 3-months
 4       2      126 01-11-2019 3-months
 5       2      127 10-13-2018 monthly
 6       2      128 11-12-2018 monthly
 7       3      129 01-10-2019 3-months
 8       3      130 04-10-2019 3-months
 9       3      131 07-09-2019 3-months
10       4      132 01-02-2020 monthly
11       4      133 02-01-2020 monthly
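2) I want to create a behavior_type column that labels each order_id as upgrade, downgrade, or none. If a user goes from monthly to 3-months, the order is an upgrade; if they go from 3-months to monthly, it is a downgrade; otherwise it is none.
My final dataframe must look like this: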
df_final <- tibble::tribble(
~order_id, ~behavior_type, ~order_type,
123, "none", "acquisition",
124, "none", "repeat",
125, "upgrade", "repeat",
126, "none", "acquisition",
127, "downgrade", "repeat",
128, "none", "repeat",
129, "none", "acquisition",
130, "none", "repeat",
131, "none", "repeat",
132, "none", "acquisition",
133, "none", "repeat"
)
> df_final
# A tibble: 11 x 3
   order_id behavior_type order_type
      <dbl> <chr>         <chr>
 1      123 none          acquisition
 2      124 none          repeat
 3      125 upgrade       repeat
 4      126 none          acquisition
 5      127 downgrade     repeat
 6      128 none          repeat
 7      129 none          acquisition
 8      130 none          repeat
 9      131 none          repeat
10      132 none          acquisition
11      133 none          repeat
Any help with this second step?
library(dplyr)

# just repeating your first step
# note: date is a character column, so row_number(date) ranks it as text
df_interim <- df_input %>%
  group_by(user_id) %>%
  mutate(rank = row_number(date)) %>%
  mutate(order_type = ifelse(rank == 1, "acquisition", "repeat"))

# lag() is computed within each user_id because df_interim is still grouped;
# the first order of each user has no previous plan and falls through to "none"
df_interim %>%
  mutate(behavior_type = case_when(
    plan_type == "monthly" & lag(plan_type) == "3-months" ~ "downgrade",
    plan_type == "3-months" & lag(plan_type) == "monthly" ~ "upgrade",
    TRUE ~ "none"
  )) %>%
  ungroup() %>%
  select(order_id, behavior_type, order_type)

# A tibble: 11 x 3
   order_id behavior_type order_type
      <dbl> <chr>         <chr>
 1      123 none          acquisition
 2      124 none          repeat
 3      125 upgrade       repeat
 4      126 none          acquisition
 5      127 downgrade     repeat
 6      128 none          repeat
 7      129 none          acquisition
 8      130 none          repeat
 9      131 none          repeat
10      132 none          acquisition
11      133 none          repeat
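The order_type part can be simpler, as follows:
df_input %>%
  group_by(user_id) %>%
  # parse the month-day-year strings so min() compares real dates, not text
  mutate(order_type = if_else(as.Date(date, "%m-%d-%Y") == min(as.Date(date, "%m-%d-%Y")), "acquisition", "repeat"))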
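One thing to watch: date is stored as text in month-day-year form, and lag() follows row order. For user 2 the rows are not in chronological order (order 126 is dated 01-11-2019, after orders 127 and 128 from late 2018). The sketch below parses the dates and sorts before labeling. It assumes chronological order is what is wanted, which is an assumption on my part: the expected output in the question follows row order instead, so the labels for user 2 come out differently here. The names df_user2, df_sorted and date_parsed are introduced only for illustration.

```r
library(dplyr)

# Subset of the question's data (user 2 only), enough to show the effect
df_user2 <- tibble::tribble(
  ~user_id, ~order_id, ~date,        ~plan_type,
  2,        126,       "01-11-2019", "3-months",
  2,        127,       "10-13-2018", "monthly",
  2,        128,       "11-12-2018", "monthly"
)

df_sorted <- df_user2 %>%
  # assumed format: month-day-year
  mutate(date_parsed = as.Date(date, format = "%m-%d-%Y")) %>%
  # lag() depends on row order, so sort chronologically within each user first
  arrange(user_id, date_parsed) %>%
  group_by(user_id) %>%
  mutate(
    order_type = if_else(row_number() == 1, "acquisition", "repeat"),
    behavior_type = case_when(
      plan_type == "monthly"  & lag(plan_type) == "3-months" ~ "downgrade",
      plan_type == "3-months" & lag(plan_type) == "monthly"  ~ "upgrade",
      TRUE ~ "none"
    )
  ) %>%
  ungroup() %>%
  select(order_id, behavior_type, order_type)

df_sorted
```

With this ordering, order 127 becomes the acquisition and order 126 is labeled an upgrade, so it is worth checking which ordering the data is actually meant to follow before choosing between the two approaches.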
library(dplyr)

df_input %>%
  group_by(user_id) %>%
  mutate(rank = row_number(date)) %>%
  mutate(order_type = if_else(rank == 1, "acquisition", "repeat"),
         # the first order of each user is always "none"; after that, any change
         # of plan is an upgrade or a downgrade depending on the new plan
         behavior_type = if_else(rank == 1, "none",
                         if_else(plan_type != lag(plan_type) & plan_type == "3-months", "upgrade",
                         if_else(plan_type != lag(plan_type) & plan_type == "monthly", "downgrade",
                                 "none"))))

# A tibble: 11 x 7
# Groups:   user_id [4]
   user_id order_id date       plan_type  rank order_type  behavior_type
     <dbl>    <dbl> <chr>      <chr>     <int> <chr>       <chr>
 1       1      123 01-01-2020 monthly       1 acquisition none
 2       1      124 01-31-2020 monthly       2 repeat      none
 3       1      125 03-01-2020 3-months      3 repeat      upgrade
 4       2      126 01-11-2019 3-months      1 acquisition none
 5       2      127 10-13-2018 monthly       2 repeat      downgrade
 6       2      128 11-12-2018 monthly       3 repeat      none
 7       3      129 01-10-2019 3-months      1 acquisition none
 8       3      130 04-10-2019 3-months      2 repeat      none
 9       3      131 07-09-2019 3-months      3 repeat      none
10       4      132 01-02-2020 monthly       1 acquisition none
11       4      133 02-01-2020 monthly       2 repeat      none
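As a possible variation on the answers above, the plan types can be mapped to their period length and the sign of the change compared, which avoids spelling out each pair of plan names. This is only a sketch: the lookup vector plan_days (built from the 30 and 90 day figures in the question) and the names df_small, df_labeled, days and delta are my own, not part of the original posts.

```r
library(dplyr)

# Subset of the question's data (users 1 and 2), kept in the original row order
df_small <- tibble::tribble(
  ~user_id, ~order_id, ~plan_type,
  1,        123,       "monthly",
  1,        124,       "monthly",
  1,        125,       "3-months",
  2,        126,       "3-months",
  2,        127,       "monthly",
  2,        128,       "monthly"
)

# Assumed lookup: plan period in days, taken from the 30/90 figures in the question
plan_days <- c("monthly" = 30, "3-months" = 90)

df_labeled <- df_small %>%
  group_by(user_id) %>%
  mutate(
    days  = unname(plan_days[plan_type]),
    delta = days - lag(days),   # NA for each user's first order
    behavior_type = case_when(
      delta > 0 ~ "upgrade",
      delta < 0 ~ "downgrade",
      TRUE      ~ "none"        # no change, or no previous order
    )
  ) %>%
  ungroup() %>%
  select(order_id, behavior_type)

df_labeled
```

If more plan types were added later, only plan_days would need a new entry, although whether a longer period should always count as an upgrade is a business decision this sketch simply assumes.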