How to categorize values in a specific column based on other columns in R
I have a data frame with the following details:
BatchId    Datetime             Purchase_Status  Current_Progress
PRT-10011  2021-03-01 15:18:24  Sold             Pending
PRT-10012  2021-03-12 18:11:04  Sold
PRT-10013  2021-03-15 21:13:45  Open
PRT-10014                       Open
PRT-10015  2021-03-18 10:06:36  Return           Pending
PRT-10016                       Process          Pending
dput(df)
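The dput() output itself is not shown above. A hedged reconstruction of the sample table as runnable R code might look like this. The column types are assumptions: blank cells are stored as NA, and Datetime is parsed as POSIXct in UTC.

```r
# Hypothetical reconstruction of the question's sample data (not the real dput).
# Assumptions: blank cells are NA; Datetime is POSIXct in UTC.
df <- data.frame(
  BatchId = c("PRT-10011", "PRT-10012", "PRT-10013",
              "PRT-10014", "PRT-10015", "PRT-10016"),
  Datetime = as.POSIXct(c("2021-03-01 15:18:24", "2021-03-12 18:11:04",
                          "2021-03-15 21:13:45", NA,
                          "2021-03-18 10:06:36", NA), tz = "UTC"),
  Purchase_Status = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Current_Progress = c("Pending", NA, NA, NA, "Pending", "Pending"),
  stringsAsFactors = FALSE
)

str(df)
```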
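I need to add one more column, Category, based on the following conditions:
- If Purchase_Status is Sold and Current_Progress is not empty, NA or null, concatenate the Purchase_Status value and the Current_Progress value with "-".
- If Purchase_Status is Sold and Current_Progress is empty, NA or null, concatenate the Purchase_Status value with the text "Not Updated" using "-".
- If Purchase_Status is Open and Datetime is not empty, NA or null, concatenate the Purchase_Status value with the text "Order Placed" using "-".
- If Purchase_Status is Open and Datetime is empty, NA or null, concatenate the Purchase_Status value with the text "Order Not Placed" using "-".
- Treat every remaining Purchase_Status value (anything other than "Sold" and "Open") as "Other", and concatenate it with the text "Order Placed" or "Order Not Placed" depending on whether the Datetime column has a value.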
Expected output:
BatchId    Datetime             Purchase_Status  Current_Progress  Category
PRT-10011  2021-03-01 15:18:24  Sold             Pending           Sold - Pending
PRT-10012  2021-03-12 18:11:04  Sold                               Sold - Not Updated
PRT-10013  2021-03-15 21:13:45  Open                               Open - Order Placed
PRT-10014                       Open                               Open - Order Not Placed
PRT-10015  2021-03-18 10:06:36  Return           Pending           Other - Order Placed
PRT-10016                       Process          Pending           Other - Order Not Placed
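The rules treat "empty, NA or null" as a single case. In an R data frame, an atomic column cannot hold NULL, so a missing value is stored as NA. A character column can also contain empty strings, though. A small hypothetical helper (is_blank is not part of any package) that covers both cases could look like this:

```r
# Hypothetical helper: TRUE where a value is NA or a blank/whitespace-only string.
# as.character() lets it work on dates/times too; NA stays NA and is caught by is.na().
is_blank <- function(x) {
  is.na(x) | trimws(as.character(x)) == ""
}

is_blank(c("Pending", "", "  ", NA))
```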
As mentioned in the comments, you should be able to use dplyr::case_when for this. Your call should look something like this:
library(dplyr)

df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is.na(Current_Progress) ~ paste(Purchase_Status, Current_Progress, sep = "-"),
    # OTHER CASES HERE
  ))
Add the other cases, and use ~ to map each condition to its value.
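Filling in the remaining cases might look roughly like the following. This is a sketch under stated assumptions, not code from the thread. It rebuilds the sample data from the question's table, assuming blank cells are NA and Datetime is POSIXct. It also uses a hypothetical is_blank() helper so that both NA and "" count as empty. As the rules require, the Open and "Other" rows are decided by Datetime, and " - " is used as the separator to match the expected output.

```r
library(dplyr)

# Sample data rebuilt from the question's table (assumption: blank cells are NA).
df <- data.frame(
  BatchId = c("PRT-10011", "PRT-10012", "PRT-10013",
              "PRT-10014", "PRT-10015", "PRT-10016"),
  Datetime = as.POSIXct(c("2021-03-01 15:18:24", "2021-03-12 18:11:04",
                          "2021-03-15 21:13:45", NA,
                          "2021-03-18 10:06:36", NA), tz = "UTC"),
  Purchase_Status = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Current_Progress = c("Pending", NA, NA, NA, "Pending", "Pending"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a value is NA or a blank string.
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

result <- df %>%
  mutate(Category = case_when(
    # Sold with a progress value: "Sold - <Current_Progress>"
    Purchase_Status == "Sold" & !is_blank(Current_Progress) ~
      paste(Purchase_Status, Current_Progress, sep = " - "),
    # Sold without a progress value
    Purchase_Status == "Sold" ~ paste(Purchase_Status, "Not Updated", sep = " - "),
    # Open rows are decided by whether Datetime is present
    Purchase_Status == "Open" & !is_blank(Datetime) ~
      paste(Purchase_Status, "Order Placed", sep = " - "),
    Purchase_Status == "Open" ~ paste(Purchase_Status, "Order Not Placed", sep = " - "),
    # Every other status is reported as "Other", again decided by Datetime
    !is_blank(Datetime) ~ "Other - Order Placed",
    TRUE ~ "Other - Order Not Placed"
  ))

result
```

Because case_when() returns the value of the first matching condition, the two unqualified Sold and Open lines only see rows where the more specific line above them did not match.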
library(dplyr)
library(tidyr)  # for replace_na()

df %>%
  replace_na(list(Current_Progress = "")) %>%  # turn NA into "" so the tests below only need to check for ""
  mutate(Category = case_when(
    Purchase_Status == "Sold" & Current_Progress != "" ~ paste0(Purchase_Status, "-", Current_Progress),
    Purchase_Status == "Sold" ~ paste0(Purchase_Status, "-Not Updated"),
    # Note: these two Open cases test Current_Progress, whereas the question's rules use Datetime
    Purchase_Status == "Open" & Current_Progress != "" ~ paste0(Purchase_Status, "-Order Placed"),
    Purchase_Status == "Open" ~ paste0(Purchase_Status, "-Order Not Placed"),
    is.na(Datetime) ~ "Order Not Placed",
    TRUE ~ "Order Placed"
  ))
dplyr::case_when tests the conditions in order. The last case therefore needs no real test: if none of the earlier cases matched, we can simply use TRUE.
      BatchId    Datetime             Purchase_Status  Current_Progress  Category
12426 PRT-10011  2019-05-20 10:46:49  Sold             Pending           Sold-Pending
21988 PRT-10012  2020-09-24 12:28:10  Sold                               Sold-Not Updated
22555 PRT-10013  2019-05-31 06:12:12  Open                               Open-Order Not Placed
12486 PRT-10014  <NA>                 Open                               Open-Order Not Placed
15432 PRT-10015  2019-09-26 11:36:58  Return           Pending           Order Placed
16934 PRT-10016  <NA>                 Process          Pending           Order Not Placed
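The first-match behaviour of case_when() is easiest to see on its own. Here is a minimal example on a numeric vector; the values and labels are made up for illustration. In dplyr 1.1.0 and later, the .default argument is an alternative to the final TRUE ~ catch-all.

```r
library(dplyr)

x <- c(5, 1, -3)

size <- case_when(
  x > 2 ~ "big",    # 5 also satisfies x > 0 below, but the first match wins
  x > 0 ~ "small",  # 1 fails the first test but passes this one
  TRUE  ~ "other"   # catch-all for anything not matched above (here -3)
)

print(size)
```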
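Great use case for dplyr::case_when. Could you include your data in a form we can load directly, e.g. by adding the output of dput(your_data_frame) to your question?
@JonSpring - I have updated the dput.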