R: Splitting a data.table column based on a regular expression


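I have a data.table with three columns. I want to split the second column based on a regular expression, so that I end up with four columns. Whenever I try this I get strange results, and I would appreciate some feedback. Here is the data: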

     category                 label     count
1  Navigation     Product || Green         2 
2  Navigation      Survey || Green         5
3  Navigation       Product || Red        10
4  Navigation        Survey || Red        10

I want to split the label column at || and use the data to create two new columns, Type and Color.

With data.table, you can do the following:

dt[, c("type", "color") := tstrsplit(label, " || ", fixed = TRUE)]

     category            label count    type color
1: Navigation Product || Green     2 Product Green
2: Navigation  Survey || Green     5  Survey Green

Sample data:

library(data.table)

dt <- data.table(category = c("Navigation", "Navigation"),
                 label = c("Product || Green", "Survey || Green"),
                 count = c(2, 5))
dt
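
Since the question asks about a regular expression, here is a minimal sketch of the regex variant of the same split. In a regex, | is the alternation operator, so an unescaped "||" pattern is not read as two literal bars, which is probably where the strange results came from. The pattern below escapes each bar and also absorbs the surrounding whitespace. Dropping label and reordering the columns afterwards is my own assumption about what "four columns" should look like, not something the answer above does.

```r
library(data.table)

dt <- data.table(category = rep("Navigation", 4),
                 label = c("Product || Green", "Survey || Green",
                           "Product || Red", "Survey || Red"),
                 count = c(2L, 5L, 10L, 10L))

# Regex split: "\\|" is a literal bar, "\\s*" swallows optional spaces around it
dt[, c("type", "color") := tstrsplit(label, "\\s*\\|\\|\\s*")]

# Assumed final shape: remove the original column and put the new ones in its place
dt[, label := NULL]
setcolorder(dt, c("category", "type", "color", "count"))
dt
```

If the separator is always exactly " || ", the fixed = TRUE version is simpler and avoids escaping altogether; the regex form is mainly useful when the spacing around the bars varies.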
Data

d = structure(list(category = c("Navigation", "Navigation", "Navigation", 
"Navigation"), label = c("Product || Green", "Survey || Green", 
"Product || Red", "Survey || Red"), count = c(2L, 5L, 10L, 10L
)), class = "data.frame", row.names = c(NA, -4L))

We can use tidyr::separate:

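library(data.table)
# Assumed call: the original line did not survive; sep is a regex, so each | is escaped
dt1 <- as.data.table(tidyr::separate(d, label, into = c("type", "color"), sep = " \\|\\| "))
dt1
#>      category    type color count
#> 1: Navigation Product Green     2
#> 2: Navigation  Survey Green     5
#> 3: Navigation Product   Red    10
#> 4: Navigation  Survey   Red    10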
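
As a side note, tidyr::separate interprets sep as a regular expression, so the bars have to be escaped there too. A possible alternative, assuming a recent tidyr (1.3.0 or later, as far as I know), is separate_wider_delim, which by default matches its delim argument as a literal string. This is only a sketch on the same sample data, not part of the original answer; res is just an illustrative name.

```r
library(tidyr)

d <- data.frame(category = rep("Navigation", 4),
                label = c("Product || Green", "Survey || Green",
                          "Product || Red", "Survey || Red"),
                count = c(2L, 5L, 10L, 10L))

# delim is matched literally here, so " || " needs no escaping
res <- separate_wider_delim(d, label, delim = " || ", names = c("type", "color"))
res
```

The label column is replaced by type and color, so the result has the four columns the question asks for.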

tidyr::separate is the most direct and efficient way to do this. Accepting this answer because it was so easy to get working.