R: Splitting a data.table column based on a regular expression
I have a data.table with three columns. I want to split the second column based on a regular expression, so that I end up with four columns. Whenever I try this I keep getting strange results, and I would appreciate some feedback. Here is the data:
category label count
1 Navigation Product || Green 2
2 Navigation Survey || Green 5
3 Navigation Product || Red 10
4 Navigation Survey || Red 10
I want to split the label column at " || " and use the pieces to create two new columns, Type and Color.
With data.table, you can do the following:
dt[, c("type", "color") := tstrsplit(label, " || ", fixed = TRUE)]
category label count type color
1: Navigation Product || Green 2 Product Green
2: Navigation Survey || Green 5 Survey Green
Sample data:
library(data.table)
dt <- data.table(category = c("Navigation", "Navigation"),
label = c("Product || Green", "Survey || Green"),
count = c(2, 5))
dt
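The call above uses fixed = TRUE, so " || " is matched literally. If a regular expression is preferred, as the question's title suggests, each "|" has to be escaped, because an unescaped "|" means alternation in a regex; that may be one source of the strange results mentioned in the question. A minimal sketch, assuming a hypothetical table named dt_regex that mirrors the question's data:

```r
library(data.table)

# Hypothetical sample table mirroring the data shown in the question
dt_regex <- data.table(
  category = rep("Navigation", 4),
  label    = c("Product || Green", "Survey || Green",
               "Product || Red", "Survey || Red"),
  count    = c(2L, 5L, 10L, 10L)
)

# Regex split: escape each "|" and absorb any surrounding whitespace
dt_regex[, c("type", "color") := tstrsplit(label, "\\s*\\|\\|\\s*")]

# Drop the original column if only the split pieces are needed
dt_regex[, label := NULL]
dt_regex
```

Allowing optional whitespace around the delimiter makes the pattern tolerant of labels such as "Product||Green", which the fixed-string version would leave unsplit.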
Data
d = structure(list(category = c("Navigation", "Navigation", "Navigation",
"Navigation"), label = c("Product || Green", "Survey || Green",
"Product || Red", "Survey || Red"), count = c(2L, 5L, 10L, 10L
)), class = "data.frame", row.names = c(NA, -4L))
We can use tidyr::separate:
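library(data.table)
# Reconstructed call (the original line is missing from this page): split label on the literal " || "
dt1 <- as.data.table(tidyr::separate(d, label, into = c("type", "color"), sep = " \\|\\| "))
dt1
#> category type color count
#> 1: Navigation Product Green 2
#> 2: Navigation Survey Green 5
#> 3: Navigation Product Red 10
#> 4: Navigation Survey Red 10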
tidyr::separate is the most direct and efficient way to do this. I am marking this as the accepted answer because it was so easy to get working.