Regex: How to add a column to a data.table in R based on a string in another column?

Tags: regex, r, parsing, transform, data.table

I want to add columns to a data.table based on a string in another column. Here is my data and what I have tried:

   Params
1: { clientID : 459;  time : 1386868908703;  version : 6}
2: { clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}
3: { clientID : 988;  time : 1388939739771}
4: { clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001}
5: { clientID : 459;  time : 1388090530634}

I want to parse the text in the "Params" column and create new columns from it. For example, I would like a new column named "user" that holds only the number that follows "user :" in the Params string. The added column should look like this:

   Params                                                                                   user
1: { clientID : 459;  time : 1386868908703;  version : 6}                                   NA
2: { clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001} 459001
3: { clientID : 988;  time : 1388939739771}                                                 NA
4: { clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001} 459001
5: { clientID : 459;  time : 1388090530634}                                                 NA
How can I solve this? Thanks.

Here is one way to do this with a regular expression:

myparse <- function(searchterm, s) {
  res <- rep(NA_character_, length(s)) # NA vector
  idx <- grepl(searchterm, s) # index for strings including the search term
  pattern <- paste0(".*", searchterm, " : ([^;}]+)[;}].*") # regex pattern
  res[idx] <- sub(pattern, "\\1", s[idx]) # extract target string
  return(res)
}
DT[, user := myparse("user", Params)]

For rows without a user field, the new column contains NA:

DT[, user]
# [1] NA       "459001" NA       "459001" NA
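One pitfall of the grepl()/sub() pair in myparse: grepl() only checks that the search term occurs somewhere in the string, and when the full pattern (which expects exactly "term : value") then fails to match, sub() returns the input string unchanged instead of NA. The sketch below is a hypothetical variant, extract_field() (not part of the answer), that uses base R's regexec() and regmatches() so that rows without a match become NA, allows optional whitespace around the colon, and requires the key to follow "{" or ";" so it is not matched as the tail of a longer key. It assumes keys contain no regular-expression metacharacters.

```r
# Hypothetical helper, not from the answer above: return the value stored
# under `key` in strings such as "{ clientID : 459;  user : 459001}", or NA.
# Assumes `key` contains no regular-expression metacharacters.
extract_field <- function(key, s) {
  pattern <- paste0(
    "(^|[{;])[[:space:]]*",        # key must start a "key : value" pair
    key,
    "[[:space:]]*:[[:space:]]*",   # colon, whitespace around it optional
    "([^;}]*[^;}[:space:]])"       # value up to the next ";" or "}", trailing spaces excluded
  )
  parts <- regmatches(s, regexec(pattern, s))
  # regexec() reports no match (character(0) here) for strings without the key.
  vapply(parts,
         function(p) if (length(p) == 0) NA_character_ else p[3],
         character(1))
}

x <- c("{ clientID : 459;  time : 1386868908703;  version : 6}",
       "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}",
       "{clientID:461;time:1386770861254;type:new;newUser:461002}")
extract_field("user", x)  # returns c(NA, "459001", NA)
extract_field("type", x)  # returns c(NA, NA, "new")
```

With data.table loaded, it should drop in wherever myparse is used, e.g. DT[, type := extract_field("type", Params)].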

I would use an external parser, for example:

library(yaml)

DT = data.frame(
    Params = c("{ clientID : 459;  time : 1386868908703;  version : 6}",
               "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}",
               "{ clientID : 988;  time : 1388939739771}",
               "{ clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001}",
               "{ clientID : 459;  time : 1388090530634}"),
    stringsAsFactors = FALSE
    )

conv.to.yaml <- function(x){
     # drop the leading "{ " and the trailing "}", then put each "key : value" pair on its own line
     gsub(';  ','\n',substr(x, 3, nchar(x)-1))
}

tmp <- lapply(DT$Params, function(x) yaml.load(conv.to.yaml(x)))  # one named list per row

unames <- unique(unlist(sapply(tmp, names)))   # every key that occurs in any row
res <- as.data.frame(do.call(rbind, lapply(tmp, function(x) x[unames])))   # absent keys give NULL cells
colnames(res) <- unames
res
#   clientID       time version                       id   user
# 1      459 -405527905       6                     NULL   NULL
# 2      459 -405612269    NULL 52a9ea8b534b2b0b5000575f 459001
# 3      988 1665303163    NULL                     NULL   NULL
# 4      459 -405626089    NULL 52a9ec00b73cbf0b210057e9 459001
# 5      459  816094026    NULL                     NULL   NULL

Note that the time column in this output is wrong: each value is exactly what the 13-digit millisecond timestamp becomes when wrapped to a signed 32-bit integer. For example, 1386868908703 mod 2^32 = 3889439391, and 3889439391 - 2^32 = -405527905.

Thanks. This works fine for the data I provided. How would I need to adjust the regex to allow the string "{clientID:461;time:1386770861254;type:new;newUser:461002}", which includes "type:new"?

@Miriam What should the result be for this example, "type:new" or "new"?

The column should be named "type" and the value should be "new" (as with user).

@Miriam Try DT[, type := myparse("type", Params)].

For some reason this doesn't work when I use the myparse function on my string t, and I don't know why:
> myparse("type", t)
[1] "{clientID:461;time:13866770861254;type:new;newUser:461002}"
It returns the whole string, unlike myparse("time", t). Any idea what the reason is?

Error in data.table(list(Params = c("{ clientID : 459;  time : 1386868908703;  version : 6}",  :
  argument 2 (nrow 2) cannot be recycled without remainder to match longest nrow (5)
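If pulling in yaml is not wanted, a dependency-free alternative (a different technique from the yaml answer, sketched here as a hypothetical parse_params() helper) is to strip the braces, split each string on ";" and then on the first ":", and build one character column per key. Keeping the values as character avoids the wrapped-around time values seen in the res output; they can be converted afterwards with as.numeric(), since doubles represent integers exactly up to 2^53, which covers 13-digit millisecond timestamps. The sketch assumes values never contain ";" and keys never contain ":".

```r
# Hypothetical helper, not from either answer: parse strings of the form
# "{ key : value;  key : value}" into a data.frame with one character
# column per key and NA where a row lacks that key.
# Assumes values contain no ";" and keys contain no ":".
parse_params <- function(s) {
  inner <- gsub("^[[:space:]]*[{]|[}][[:space:]]*$", "", s)  # drop the braces
  rows <- lapply(strsplit(inner, ";", fixed = TRUE), function(p) {
    p <- trimws(p)
    p <- p[nzchar(p)]                        # ignore empty pieces
    keys <- trimws(sub(":.*$", "", p))       # text before the first ":"
    vals <- trimws(sub("^[^:]*:", "", p))    # text after the first ":"
    setNames(vals, keys)
  })
  keys <- unique(unlist(lapply(rows, names)))  # every key seen in any row
  # Indexing a named vector by an absent name yields NA.
  mat <- do.call(rbind, lapply(rows, function(r) unname(r[keys])))
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(out) <- keys
  out
}

x <- c("{ clientID : 459;  time : 1386868908703;  version : 6}",
       "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}",
       "{ clientID : 988;  time : 1388939739771}",
       "{ clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001}",
       "{ clientID : 459;  time : 1388090530634}")
res <- parse_params(x)
res$time <- as.numeric(res$time)  # exact for 13-digit ms timestamps
res
```

To attach the parsed columns to an existing data.table, something like p <- parse_params(DT$Params); DT[, (names(p)) := p] should work.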