Splitting strings of irregular length in R


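I have a data.frame column with 3000+ strings that I want to split apart. The strings are irregular, although they do follow a pattern. Below are some examples, followed by what I want to convert them into.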

00700/Z14P120:xhkg
03988/Z14C3.2:xhkg
6A/F15C0.905:xcme
ADS/X14P56:xeur
AX1/X14P375:xams
BIDU/28X14C250:xcbf
ES/F15C1960:xcme
FUR/M16P8:xams

00700 | P | 120
03988 | C | 3.2
6A | C | 0.905
ADS | P | 56
AX1 | P | 375
BIDU | C | 250
ES | C | 1960
FUR | P | 8

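I think this covers all the possible lengths and value types of each substring.

The first new column should overwrite the input column, and the other two should overwrite existing columns in the same data.frame.

Another complication is that some rows of the data.frame are already formatted correctly, but there is a column that identifies the rows that are not. Below is a table, output as .CSV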

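Final solution: Replacing values in the existing columns turned out to be harder than expected because of problems with NAs, classes, and row-number alignment. I therefore ended up creating temporary columns and replacing the entire columns, which is a rather ugly and inefficient approach. The code provided by Ananda Mahto, however, worked very well.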

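# Work on the raw symbol strings (as.character also copes with a factor column)
ETO <- as.character(data_results$InstrumentSymbolCode)
# Mark the three pieces with a sentinel, then split on it: one matrix row per symbol
ETO <- do.call(rbind, 
        strsplit(gsub("(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*", 
                      "\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", ETO),
                "NONSENSESPLIT", fixed = TRUE))
# Blank out the rows that were already formatted correctly (category other than 9)
ETO[data_results$ProductCategoryID != 9, ] <- ""

# Temporary columns, filled row by row
temp1 <- character(nrow(ETO))
temp2 <- character(nrow(ETO))
temp3 <- character(nrow(ETO))
for (i in seq_len(nrow(ETO))) {
  if (data_results$ProductCategoryID[i] == 9) {
    # Unformatted row: take the freshly split pieces
    temp1[i] <- ETO[i, 1]
    temp2[i] <- ETO[i, 2]
    temp3[i] <- ETO[i, 3]
  } else {
    # Already formatted row: keep the existing values
    temp1[i] <- as.character(data_results$InstrumentSymbolCode[i])
    temp2[i] <- as.character(data_results$PutCall[i])
    temp3[i] <- data_results$Strike[i]
  }
}
# Replace the whole columns with the temporary ones
data_results$InstrumentSymbolCode <- temp1
data_results$PutCall <- temp2
data_results$Strike <- temp3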
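
For what it's worth, the row-by-row loop can probably be avoided by indexing only the rows that still need reformatting. The following is a minimal sketch, not the code I actually ran: the three-row data_results is a made-up stand-in for the real table (which is not reproduced here), and it assumes that PutCall is a character column and Strike a numeric one.

```r
# Hypothetical stand-in for data_results; values are invented for illustration
data_results <- data.frame(
  InstrumentSymbolCode = c("00700/Z14P120:xhkg", "ES", "BIDU/28X14C250:xcbf"),
  ProductCategoryID    = c(9, 1, 9),
  PutCall              = c(NA, "C", NA),
  Strike               = c(NA, 1960, NA),
  stringsAsFactors     = FALSE
)

pattern <- "(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*"

# which() drops NA comparisons, so rows with a missing category are left alone
todo <- which(data_results$ProductCategoryID == 9)
raw  <- data_results$InstrumentSymbolCode[todo]

# Overwrite only the flagged rows; the symbol column goes last
# because `raw` was taken from it
data_results$PutCall[todo] <- sub(pattern, "\\2", raw)
data_results$Strike[todo]  <- as.numeric(sub(pattern, "\\3", raw))
data_results$InstrumentSymbolCode[todo] <- sub(pattern, "\\1", raw)
```

Because only the flagged rows are touched, the already formatted rows keep their values and their column classes, which is where the loop version caused me trouble.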

You can use some regular expressions together with strsplit, perhaps something like this:
do.call(rbind, 
        strsplit(gsub("(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*", 
                      "\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", mydf$v1),
                 "NONSENSESPLIT", fixed = TRUE))
#      [,1]    [,2] [,3]   
# [1,] "00700" "P"  "120"  
# [2,] "03988" "C"  "3.2"  
# [3,] "6A"    "C"  "0.905"
# [4,] "ADS"   "P"  "56"   
# [5,] "AX1"   "P"  "375"  
# [6,] "BIDU"  "C"  "250"  
# [7,] "ES"    "C"  "1960" 
# [8,] "FUR"   "P"  "8"    
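
The NONSENSESPLIT sentinel is only there to give strsplit something to split on. As a hedged alternative, the same pattern can be applied once per capture group with sub(), which skips the sentinel and returns a data.frame directly. This is a sketch under the assumption that every string matches the pattern; the names x and parts are just illustrative.

```r
# The eight sample strings from the question
x <- c("00700/Z14P120:xhkg", "03988/Z14C3.2:xhkg", "6A/F15C0.905:xcme",
       "ADS/X14P56:xeur", "AX1/X14P375:xams", "BIDU/28X14C250:xcbf",
       "ES/F15C1960:xcme", "FUR/M16P8:xams")

# Group 1: everything before the slash (the underlying symbol)
# Skipped: the expiry code, matched lazily by [A-Z0-9]+?
# Group 2: a single letter (P or C in the samples)
# Group 3: the optional strike (digits, dots, minus signs) up to the colon
pattern <- "(.*)/[A-Z0-9]+?([A-Z])([0-9\\.-]+)?:.*"

parts <- data.frame(
  symbol  = sub(pattern, "\\1", x),
  putcall = sub(pattern, "\\2", x),
  strike  = as.numeric(sub(pattern, "\\3", x)),
  stringsAsFactors = FALSE
)
```

A string that does not match the pattern is returned unchanged by sub(), so in real data it is worth checking grepl(pattern, x) before trusting the result.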

However, it is not clear where or how you want these values to replace the ones in your original data.

Sample data:
mydf <- data.frame(v1 = c("00700/Z14P120:xhkg", "03988/Z14C3.2:xhkg",
  "6A/F15C0.905:xcme", "ADS/X14P56:xeur", "AX1/X14P375:xams",
  "BIDU/28X14C250:xcbf", "ES/F15C1960:xcme", "FUR/M16P8:xams"))

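Thanks! I want the second column to go into the PutCall column and the third into the Strike column. Now that I have the separated values, I should be able to do that myself.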