R: Split a product code variable into letters and numbers
I have a product code variable, like:
Product Code
RMMI001,
RMMI001,
CMCM009,
ASCMOT064,
ASPMOA023,
CMCM009,
CMCM012,
CMCM001,
ASCMBW001,
RMMI001,
TMHO002,
TMSP001,
TMHO002,
TMDMST003
I need to split off the letters and put them in a separate column.
You can try using sub here to strip all trailing digits, leaving the letter part:
df <- data.frame(product_code=c("RMMI001", "RMMI001", "CMCM009"))
df$code <- sub("\\d*$", "", df$product_code)
df
product_code code
1 RMMI001 RMMI
2 RMMI001 RMMI
3 CMCM009 CMCM
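Since the question title asks for both letters and numbers, the same sub idea can be mirrored to keep the digits instead. This is only a sketch: the column names number and number_int are made up for illustration, and it assumes every code is letters followed by digits.

```r
# Sample data in the same shape as the answer above
df <- data.frame(product_code = c("RMMI001", "ASCMOT064", "TMDMST003"),
                 stringsAsFactors = FALSE)

# Strip trailing digits to keep the letter part (as in the answer)
df$code <- sub("\\d*$", "", df$product_code)

# Strip the leading letters to keep the digit part as text ("001")
df$number <- sub("^[A-Za-z]+", "", df$product_code)

# Optional: an integer version, which drops the leading zeros
df$number_int <- as.integer(df$number)

df
```

Keeping number as text preserves the zero padding; use number_int only if the padding does not matter.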
How about something like this:
# Sample product codes
ss <- c("RMMI001", "RMMI001", "CMCM009", "ASCMOT064", "ASPMOA023", "CMCM009", "CMCM012", "CMCM001", "ASCMBW001", "RMMI001", "TMHO002", "TMSP001", "TMHO002", "TMDMST003")
# Separate code and numbers and store in data.frame
read.csv(text = gsub("^([a-zA-Z]+)(\\d+)$", "\\1,\\2", ss), header = FALSE)
# V1 V2
#1 RMMI 1
#2 RMMI 1
#3 CMCM 9
#4 ASCMOT 64
#5 ASPMOA 23
#6 CMCM 9
#7 CMCM 12
#8 CMCM 1
#9 ASCMBW 1
#10 RMMI 1
#11 TMHO 2
#12 TMSP 1
#13 TMHO 2
#14 TMDMST 3
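Note that read.csv parses the second column as integers, so "001" becomes 1 in the output above. If the zero padding matters, one possible variation is to read both columns as text. The column names code and number here are assumptions chosen for illustration.

```r
# A few sample product codes
ss <- c("RMMI001", "ASCMOT064", "CMCM012")

# Insert a comma between the letter part and the digit part
csv_text <- gsub("^([a-zA-Z]+)(\\d+)$", "\\1,\\2", ss)

# Read both columns as character so leading zeros are preserved
parts <- read.csv(text = csv_text, header = FALSE,
                  colClasses = "character",
                  col.names = c("code", "number"))
parts
```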
You can also use tidyr::extract, which works only on data frames:
tidyr::extract(data.frame(x =c("RMMI001", "CMCM009")),x, c("first", "second"), "([a-zA-Z]+)(\\d+)" )
Output:
# first second
#1 RMMI 001
#2 CMCM 009
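The call above extracts the letters and the digits into separate columns. If you use "([a-zA-Z]+)\\d+" instead of "([a-zA-Z]+)(\\d+)", only the first group, the letters, is extracted, as shown below. The difference is the capturing groups, which are marked by parentheses: each group captures part of the match, here the letters and the digits, and each becomes its own column.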
tidyr::extract(data.frame(x =c("RMMI001", "CMCM009")),x, c("first"), "([a-zA-Z]+)\\d+" )
# first
# 1 RMMI
# 2 CMCM
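To see what each capturing group holds without tidyr, a base R sketch with regexec and regmatches may help. The variable names letters_part and digits_part are made up for illustration.

```r
x <- c("RMMI001", "CMCM009")

# For each string, regmatches returns the full match followed by
# one element per capturing group
m <- regmatches(x, regexec("^([a-zA-Z]+)(\\d+)$", x))

# Element 2 is the first group (letters), element 3 the second (digits)
letters_part <- vapply(m, function(g) g[2], character(1))
digits_part  <- vapply(m, function(g) g[3], character(1))

data.frame(first = letters_part, second = digits_part)
```

Dropping the parentheses around \\d+ would leave only one group, which is why the single-group pattern yields a single column.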
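For data = datafull, where the column is named product, how do I use the code above? @PKumar
@prabhuprasad: extract(datafull, product, "productnew", "([a-zA-Z]+)\\d+")
Thanks a lot... productnew replaced product, and I don't want it replaced. Instead I want productnew as an additional column. @PKumar
@prabhu: use the argument remove = F in the function. It will keep your original column: extract(datafull, product, "productnew", "([a-zA-Z]+)\\d+", remove = F)
I have a dataset whose variables are product quantity, product code (1000 different types), and date (daily)... For each product code, which model is suitable for a monthly forecast? @PKumar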
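As a sketch of the remove argument discussed in the comments: the datafull below is a hypothetical stand-in (only the column name product comes from the thread, and quantity is invented), and the tidyr package is assumed to be installed.

```r
# Hypothetical stand-in for datafull; the real data is not shown in the thread
datafull <- data.frame(
  product  = c("RMMI001", "CMCM009", "ASCMOT064"),
  quantity = c(10, 4, 7),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps the original product column next to productnew
result <- tidyr::extract(datafull, product, "productnew",
                         "([a-zA-Z]+)\\d+", remove = FALSE)
result
```

Writing FALSE in full rather than F is slightly safer, because F is an ordinary variable that can be reassigned.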