R - Combine two columns in a data frame, create a new column header, and repeat across a large dataset using lapply and merge
I'm a new R user, and I have a large data frame (1700 columns) made up of data columns and flag columns:
df <- data.frame(
  "100249 MERCURY TOTAL ug/L" = runif(10),
  "100397 TRIHALOMETHANES ug/L" = runif(10),
  "100397 TRIHALOMETHANES ug/L FLAG" = c("L", "L", NA, "L", "L", NA, "L", NA, NA, NA),
  "100407 XYLENE ug/L" = runif(10),
  "100407 XYLENE ug/L FLAG" = c("L", NA, "L", "L", "L", NA, "L", NA, "L", "L"),
  check.names = FALSE
)
Any suggestions would be greatly appreciated.
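Reduce(f = function(dat, col) {
  # Name of the data column that belongs to this flag column
  x <- sub(" ?FLAG$", "", col)
  # Leave the data frame unchanged if there is no matching data column
  if (!x %in% names(dat)) return(dat)
  # Paste flag and value together into a new "<name>_COMB" column
  dat[paste0(x, "_COMB")] <- paste(dat[[col]], dat[[x]])
  # Drop the original flag and data columns
  dat[c(col, x)] <- NULL
  dat
}, x = grep("FLAG$", names(df), value = TRUE), init = df)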
#    100249 MERCURY TOTAL ug/L 100397 TRIHALOMETHANES ug/L_COMB 100407 XYLENE ug/L_COMB
# 1                 0.04353999              L 0.375519647961482      L 0.95818781433627
# 2                 0.49308933              L 0.931443430483341    NA 0.744603316066787
# 3                 0.68270299             NA 0.409499574452639     L 0.993966163368896
# 4                 0.26546071             L 0.0351015995256603     L 0.696171462768689
# 5                 0.95956891              L 0.603019695729017     L 0.709421107778326
# 6                 0.01842927              NA 0.96781362616457    NA 0.201458259951323
# 7                 0.12114176              L 0.734256325522438     L 0.457969205919653
# 8                 0.93771709             NA 0.309347201371565    NA 0.508297981694341
# 9                 0.47122685             NA 0.822285959031433      L 0.87013426842168
# 10                0.11501974              NA 0.56137450854294     L 0.153437153436244
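For readers new to `Reduce`, the following minimal sketch may help show how the accumulated value (`dat` in the answer) is passed from one step to the next. It uses a toy character vector instead of a data frame, and `accumulate = TRUE` is added only to expose the intermediate states; neither is part of the original answer.

```r
# Toy illustration: each step receives the accumulated result (acc)
# and the next element (x), just as the answer's function receives
# the data frame so far (dat) and the next flag column name (col).
steps <- Reduce(
  f = function(acc, x) paste0(acc, x),
  x = c("a", "b", "c"),
  init = "",
  accumulate = TRUE   # keep every intermediate result, not only the last
)
steps
```

The first element is the `init` value, and each later element is the result after processing one more item.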
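Please show (don't tell) the desired output.
Thanks, this worked on a subset of my data! But when I ran it on the entire dataset, I got the following error: `error in[`
@MKruk I have edited my answer (removed a space before "FLAG"); can you try again?
With this latest edit, it doesn't merge the columns; it just removes the "FLAG" label and replaces it with "_COMB" in the flag column.
Once I subset the data to exclude the ID variables from the dataset, I got your initial script to work. If you edit it back to the original script, I'll upvote. Thanks!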
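The exchange above turns on how flag column names are mapped to data column names. One possible way to inspect that mapping before merging anything is sketched below. The toy data frame, its column names (including `ID` and `ORPHAN FLAG`), and the `pairs` table are made up for illustration and are not from the original data.

```r
# Hypothetical stand-in for the real data set
df <- data.frame(
  "ID" = 1:3,
  "100407 XYLENE ug/L" = c(0.1, 0.2, 0.3),
  "100407 XYLENE ug/L FLAG" = c("L", NA, "L"),
  "ORPHAN FLAG" = c(NA, "L", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Flag columns, and the data column name each one maps to
flag_cols <- grep("FLAG$", names(df), value = TRUE)
data_cols <- sub(" ?FLAG$", "", flag_cols)

# has_match is FALSE where the merge would skip the flag column
pairs <- data.frame(
  flag = flag_cols,
  data = data_cols,
  has_match = data_cols %in% names(df),
  stringsAsFactors = FALSE
)
pairs
```

Rows with `has_match` equal to `FALSE` are the flag columns that the `if (!x %in% names(dat))` check leaves untouched.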
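The same logic written as a for loop:
dat <- df
for (col in grep("FLAG$", names(df), value = TRUE)) {
  x <- sub(" ?FLAG$", "", col)    # matching data column name
  if (!x %in% names(dat)) next    # no matching data column: skip
  dat[paste0(x, "_COMB")] <- paste(dat[[col]], dat[[x]])
  dat[c(col, x)] <- NULL          # drop the two source columns
}
dat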
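Note that `paste` turns a missing flag into the literal text "NA", as in the output shown earlier. The question does not say whether that is wanted. If it is not, one possible variation is sketched below; the helper name `combine_flag` and the small data frame are assumptions made for illustration only.

```r
# Small made-up example with one data column and its flag column
df <- data.frame(
  "100407 XYLENE ug/L" = c(0.5, 0.25, 0.75),
  "100407 XYLENE ug/L FLAG" = c("L", NA, "L"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the value when the flag is missing
combine_flag <- function(flag, value) {
  ifelse(is.na(flag), as.character(value), paste(flag, value))
}

dat <- df
for (col in grep("FLAG$", names(df), value = TRUE)) {
  x <- sub(" ?FLAG$", "", col)
  if (!x %in% names(dat)) next
  dat[paste0(x, "_COMB")] <- combine_flag(dat[[col]], dat[[x]])
  dat[c(col, x)] <- NULL
}
dat
```

Either way the combined column is character, so the numeric values would need to be parsed back out before any arithmetic.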