R: create and populate new columns in a data frame based on the values of existing columns
I have a csv in the following format:
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW 5 HIGH 5 LOW 5
LOW 8 HIGH 8 LOW 8
HIGH 82 HIGH 8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
I want to create a new data frame with high and low as columns, for example:
Col1_High Col1_Low Col2_High Col2_Low Col3_High Col3_Low
82 5 5 NA NA 5
83 8 8 NA NA 8
82 NA 8 NA NA 7
NA NA NA NA NA 7
NA NA NA NA NA 7
What is the best way to do this?
So far, I have:
#extract the Status Columns from original file into DataFrame
statusDF <- ret[grepl("Status", colnames(ret))]
#extract the Value Columns from original file into DataFrame
originalValueDF <- ret[grepl("Value", colnames(ret))]
#create new columns attribute_high and attribute_low
for(i in names(originalValueDF)){
newValueDF <- originalValueDF[[paste(i, 'High', sep = "_")]]
newValueDF <- originalValueDF[[paste(i, 'Low', sep = "_")]]
}
#populate both columns based on value in attribute status column
for(i in names(originalValueDF)){
if (originalValueDF$i == "High"){
temp <- # stuck here
}
}
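Here is an attempt that makes heavy use of lapply. We first create a list (l1) that collects the values for each "HIGH" and "LOW" status. These vectors have different lengths, however, so we need to pad them all with NA to the maximum length (ind in our case). We then convert each pair of vectors to a matrix with two columns (High and Low), and use do.call and cbind to get the final data frame.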
l1 <- lapply(seq(1, ncol(df), by = 2), function(i) list(HIGH = df[i+1][df[i] == 'HIGH'],
LOW = df[i+1][df[i] == 'LOW']))
names(l1) <- paste0('Col', seq(length(l1)))
ind <- max(unlist(lapply(l1, function(i) lengths(i))))
do.call(cbind, lapply(lapply(l1, function(i) lapply(i, `length<-`, ind)), function(j)
setNames(data.frame(matrix(unlist(j), ncol = 2)), c('High', 'Low'))))
# Col1.High Col1.Low Col2.High Col2.Low Col3.High Col3.Low
#1 82 5 5 NA NA 5
#2 83 8 8 NA NA 8
#3 82 NA 8 NA NA 7
#4 NA NA NA NA NA 7
#5 NA NA NA NA NA 7
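Since the one-liner above is dense, the same idea can be spelled out for a single Status/Value pair. This is only a sketch: it assumes the data frame is called df and has alternating Status/Value columns as in the question, the helper name split_pair is made up for illustration, and it pads to nrow(df) rather than to ind (the two coincide for this data).

```r
# Assumption: df mirrors the question's csv (Status/Value columns alternate).
df <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW 5 HIGH 5 LOW 5
LOW 8 HIGH 8 LOW 8
HIGH 82 HIGH 8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: split one Status/Value pair into High and Low columns,
# padding each with NA up to n rows.
split_pair <- function(status, value, n) {
  high <- value[status == "HIGH"]  # values whose status is HIGH
  low  <- value[status == "LOW"]   # values whose status is LOW
  length(high) <- n                # `length<-` pads a shorter vector with NA
  length(low)  <- n
  data.frame(High = high, Low = low)
}

col1 <- split_pair(df[[1]], df[[2]], nrow(df))
col1
```

Applying split_pair to columns (1, 2), (3, 4) and (5, 6) and binding the results with cbind gives the same layout as the lapply version.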
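Col3_Low = c(5, 8) ... where is the 7? What is your criterion?
Sorry, I had only given the first two tuples as the desired output. The criterion is to look at the Status column and extract the value into the new High or Low column.
If you update the output data frame, great.
Thanks, would you mind explaining it? It seems complicated.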
ret <- read.table(text="
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW 5 HIGH 5 LOW 5
LOW 8 HIGH 8 LOW 8
HIGH 82 HIGH 8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = F)
# fix column headers
names(ret) <- gsub("(_+)", "_", names(ret))
library(stats)
# extract the column prefixes
prefixes <- unique(gsub("_.+", "", names(ret)))
value_names <- names(ret[grepl("_Value", names(ret))])
status_names <- names(ret[grepl("_Status", names(ret))])
# get the high values: extract the HIGH rows, pad with NAs and set the name to _High
high_values <- sapply(seq_along(prefixes),
function(i) {
result <- ret[which(ret[, status_names][i] == "HIGH"), value_names][[i]]
length(result) <- nrow(ret)  # pad with NA up to the number of rows
setNames(list(foo = result), paste0(prefixes[i], "_High"))})
# get the low values: extract the LOW rows, pad with NAs and set the name to _Low
low_values <- sapply(seq_along(prefixes),
function(i) {
result <- ret[which(ret[, status_names][i] == "LOW"), value_names][[i]]
length(result) <- nrow(ret)  # pad with NA up to the number of rows
setNames(list(foo = result), paste0(prefixes[i], "_Low"))})
# combine
output <- cbind(data.frame(low_values), data.frame(high_values))
output
# Col1_Low Col2_Low Col3_Low Col1_High Col2_High Col3_High
# 1 5 NA 5 82 5 NA
# 2 8 NA 8 83 8 NA
# 3 NA NA 7 82 8 NA
# 4 NA NA 7 NA NA NA
# 5 NA NA 7 NA NA NA
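For comparison, here is a more compact base R sketch that produces the columns in the order the question asked for (Col1_High, Col1_Low, Col2_High, ...). It assumes the same ret data frame with the doubled underscore in the header already cleaned up; the names pad_values, cols and output2 are made up for illustration.

```r
ret <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW 5 HIGH 5 LOW 5
LOW 8 HIGH 8 LOW 8
HIGH 82 HIGH 8 LOW 7
HIGH 83 NORMAL 8 LOW 7
HIGH 82 NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# collapse repeated underscores (Col3__Value becomes Col3_Value)
names(ret) <- gsub("_+", "_", names(ret))

# column prefixes: Col1, Col2, Col3
prefixes <- unique(sub("_.*", "", names(ret)))

# Hypothetical helper: values of one prefix whose status equals `level`,
# padded with NA to the number of rows in ret.
pad_values <- function(prefix, level) {
  status <- ret[[paste0(prefix, "_Status")]]
  value  <- ret[[paste0(prefix, "_Value")]]
  v <- value[status == level]
  length(v) <- nrow(ret)
  v
}

# build the High/Low pair for each prefix, in that order
cols <- list()
for (p in prefixes) {
  cols[[paste0(p, "_High")]] <- pad_values(p, "HIGH")
  cols[[paste0(p, "_Low")]]  <- pad_values(p, "LOW")
}
output2 <- as.data.frame(cols)
output2
```

Building the columns in one loop keeps each High/Low pair adjacent, so no reordering is needed after the fact.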