
R: Create and populate new columns in a data frame based on the values of existing columns


I have a CSV in the following format:

Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7
I want to create a new data frame that has the high and low values as columns, for example:

Col1_High  Col1_Low Col2_High Col2_Low Col3_High Col3_Low
    82         5        5        NA        NA        5
    83         8        8        NA        NA        8
    82         NA       8        NA        NA        7
    NA         NA       NA       NA        NA        7
    NA         NA       NA       NA        NA        7
What is the best way to do this?

This is what I have so far:

#extract the Status Columns from original file into DataFrame
  statusDF <- ret[grepl("Status", colnames(ret))]

  #extract the Value Columns from original file into DataFrame
  originalValueDF <- ret[grepl("Value", colnames(ret))]

  #create new columns attribute_high and attribute_low
  for(i in names(originalValueDF)){
    newValueDF <- originalValueDF[[paste(i, 'High', sep = "_")]]
    newValueDF <- originalValueDF[[paste(i, 'Low', sep = "_")]]
  }

 #populate both columns based on value in attribute status column
 for(i in names(originalValueDF)){
    if (originalValueDF$i == "High"){
      temp <-  # stuck here
    }
  }
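
Here is an attempt that makes heavy use of lapply. We first create a list (l1) that collects the values for each HIGH and LOW status. These vectors have different lengths, so we pad each of them with NA up to the maximum length (ind in our case). We then convert each pair of vectors into a two-column matrix (High and Low) and use do.call with cbind to get the final data frame. Here df is the data frame read from the CSV (called ret in the question).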

l1 <- lapply(seq(1, ncol(df), by = 2), function(i) list(HIGH = df[i+1][df[i] == 'HIGH'],
                                                         LOW = df[i+1][df[i] == 'LOW']))
names(l1) <- paste0('Col', seq(length(l1)))

ind <- max(unlist(lapply(l1, function(i) lengths(i))))

do.call(cbind, lapply(lapply(l1, function(i) lapply(i, `length<-`, ind)), function(j)
                    setNames(data.frame(matrix(unlist(j), ncol = 2)), c('High', 'Low'))))

#  Col1.High Col1.Low Col2.High Col2.Low Col3.High Col3.Low
#1        82        5         5       NA        NA        5
#2        83        8         8       NA        NA        8
#3        82       NA         8       NA        NA        7
#4        NA       NA        NA       NA        NA        7
#5        NA       NA        NA       NA        NA        7
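
The padding step is probably the least obvious part of the code above, so here is a minimal sketch of it in isolation. The vectors high and low are made-up stand-ins for the values of one Status/Value pair, and n plays the role of ind; none of these names come from the answer itself.

```r
# Hypothetical values for one column pair: three HIGH rows and two LOW rows
high <- c(82, 83, 82)
low  <- c(5, 8)

# Target length: the longest of the vectors
n <- max(length(high), length(low))

# Assigning to length() extends a vector and fills the new slots with NA
length(high) <- n
length(low)  <- n

# Both vectors now have the same length and can sit side by side
padded <- data.frame(High = high, Low = low)
padded
```

In the answer the same idea is applied to every pair at once, so every vector is padded to the single longest length found across all columns.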

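l1 ... ret ... Col3_Low = c(5, 8) ... where is the 7? What is your criterion?

Sorry, I only gave the first two tuples as the expected output. The criterion is to look at the status column and extract the value into the new High or Low column.

If you update the output data frame ...

Great, thanks. Would you mind explaining it? It seems complicated.
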
ret <- read.table(text="
Col1_Status Col1_Value  Col2_Status Col2_Value Col3_Status  Col3__Value
LOW             5           HIGH         5         LOW           5
LOW             8           HIGH         8         LOW           8
HIGH            82          HIGH         8         LOW           7
HIGH            83          NORMAL       8         LOW           7
HIGH            82          NORMAL       8         LOW           7
", header = TRUE, stringsAsFactors = F)

# fix column headers
names(ret) <- gsub("(_+)", "_", names(ret))

library(stats)

# extract the column prefixes
prefixes <- unique(gsub("_.+", "", names(ret)))
value_names  <- names(ret[grepl("_Value",  names(ret))])
status_names <- names(ret[grepl("_Status", names(ret))])

# get the high values - extract the highs, pad with NAs and set the name to _High
high_values  <- sapply(seq_along(prefixes),
                       function(i) {
                         result <- ret[which(ret[, status_names][i] == "HIGH"), value_names][[i]]
                         # pad with NA up to the number of rows in ret
                         length(result) <- nrow(ret)
                         setNames(list(foo = result), paste0(prefixes[i], "_High"))})

# get the low values - extract the lows, pad with NAs and set the name to _Low
low_values  <- sapply(seq_along(prefixes),
                      function(i) {
                        result <- ret[which(ret[, status_names][i] == "LOW"), value_names][[i]]
                        # pad with NA up to the number of rows in ret
                        length(result) <- nrow(ret)
                        setNames(list(foo = result), paste0(prefixes[i], "_Low"))})

# combine
output <- cbind(data.frame(low_values), data.frame(high_values))

output

#   Col1_Low Col2_Low Col3_Low Col1_High Col2_High Col3_High
# 1        5       NA        5        82         5        NA
# 2        8       NA        8        83         8        NA
# 3       NA       NA        7        82         8        NA
# 4       NA       NA        7        NA        NA        NA
# 5       NA       NA        7        NA        NA        NA
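
The output above groups all the Low columns before the High columns, while the question asks for them interleaved per prefix (Col1_High, Col1_Low, Col2_High, ...). A minimal sketch of one way to get that order with a plain loop follows. It is a variation on the approach above, not part of either answer, and the helper pick and the result name wide are hypothetical names chosen for illustration.

```r
# Sample data from the question
ret <- read.table(text = "
Col1_Status Col1_Value Col2_Status Col2_Value Col3_Status Col3__Value
LOW   5  HIGH   5 LOW 5
LOW   8  HIGH   8 LOW 8
HIGH 82  HIGH   8 LOW 7
HIGH 83  NORMAL 8 LOW 7
HIGH 82  NORMAL 8 LOW 7
", header = TRUE, stringsAsFactors = FALSE)

# Collapse repeated underscores so Col3__Value becomes Col3_Value
names(ret) <- gsub("_+", "_", names(ret))

# Hypothetical helper: the values whose status equals `level`,
# padded with NA up to length n
pick <- function(value, status, level, n) {
  out <- value[status == level]
  length(out) <- n
  out
}

prefixes <- unique(sub("_.*", "", names(ret)))

# Build the columns prefix by prefix so High and Low stay adjacent
cols <- list()
for (p in prefixes) {
  status <- ret[[paste0(p, "_Status")]]
  value  <- ret[[paste0(p, "_Value")]]
  cols[[paste0(p, "_High")]] <- pick(value, status, "HIGH", nrow(ret))
  cols[[paste0(p, "_Low")]]  <- pick(value, status, "LOW",  nrow(ret))
}

wide <- as.data.frame(cols)
wide
```

Rows with status NORMAL match neither level, so they contribute nothing to either column, which is the same behaviour as in both answers.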