R: Converting a single data frame into a list of data frames (parsing column names into prefix and suffix)

Tags: r, list, dataframe

I am hoping to find an efficient way to convert a single data frame into a list of data frames. Here is my reproducible MWE:

set.seed(1)
ABAge = runif(100)
ABPoints = rnorm(100)
ACAge = runif(100)
ACPoints = rnorm(100)
BCAge = runif(100)
BCPoints = rnorm(100)

A_B <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ABAge, Points = ABPoints)
A_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ACAge, Points = ACPoints)
B_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = BCAge, Points = BCPoints)
A_B$ID <- as.character(A_B$ID)
A_C$ID <- as.character(A_C$ID)
B_C$ID <- as.character(B_C$ID)

listFormat <- list("A_B" = A_B, "A_C" = A_C, "B_C" = B_C)

dfFormat <- data.frame(ID = as.character(paste0("ID", 1:100)), A_B.Age = ABAge, A_B.Points = ABPoints, A_C.Age = ACAge, A_C.Points = ACPoints, B_C.Age = BCAge, B_C.Points = BCPoints)
dfFormat$ID = as.character(dfFormat$ID)
The data frame dfFormat and the list of data frames listFormat have the following structures, shown in that order:

'data.frame':   100 obs. of  7 variables:
 $ ID        : chr  "ID1" "ID2" "ID3" "ID4" ...
 $ A_B.Age   : num  0.266 0.372 0.573 0.908 0.202 ...
 $ A_B.Points: num  0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C.Age   : num  0.6737 0.0949 0.4926 0.4616 0.3752 ...
 $ A_C.Points: num  0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C.Age   : num  0.814 0.929 0.147 0.75 0.976 ...
 $ B_C.Points: num  1.474 0.677 0.38 -0.193 1.578 ...
List of 3
 $ A_B:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
  ..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
  ..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
  ..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
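I would like to come up with an automated way to convert dfFormat into listFormat. As the objects above show, the following conditions hold:

The column ID is always the first column in dfFormat, and it is always the first column in each sublist of listFormat.

The number of sublists equals the number of unique column-name prefixes in dfFormat, that is, the part of each name before the period ".". In this example there are three prefixes: A_B, A_C, and B_C. These prefixes are also the names of the three sublists.

Besides ID, each sublist contains as many columns as dfFormat has columns with the corresponding prefix (for example A_B). Here that is two per sublist: Age and Points. These suffixes become the column names.

I previously asked the reverse question, namely how to convert from listFormat to dfFormat, and got some helpful answers. I need code for both directions, and it seems this direction may need a different kind of code. My attempt is below, to show where I got stuck.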

conUnd <- which(sapply(colnames(dfFormat), function(x) grepl("_", x)))
listName <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1])
uListName <- unique(sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1]))
listCol <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][2])

listFormat = list()
for (i in 1:length(uListName)){
   # [Gets messy here trying to define column names based on string variables]
}
Any suggestions would be greatly appreciated. I know my code is not efficient.

You can do this in base R:

output <- lapply(split.default(dfFormat[-1], sub("\\..*", "",names(dfFormat[-1]))), 
          function(x) cbind(dfFormat[1], setNames(x, sub(".*\\.", "", names(x)))))
str(output)

#List of 3
# $ A_B:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
#  ..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
# $ A_C:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
#  ..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
# $ B_C:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
#  ..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
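To make the one-liner above easier to follow, the same idea can be unpacked into separate steps. This is only a sketch: `small` is a hypothetical three-row stand-in for dfFormat (not the question's data), and it assumes every non-ID column name has the form prefix.suffix with exactly one period.

```r
# Hypothetical 3-row stand-in for dfFormat (not the question's data)
small <- data.frame(
  ID = paste0("ID", 1:3),
  A_B.Age = c(0.1, 0.2, 0.3), A_B.Points = c(1, 2, 3),
  A_C.Age = c(0.4, 0.5, 0.6), A_C.Points = c(4, 5, 6),
  stringsAsFactors = FALSE
)

value_cols <- small[-1]                          # everything except ID
prefix <- sub("\\..*", "", names(value_cols))    # "A_B" "A_B" "A_C" "A_C"

# split.default() treats the data frame as a list of columns and groups
# the columns by prefix; plain split() would split by rows instead
pieces <- split.default(value_cols, prefix)

# Rename each piece to its suffixes and put the ID column back in front
result <- lapply(pieces, function(x) {
  cbind(small[1], setNames(x, sub(".*\\.", "", names(x))))
})

names(result)        # "A_B" "A_C"
names(result$A_B)    # "ID" "Age" "Points"
```

Because the grouping vector is converted to a factor, the sublists come back sorted by prefix rather than in the original column order. In this example the two orders coincide.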
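A solution using the tidyverse. It converts the data frame to long format, separates the name column into group and parameter, spreads it back to wide format, and then splits the data frame by group name. In the last line, as.data.frame(stringsAsFactors = FALSE) is not strictly needed, because a tibble is also a data frame. I added it to show you that the output is identical to the expected list.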

library(tidyverse)

listFormat_output <- dfFormat %>%
  pivot_longer(cols = -ID, names_to = "Type") %>%
  separate(Type, into = c("Group", "Parameter"), sep = "\\.") %>%
  pivot_wider(names_from = Parameter) %>%
  group_split(Group) %>%
  setNames(nm = map_chr(., ~unique(.x$Group))) %>%
  map(~.x %>% select(-Group) %>% as.data.frame(stringsAsFactors = FALSE))

# Check if the output is the same as the expected list
identical(listFormat, listFormat_output)
# [1] TRUE
Using mget, ls, and a regular expression seems to give the result you want.

Data:

Edit: Your data frame dfFormat has the following structure:

str(dfFormat)
'data.frame':   100 obs. of  7 variables:
 $ ID        : chr  "ID1" "ID2" "ID3" "ID4" ...
 $ A_B.Age   : num  0.266 0.372 0.573 0.908 0.202 ...
 $ A_B.Points: num  0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C.Age   : num  0.6737 0.0949 0.4926 0.4616 0.3752 ...
 $ A_C.Points: num  0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C.Age   : num  0.814 0.929 0.147 0.75 0.976 ...
 $ B_C.Points: num  1.474 0.677 0.38 -0.193 1.578 ...
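Solution:

This solution takes the column-name prefixes of dfFormat and uses a regular expression to match what should be collected into the list of data frames: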

listFormat <-  mget(ls(pattern = "^A_B|^A_C|^B_C"))

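
Comments:

Would the downvoters please give me a hint as to why they did so? The OP said "I would like to come up with an automated way to convert dfFormat into listFormat", and that is exactly what my answer does.

Thanks for your comments, @Edward and @www. My solution does start from a single data frame, namely the OP's data frame dfFormat. If that data frame is in the workspace, the solution provided works.

@www The column names of dfFormat contain the strings A_B, A_C, and B_C; a regular expression can match them.

@www Sorry, I disagree. If I rm() the existing objects, including dfFormat and listFormat, and start from scratch, everything works fine. I think things would improve if voters were required to give a reason.
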
str(listFormat)
List of 3
 $ A_B:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
  ..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
  ..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
  ..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
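
The disagreement in the comments comes down to what ls(pattern = ...) searches. A minimal sketch in an isolated environment (the names `env`, `pat`, `before`, and `after` are made up for this illustration) suggests that ls() matches the names of objects, not the column names inside a data frame, so the mget call returns data frames only when objects named A_B, A_C, or B_C already exist:

```r
# Isolated environment so the sketch does not touch the real workspace
env <- new.env()
env$dfFormat <- data.frame(ID = "ID1", A_B.Age = 0.1, A_B.Points = 1,
                           stringsAsFactors = FALSE)

pat <- "^A_B|^A_C|^B_C"

# Only dfFormat exists; its name does not match the pattern, so nothing is found
before <- mget(ls(envir = env, pattern = pat), envir = env)
length(before)   # 0

# Once an object called A_B exists, the same call picks it up
env$A_B <- data.frame(ID = "ID1", Age = 0.1, Points = 1,
                      stringsAsFactors = FALSE)
after <- mget(ls(envir = env, pattern = pat), envir = env)
names(after)     # "A_B"
```

Under that reading, the approach collects existing data frames into a list rather than deriving them from dfFormat.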