
R: Melt a data frame by a pattern in colnames

Tags: r, dataframe, dplyr, reshape2, tidyr

I have several data frames, each with more than 250 variables. A partial `dput` of the first data frame:

df <- structure(list(id = structure(1:6, .Label = c("00", "01", "02", "03", "04", "05", "06", "08", "09", "10", "11", "12", "13", "14", "15", "All Recordings"), class = "factor"), Geslacht = structure(c(2L, 2L, 3L, 2L, 2L, 3L), .Label = c("-", "Man", "Vrouw"), class = "factor"), Leeftijd = structure(c(2L, 3L, 3L, 4L, 4L, 4L), .Label = c("-", "13", "14", "15", "17"), class = "factor"), FD.1.01 = c(13.96, 3.46, 2.45, 4.65, 1.18, 0.76), FD.1.02 = c(2.79, 4.32, 5.28, 0.78, 4.03, 0.74), FD.1.03 = c(2.09, 2.96, 5.78, 0.52, 1.12, 0), FD.1.04 = c(0, 2.79, 1.65, 0, 2.11, 2.11), FD.1.05 = c(1.26, 0.96, 8.67, 0.34, 1.77, 2.25), FD.1.06 = c(7.27, 0.58, 12.04, 0, 0.84, 3.39), FD.1.07 = c(3.97, 0.16, 8.37, 0.92, 0, 4.05), FD.1.08 = c(4.45, 0, 4.23, 0, 0, 1.63), FD.1.09 = c(0, 0, 2.07, 0, 0, 0.46), FD.1.10 = c(0, 0, 1.87, 0, 0, 0.42), FD.1.11 = c(0, 0, 9.05, 0, 0, 0), FD.1.12 = c(0, 0, 0, 0, 0, 0), FD.1.13 = c(0, 0, 0, 0, 0, 0), FD.1.14 = c(0, 0, 0, 0, 0, 0), FD.1.15 = c(0, 0, 0, 0, 0, 0), FD.1.16 = c(0, 0, 0, 0, 0, 0), FD.1.17 = c(0, 0, 0, 0, 0, 0), FD.1.18 = c(0, 0, 0, 0, 0, 0), FD.1.19 = c(0, 0, 0, 0, 0, 0), FD.1.20 = c(0, 0, 0, 0, 0, 0), FD.1.21 = c(0, 0, 0, 0, 0, 0), FD.1.22 = c(0, 0, 0, 0, 0, 0)), .Names = c("id", "Geslacht", "Leeftijd", "FD.1.01", "FD.1.02", "FD.1.03", "FD.1.04", "FD.1.05", "FD.1.06", "FD.1.07", "FD.1.08", "FD.1.09", "FD.1.10", "FD.1.11", "FD.1.12", "FD.1.13", "FD.1.14", "FD.1.15", "FD.1.16", "FD.1.17", "FD.1.18", "FD.1.19", "FD.1.20", "FD.1.21", "FD.1.22"), row.names = c(1L, 2L, 3L, 4L, 5L, 7L), class = "data.frame")
Try:

This also works for the example you provided:

df %>% 
gather(FD, Score, grep("^FD", colnames(df))) %>%
head()
  id Geslacht Leeftijd      FD Score
1 00      Man       13 FD.1.01 13.96
2 01      Man       14 FD.1.01  3.46
3 02    Vrouw       14 FD.1.01  2.45
4 03      Man       15 FD.1.01  4.65
5 04      Man       15 FD.1.01  1.18
6 05    Vrouw       15 FD.1.01  0.76
On a larger dataset:

newCols <- simplify2array(replicate(100,df[,-(1:3)]))
colnames(newCols) <- paste0("FD.1.", 23:2222)
df1 <- cbind(df, newCols)
df2 <- df1 %>% 
gather(FD, Score, grep("^FD", colnames(df1)))
dim(df2)
#[1] 13332     5
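
The reported dimensions can be checked by hand: 22 original `FD` columns plus 2200 added ones gives 2222 measure columns, and 6 rows times 2222 columns is 13332 rows, with the 3 id columns plus `FD` and `Score` making 5 columns. A minimal sketch of that arithmetic follows. The helper `expected_long_dim` and the stand-in `wide` are hypothetical names introduced here for illustration; they are not part of the original answer.

```r
# Hypothetical helper: the dimensions a wide -> long reshape should produce
# when every column matching `pattern` is gathered into a key/value pair.
expected_long_dim <- function(wide, pattern = "^FD") {
  n_measure <- sum(grepl(pattern, colnames(wide)))  # columns to be melted
  n_id <- ncol(wide) - n_measure                    # columns kept as ids
  c(rows = nrow(wide) * n_measure, cols = n_id + 2L)
}

# Stand-in with the same shape as df1 above: 6 rows, 3 id columns,
# 2222 columns whose names start with "FD" (values are placeholders).
wide <- data.frame(id = 1:6, Geslacht = "Man", Leeftijd = "13")
fd <- matrix(0, nrow = 6, ncol = 2222,
             dimnames = list(NULL, sprintf("FD.1.%02d", 1:2222)))
wide <- cbind(wide, fd)

expected_long_dim(wide)
#  rows  cols
# 13332     5
```

Comparing this expectation with `dim()` of the melted result is a quick way to notice when some `FD` columns were silently dropped or left unmelted.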

Use `grep` (here in its logical form, `grepl`) to match `FD` in the colnames:

melted.df <- melt(df, id=c("id","Geslacht","Leeftijd"),
                  measure.vars=colnames(df)[grepl("^FD",colnames(df))])
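
As a small variation on the call above, `grep(..., value = TRUE)` returns the matching names directly, so the logical subsetting step can be skipped. The sketch below assumes `reshape2` is installed and uses a cut-down stand-in, `df_small`, rather than the question's full data; the output column names `FD` and `Score` are chosen here to match the other answer.

```r
library(reshape2)

# Cut-down stand-in for the question's data (illustrative values only).
df_small <- data.frame(
  id       = c("00", "01", "02"),
  Geslacht = c("Man", "Man", "Vrouw"),
  Leeftijd = c("13", "14", "14"),
  FD.1.01  = c(13.96, 3.46, 2.45),
  FD.1.02  = c(2.79, 4.32, 5.28),
  stringsAsFactors = FALSE
)

# Two equivalent ways to collect the names that start with "FD".
fd_cols_a <- colnames(df_small)[grepl("^FD", colnames(df_small))]
fd_cols_b <- grep("^FD", colnames(df_small), value = TRUE)

# Melt only the matched columns; however many there are, the call is the same.
melted_small <- melt(df_small,
                     id.vars       = c("id", "Geslacht", "Leeftijd"),
                     measure.vars  = fd_cols_b,
                     variable.name = "FD",
                     value.name    = "Score")
```

Because the column set is computed from the names, the same code should keep working when the number of `FD` columns differs between data frames.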

Another very simple way to solve this, using `dplyr` and `tidyr`, is:

melted.df <- df %>% 
  select(id, Geslacht, Leeftijd, starts_with("FD")) %>% 
  gather(FD, Score, starts_with("FD"))
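
In current `tidyr` releases (1.0.0 and later), `gather()` is superseded by `pivot_longer()`. The sketch below shows the same reshape with that newer function. It is not what the original answer used, and `df_small` is a cut-down stand-in for the question's data.

```r
library(tidyr)

# Cut-down stand-in for the question's data (illustrative values only).
df_small <- data.frame(
  id       = c("00", "01", "02"),
  Geslacht = c("Man", "Man", "Vrouw"),
  Leeftijd = c("13", "14", "14"),
  FD.1.01  = c(13.96, 3.46, 2.45),
  FD.1.02  = c(2.79, 4.32, 5.28),
  stringsAsFactors = FALSE
)

# Select the measure columns by name prefix, so the number of FD columns
# does not need to be known in advance.
long_df <- pivot_longer(
  df_small,
  cols      = starts_with("FD"),
  names_to  = "FD",
  values_to = "Score"
)

dim(long_df)
```

Note that `pivot_longer()` orders the output by original row first, whereas `gather()` stacks one column after another, so the two results hold the same values in a different row order.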

Comments:

`grep` in colnames: `melt(df, id=c("id","Geslacht","Leeftijd"), measure.vars=colnames(df)[grepl("^FD",colnames(df))])`
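@agstudy Judging by Jaap's rep points, I didn't expect the answer to be this simple, so I thought we must be missing something...

@zx8754 You're right :)

@zx8754 Rep points don't say everything; I got most of mine by answering `ggplot2` questions ;-) Regex-related things are one of my main weaknesses when working with R.

Unfortunately, I get the following error message when I apply it to the whole data frame: `Error in measure.attributes[[1]] : subscript out of bounds`

Are your packages up to date? This code works for the sample data provided in the post; maybe you can show more sample data?

I hope you don't mind that I accepted the other answer. It suits my needs better (especially the `dplyr` solution). As compensation, I also upvoted another answer of yours ;-)

Unfortunately, I get the following error message when I try the first code on the whole data frame: `Error in measure.attributes[[1]] : subscript out of bounds`. The drawback of your `dplyr` solution is that I have to specify `FD.1.01:FD.1.22`, which is not what I want, because the number of columns starting with `FD` can vary (as I also explained in the question).

Hi Jaap, thanks for your comment. Could you provide a minimal `dput()` example where `melt()` fails? I updated the `dplyr` code. Not sure whether `dplyr` works on the original dataset.

To test your `dplyr` solution, I first had to update my R version to 3.1.0. However, it doesn't work: all columns with `FD` are removed.

Jaap, sorry to hear that. Do you mean it doesn't work on the example data or on the original dataset? I didn't notice any problems. `sessionInfo()`: R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit) other attached packages: [1] tidyr_0.1 dplyr_0.2 gsubfn_0.6-5 proto_0.3-10 gtools_3.4.0 [6] stringr_0.6.2 reshape2_1.4 loaded via a namespace (and not attached): [1] assertthat_0.1 magrittr_1.0.1.1.1.1 parallel_3.1.0 plyr_1.6] tcltk_3.1.0 tools_3.1.0

It doesn't work on the original dataset. Your `grep` solution returns the correct result (a data frame with 5 columns and 1860 rows), whereas your second solution returns a data frame with only 6 rows and 129 columns (the original dataset has 251 columns).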
Comparing the `select`/`gather` result with `melt`:

res1 <- df %>% 
  select(id, Geslacht, Leeftijd, grep("^FD", names(df))) %>% 
  gather(FD, Score, grep("^FD", names(df)))
res2 <- melt(df, id=c("id","Geslacht","Leeftijd"), 
             measure.vars=grep("^FD", colnames(df)))
colnames(res2) <- colnames(res1)
identical(res1, res2)
#[1] TRUE
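# patterns() comes from the data.table package, so this variant presumably
# expects data.table to be loaded and df to be a data.table (e.g. via setDT(df)).
melted.df <- melt(df, id = c("id","Geslacht","Leeftijd"),
                  measure.vars = patterns("^FD"))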