Merging rows in R

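I'm working in R with a long-format data frame, but I have a bit of a problem. My data frame is actually made up of two smaller data frames. I then converted the timeline from months to years so that both share a common timeline.

However, I now sometimes have two rows with the same time value (one row per questionnaire), and I want only one row per time value. (I've attached an image of the problem, which may be clearer than my explanation.) Note that at this point I still want the data frame in long format; I only want to get rid of the "extra" rows.

Can anyone tell me how to do this?

I've also attached the code for the head of the data frame, where nomem = ID, time.compressed = time, sel01-03 = part of the first questionnaire, and close_num and gener_sat = part of the second questionnaire.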

Using the reshape2 and dplyr packages. Load the libraries and the data:

library(reshape2)
library(dplyr)

x <- structure(
  list(
    nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
    timeline.compressed = c(79, 79, 95, 79, 28,  28),
    sel01 = c(NA, 6L, NA, NA, NA, 7L),
    sel02 = c(NA, 6L, NA,  NA, NA, 7L),
    sel03 = c(NA, 3L, NA, NA, NA, 5L),
    sel04 = c(NA,  6L, NA, NA, NA, 6L),
    close_num = c(1, NA, 0.2, 1, 0.8, NA),
    gener_sat = c(7L,  NA, 7L, 8L, 7L, NA)
  ), 
  .Names = c(
    "nomem_encr", "timeline.compressed",
    "sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"
  ),
  class = "data.frame",
  row.names = c(NA, 6L)
)
x
Now, let's melt the data into long format:

melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  head(15)
Output:

   nomem_encr timeline.compressed variable value
1      800009                  79    sel01    NA
2      800009                  79    sel01     6
3      800009                  95    sel01    NA
4      800012                  79    sel01    NA
5      800015                  28    sel01    NA
6      800015                  28    sel01     7
7      800009                  79    sel02    NA
8      800009                  79    sel02     6
9      800009                  95    sel02    NA
10     800012                  79    sel02    NA
11     800015                  28    sel02    NA
12     800015                  28    sel02     7
13     800009                  79    sel03    NA
14     800009                  79    sel03     3
15     800009                  95    sel03    NA
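To make the melt step concrete, here is a base-R sketch of the same wide-to-long reshape using `stack()`. It is not how reshape2 implements `melt()`; it only reproduces the row order shown above (all values of the first measure column, then the second, and so on). The cut-down data frame (three of the measure columns) and the names `id_vars`, `measure_vars` and `long` are illustrative assumptions.

```r
# Hypothetical cut-down copy of the question's data (three measure columns).
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  sel02 = c(NA, 6L, NA, NA, NA, 7L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA)
)
id_vars <- c("nomem_encr", "timeline.compressed")
measure_vars <- setdiff(names(x), id_vars)

# stack() concatenates the measure columns one after another, so the id
# columns are repeated once per measure column, in the same row order.
long <- cbind(
  x[rep(seq_len(nrow(x)), times = length(measure_vars)), id_vars],
  stack(x[measure_vars])
)

# stack() names its columns "values" and "ind"; rename and reorder them.
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"] <- "variable"
long <- long[c(id_vars, "variable", "value")]
rownames(long) <- NULL
print(head(long, 8))
```

Because `stack()` concatenates the columns, the integer `sel*` columns and the double `close_num` column end up in a single numeric `value` column in this sketch.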
If we cast the melted data frame, the default behaviour is to count the number of entries for each item:

melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  dcast(
    formula = nomem_encr + timeline.compressed ~ variable
  )
Output:

Aggregation function missing: defaulting to length
  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
1     800009                  79     2     2     2     2         2         2
2     800009                  95     1     1     1     1         1         1
3     800012                  79     1     1     1     1         1         1
4     800015                  28     2     2     2     2         2         2
For the item identified by `800009 79`, we have two entries (using `nomem_encr` and `timeline.compressed` as the identifying variables).
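A quick way to see which keys produce these counts is to count the rows per `nomem_encr`/`timeline.compressed` pair directly. The base-R sketch below uses `aggregate()` with `FUN = length` and `na.action = na.pass`, so rows whose value is `NA` are counted too, which is what the cast output above reports. The cut-down data frame and the names `counts` and `dup_keys` are hypothetical.

```r
# Hypothetical cut-down copy of the question's data (one measure column).
x <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L)
)

# length() counts every row in a group; na.pass keeps rows whose sel01 is NA,
# so the result is the number of rows per (nomem_encr, timeline.compressed).
counts <- aggregate(
  sel01 ~ nomem_encr + timeline.compressed,
  data = x, FUN = length, na.action = na.pass
)
names(counts)[names(counts) == "sel01"] <- "n_rows"

# Keys that occur more than once are the rows that need to be merged.
dup_keys <- counts[counts$n_rows > 1, c("nomem_encr", "timeline.compressed")]
print(counts)
print(dup_keys)
```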

We can change the default behaviour to something else, such as `sum`:
melt(data = x, id.vars = c("nomem_encr", "timeline.compressed")) %>%
  dcast(
    formula = nomem_encr + timeline.compressed ~ variable,
    fun.aggregate = function(xs) sum(xs, na.rm = TRUE)
  )
Output:

  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
1     800009                  79     6     6     3     6       1.0         7
2     800009                  95     0     0     0     0       0.2         7
3     800012                  79     0     0     0     0       1.0         8
4     800015                  28     7     7     5     6       0.8         7
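One caveat with `sum(xs, na.rm = TRUE)`: a cell in which every value is `NA` comes back as `0` (for example, the `sel` columns for `800009`/`95` above), and if both rows for a key held a value, the two would be added together. If "not answered" should stay distinguishable from a real 0, an aggregator along the lines of the hypothetical `sum_or_na()` below could be used instead. This is a sketch, not part of reshape2. It returns `NA_real_` rather than a logical `NA` so that every cell has the same numeric type.

```r
# Hypothetical aggregator: behaves like sum(xs, na.rm = TRUE), except that a
# cell with no non-missing values stays NA instead of becoming 0.
sum_or_na <- function(xs) {
  if (all(is.na(xs))) NA_real_ else sum(xs, na.rm = TRUE)
}

sum_or_na(c(NA, 6))    # one answered row and one empty row: 6
sum_or_na(c(NA, NA))   # nothing answered: NA (sum with na.rm = TRUE gives 0)
sum_or_na(numeric(0))  # empty cell: NA
```

With reshape2 loaded, it would be passed as `fun.aggregate = sum_or_na` in the `dcast()` call above. If more than one row can hold a value for the same cell, summing is the wrong rule and a different aggregator is needed.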

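You can use `dplyr` (the sample data `df` is defined at the end):

library(dplyr)

df %>%
  group_by(nomem_encr, timeline.compressed) %>%
  # sort() drops NAs, so [1] picks the non-missing value (NA if there is none)
  summarise(across(everything(), ~ sort(.x)[1]))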
结果:

# A tibble: 4 x 8
# Groups:   nomem_encr [?]
  nomem_encr timeline.compressed sel01 sel02 sel03 sel04 close_num gener_sat
       <int>               <dbl> <int> <int> <int> <int>     <dbl>     <int>
1     800009                  79     6     6     3     6       1.0         7
2     800009                  95    NA    NA    NA    NA       0.2         7
3     800012                  79    NA    NA    NA    NA       1.0         8
4     800015                  28     7     7     5     6       0.8         7
df = structure(list(nomem_encr = c(800009L, 800009L, 800009L, 800012L, 
800015L, 800015L), timeline.compressed = c(79, 79, 95, 79, 28, 
28), sel01 = c(NA, 6L, NA, NA, NA, 7L), sel02 = c(NA, 6L, NA, 
NA, NA, 7L), sel03 = c(NA, 3L, NA, NA, NA, 5L), sel04 = c(NA, 
6L, NA, NA, NA, 6L), close_num = c(1, NA, 0.2, 1, 0.8, NA), gener_sat = c(7L, 
NA, 7L, 8L, 7L, NA)), .Names = c("nomem_encr", "timeline.compressed", 
"sel01", "sel02", "sel03", "sel04", "close_num", "gener_sat"), class = "data.frame", row.names = c(NA, 
6L))
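Taking the first element of `sort()` returns the smallest non-missing value in each group (or `NA` if there is none), which equals "the one answered value" only when each cell has at most one non-`NA` entry, as in this data. If you would rather take the first non-missing value explicitly, or avoid the dplyr dependency, a base-R sketch could look like this; `first_non_na()` is a hypothetical helper and the data frame is a cut-down copy of `df`.

```r
# Cut-down copy of the question's data.
df <- data.frame(
  nomem_encr = c(800009L, 800009L, 800009L, 800012L, 800015L, 800015L),
  timeline.compressed = c(79, 79, 95, 79, 28, 28),
  sel01 = c(NA, 6L, NA, NA, NA, 7L),
  close_num = c(1, NA, 0.2, 1, 0.8, NA),
  gener_sat = c(7L, NA, 7L, 8L, 7L, NA)
)

# Hypothetical helper: first non-NA element, or an NA of the same type.
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else v[NA_integer_]
}

keys <- c("nomem_encr", "timeline.compressed")

# One data frame per key combination; drop = TRUE skips empty combinations.
groups <- split(df, df[keys], drop = TRUE)

# Collapse every group to a single row, column by column.
merged <- do.call(rbind, lapply(groups, function(g) {
  as.data.frame(lapply(g, first_non_na))
}))

# Restore a stable order and plain row names.
merged <- merged[order(merged$nomem_encr, merged$timeline.compressed), ]
rownames(merged) <- NULL
print(merged)
```

The result stays in long format: one row per `nomem_encr`/`timeline.compressed` pair, with `NA` kept where neither questionnaire answered an item.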

Could you also provide sample data? Use `head` to create a subset and `dput` to show us how to reproduce it.

In response to your first comment: I'm afraid I don't fully understand your comment. I guess that for each row either the X variables or the Y variables are answered. However, sometimes two rows have the same time variable, i.e. the X and Y variables were answered at the same time. What I want is to merge these rows into one row in which both the X and Y variables are answered.

How do we know which rows need to be trimmed?

@jaySf The rows I want to merge are those with overlapping `timeline.compressed` values!

What happened to the NA in row 3, where `nomem_encr == 800009` and `timeline.compressed == 95`? Do you want to keep the NA in the final output?

Update: I just noticed that when I use this code, it returns my data as 0, 1 and 2 instead of the actual values. I copy-pasted your syntax and applied it to the whole dataset. Do you know what could be going wrong? Also, I get this error: Aggregation function missing: defaulting to length `structure(list(nomem_encr=c(800009L,800009L,800012L,800015L,800015L),timeline.compressed=c(79,95,79,28,40,52),sel01=c(1L,0L,0L,1L,0L),sel02=c(1L,0L,0L,1L,1L,1L,0L),sel03=c(1L,0L,0L,0L),close_num=c(1L,1L,1L,1L,1L,1L,1L),gener_sat=c(1L,1L,1L,1L,1L,1L,1L),.Names=c("nomem_encr","timeline.compressed","sel01","sel02","sel03","close_num","gener_sat"),class="data.frame",row.names=c(NA,6L))`

There are multiple entries for each item, and the default behaviour of `dcast` is to count the number of entries for each item (0, 1 or 2 entries for a particular item). You could also consider going back to the data in its previous format and working from there. I've updated my answer to show more detail; I hope this helps you find a solution.

Thanks. I found the problem: I have another timeline variable that also overlaps, but not to the same extent as timeline.compressed.