R: reshape a data frame and take 1 if both 0 and 1 are present


r dataframe


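I have a data frame with 500 rows and 20000 columns. Each row holds a sample ID; some sample IDs are repeated across rows, but with different column values. My data frame looks like this: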

sample_name   E002.c1   E004.c1  E005.c1  E007.c1  so on...
T4456-C        0           0        0        0
T4456-C        1           0        0        1
T4456-C        1           0        1        1
T9589-C        0           1        0        0
T9589-C        1           1        0        0

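Is there a way to merge these IDs as follows?

If a column contains only 0s for a given sample, the value should be 0. If the column contains at least one 1, the value should be 1. Expected output: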

sample_name   E002.c1   E004.c1  E005.c1  E007.c1  so on...
T4456-C        1           0        1        1
T9589-C        1           1        0        0
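For trying out the answers, the five sample rows above can be rebuilt as a small data frame. This is only a sketch: it assumes the data frame is called `df` (the name the answers use) and includes just the four columns shown in the question. The last lines check the stated rule on a single column: for 0/1 data, "at least one 1" gives the same result as the group maximum.

```r
# Sample rows from the question (only the four columns that are shown)
df <- data.frame(
  sample_name = c("T4456-C", "T4456-C", "T4456-C", "T9589-C", "T9589-C"),
  E002.c1 = c(0, 1, 1, 0, 1),
  E004.c1 = c(0, 0, 0, 1, 1),
  E005.c1 = c(0, 0, 1, 0, 0),
  E007.c1 = c(0, 1, 1, 0, 0)
)

# The rule, applied to one sample and one column
x <- df$E005.c1[df$sample_name == "T4456-C"]  # values 0, 0, 1
rule_any <- as.numeric(any(x == 1))           # 1: at least one value is 1
rule_max <- max(x)                            # 1: same answer for 0/1 data
```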

Try this:

library(tidyverse)

df %>%
  group_by(sample_name) %>%
  summarise_all(sum) %>%                          # column sums per sample
  mutate_if(is.numeric, ~ if_else(. > 0, 1, 0))   # any positive sum becomes 1


Another possibility:

library(dplyr)

df %>%
 group_by(sample_name) %>%
 summarise_all(~ ifelse(any(. == 1), 1, 0))   # 1 if the group has at least one 1

  sample_name E002.c1 E004.c1 E005.c1 E007.c1
  <fct>         <dbl>   <dbl>   <dbl>   <dbl>
1 T4456-C          1.      0.      1.      1.
2 T9589-C          1.      1.      0.      0.

Or using base R only:

aggregate(. ~ sample_name, data = df, function(x) ifelse(any(x == 1), 1, 0))

  sample_name E002.c1 E004.c1 E005.c1 E007.c1
1     T4456-C       1       0       1       1
2     T9589-C       1       1       0       0


Or group by sample name and summarise with the maximum, as suggested by @R Yoda:

df %>%
 group_by(sample_name) %>%
 summarise_all(max)

The same with data.table:
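The data.table code is not preserved in this copy of the answer. A minimal sketch of what a grouped maximum could look like, assuming the question's sample rows in a data frame named `df` (the names `dt` and `res_dt` are made up for illustration):

```r
library(data.table)

# Sample rows from the question
df <- data.frame(
  sample_name = c("T4456-C", "T4456-C", "T4456-C", "T9589-C", "T9589-C"),
  E002.c1 = c(0, 1, 1, 0, 1),
  E004.c1 = c(0, 0, 0, 1, 1),
  E005.c1 = c(0, 0, 1, 0, 0),
  E007.c1 = c(0, 1, 1, 0, 0)
)

dt <- as.data.table(df)

# .SD holds every column except the grouping column; take the max of each
res_dt <- dt[, lapply(.SD, max), by = sample_name]
res_dt
```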

With base R:

aggregate(. ~ sample_name, data = df, max)

Or using integer division:

df %>%
 group_by(sample_name) %>%
 summarise_all(~ any(. %/% 1 == 1) * 1)
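To see what the integer-division variant actually tests, here is a small base R sketch with made-up vectors. `x %/% 1` drops the fractional part, so for plain 0/1 data the check is equivalent to `x == 1`, while a non-integer value such as 1.7 would also be counted as 1.

```r
# Plain 0/1 data: integer division by 1 leaves the values unchanged
x01 <- c(0, 1, 0)
has_one <- any(x01 %/% 1 == 1) * 1      # 1, multiplying by 1 turns TRUE into 1

# A group with only zeros gives 0
zeros <- c(0, 0, 0)
no_one <- any(zeros %/% 1 == 1) * 1     # 0

# Non-integer input: the fractional part is discarded before the comparison
frac <- c(0.4, 1.7)
floored <- frac %/% 1                   # 0 and 1
```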

The same with data.table:
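Here too the data.table code is missing from this copy. A hedged sketch of the integer-division check in data.table syntax, again assuming the question's sample rows in `df` (`dt` and `res_div` are illustrative names):

```r
library(data.table)

# Sample rows from the question
df <- data.frame(
  sample_name = c("T4456-C", "T4456-C", "T4456-C", "T9589-C", "T9589-C"),
  E002.c1 = c(0, 1, 1, 0, 1),
  E004.c1 = c(0, 0, 0, 1, 1),
  E005.c1 = c(0, 0, 1, 0, 0),
  E007.c1 = c(0, 1, 1, 0, 0)
)

dt <- as.data.table(df)

# For each sample and column: 1 if any value floors to 1, otherwise 0
res_div <- dt[, lapply(.SD, function(x) any(x %/% 1 == 1) * 1), by = sample_name]
res_div
```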

And with base R:

aggregate(. ~ sample_name, data = df, function(x) any(x %/% 1 == 1) * 1)


A base R option using aggregate together with the unary + operator.

This avoids any explicit ifelse condition.
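The code for this option is not preserved in this copy. A minimal sketch of what it might look like, assuming the question's sample rows in a data frame named `df`: `any()` returns a logical value, and the unary `+` coerces TRUE/FALSE to 1/0, so no `ifelse` is needed. The name `res_plus` is made up for illustration.

```r
# Sample rows from the question
df <- data.frame(
  sample_name = c("T4456-C", "T4456-C", "T4456-C", "T9589-C", "T9589-C"),
  E002.c1 = c(0, 1, 1, 0, 1),
  E004.c1 = c(0, 0, 0, 1, 1),
  E005.c1 = c(0, 0, 1, 0, 0),
  E007.c1 = c(0, 1, 1, 0, 0)
)

# +TRUE is 1 and +FALSE is 0, so the logical from any() becomes a number
res_plus <- aggregate(. ~ sample_name, data = df,
                      FUN = function(x) +any(x == 1))
res_plus
```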

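If I haven't made a mistake, I'm surprised that the base R solution is noticeably slower than the tidyverse and data.table solutions (after all, tidyverse code is usually clean code rather than efficient code), and that the data.table solution is not noticeably faster than the tidyverse/base R solutions.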
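The timings behind this comparison are not included in this copy. For anyone who wants to check the claim, here is a rough sketch of how one approach could be timed on synthetic data. The data sizes, the `bench_df` name and the column names are all made up for illustration, and no particular result is implied.

```r
# Synthetic 0/1 data: 500 rows but only 200 columns (far fewer than the
# 20000 in the question) so that the sketch finishes quickly
set.seed(1)
n_rows <- 500
n_cols <- 200
m <- matrix(rbinom(n_rows * n_cols, size = 1, prob = 0.3), nrow = n_rows)
colnames(m) <- paste0("E", seq_len(n_cols), ".c1")

bench_df <- data.frame(
  sample_name = sample(paste0("T", 1:100, "-C"), n_rows, replace = TRUE),
  m
)

# Time the base R approach
timing <- system.time(
  res <- aggregate(. ~ sample_name, data = bench_df, FUN = max)
)
print(timing)
```

The dplyr and data.table variants can be wrapped in `system.time()` in the same way; timings depend on the machine, the package versions and the number of columns.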


I can't load tidyverse.

install.packages('tidyverse')?

That is a separate problem. Find the first error that appears and paste it into a Google search, together with the name of your operating system (Linux, Windows).

@PawełChabros When I run this, it returns 2 and 1.

@NelsonGon You're right. I've changed is_double to is.numeric. It should work now.

Makes sense, actually.

@NelsonGon Do you mean it makes sense to check the values for each row?

How about using max instead of any + ifelse? That should give better performance, I guess.

@R Yoda: that's a good idea, I've added it to my post. Thanks a lot.