How do I group by all columns of a data.frame in R?

I have the following data.frame in R:

  Introvert      Extrovert      Nature       Presence
     0              -1            3             Yes     
     1               3            2             No
     2               5            4             Yes
     1              -2            0             No

Now I would like to encode the responses as follows:

    3,4 <- Positives
    0,1,2 <- Neutral
    < 0 <- Negatives

How about this:

library(tidyverse)
df %>%
    gather(key, value, -Presence) %>%
    mutate(bin = cut(
        value,
        breaks = c(-Inf, -1, 2.5, Inf),
        labels = c("Negatives", "Neutral", "Positives"))) %>%
    select(-value) %>%
    unite(col, key, bin, sep = "_") %>%
    count(Presence, col) %>%
    spread(col, n)
## A tibble: 2 x 6
#  Presence Extrovert_Negativ… Extrovert_Positi… Introvert_Neutr… Nature_Neutral
#  <fct>                 <int>             <int>            <int>          <int>
#1 No                        1                 1                2              2
#2 Yes                       1                 1                2             NA
## ... with 1 more variable: Nature_Positives <int>
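
To make the binning step concrete, here is a minimal base-R sketch of how the cut call above treats individual values. The vector x is made up for illustration; the breaks and labels are the ones used in the answer. With the default right = TRUE every interval is closed on the right, so -1 lands in Negatives and 2 in Neutral.

```r
# Illustrative values only (not the data from the question)
x <- c(-2, -1, 0, 1, 2, 3, 4, 5)

bins <- cut(
  x,
  breaks = c(-Inf, -1, 2.5, Inf),  # intervals: (-Inf, -1], (-1, 2.5], (2.5, Inf]
  labels = c("Negatives", "Neutral", "Positives")
)

# Show each value next to the category it was assigned
data.frame(x, bins)
```

These breaks rely on the scores being whole numbers: a value such as -0.5 would be classed as Neutral. If fractional scores were possible, breaks = c(-Inf, 0, 2.5, Inf) together with right = FALSE would be one way to match the rule "< 0 is negative".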


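For fun/practice, I wrote a data.table version using the workflow from @MauritsEvers's answer. Its median run time is roughly 60% lower than that of the dplyr approach (see the benchmark).

data.table

You can skip the unite of the key and bin columns, because dcast handles that in the same step as the cast.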

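library(data.table)
library(magrittr)  # provides %>%

df %>% 
  setDT() %>%                        # converts df to a data.table by reference
  melt( id.vars = "Presence" ) %>%   # wide to long; Presence is column 4
  .[, bin := cut( value, 
                  breaks = c(-Inf, -1, 2.5, Inf),
                  labels = c("Negatives", "Neutral", "Positives") )] %>%
  .[, value := NULL] %>%
  .[, .N, by = c("Presence", "variable", "bin")] %>%   # count rows per group
  dcast( Presence ~ variable + bin, value.var = "N")   # long to wide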



   Presence Introvert_Neutral Extrovert_Negatives Extrovert_Positives Nature_Neutral Nature_Positives
1:       No                 2                   1                   1              2               NA
2:      Yes                 2                   1                   1             NA                2
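
As a variation on the pipeline above, dcast can also do the counting itself when it is given fun.aggregate = length, which makes the separate .N step unnecessary. The sketch below is an illustration under assumptions, not the answer author's code: df is re-typed from the table in the question, and as.data.table() is used so the work happens on a copy instead of converting df in place with setDT(). Combinations that never occur should then come out as 0 rather than NA.

```r
library(data.table)

# Data re-typed from the question (assumed to match the original df)
df <- data.frame(
  Introvert = c(0, 1, 2, 1),
  Extrovert = c(-1, 3, 5, -2),
  Nature    = c(3, 2, 4, 0),
  Presence  = c("Yes", "No", "Yes", "No")
)

dt <- as.data.table(df)                   # copy; df itself stays a data.frame
long <- melt(dt, id.vars = "Presence")    # columns: Presence, variable, value

# Same breaks and labels as in the answers above
long[, bin := cut(value,
                  breaks = c(-Inf, -1, 2.5, Inf),
                  labels = c("Negatives", "Neutral", "Positives"))]

# length() counts the rows in each Presence x variable x bin cell
res <- dcast(long, Presence ~ variable + bin,
             value.var = "value", fun.aggregate = length)
res
```

Whether 0 or NA is the better marker for an absent combination depends on how the table is used afterwards; the original pipeline keeps NA.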

I'm not sure I understand your expected output. I understand that you want to recode the responses, but why does the expected output contain only Introvert.* columns? What happened to the Extrovert and Nature responses?

They would look the same as Introvert, in the following columns: Extrovert positive, Extrovert negative, Extrovert neutral, and so on.

The fact that your expected output is not your complete expected output would have been critical information to include in your post... Anyway, I have posted a possible solution below.

Just a small question: how did you recode the responses in your code?

We use cut to divide the values into categories (bins), with the breakpoints defined in your post. We can use the labels argument to assign specific labels to the different categories. See ?cut for more information.

Benchmark

library(microbenchmark)
microbenchmark(
  dplyr = {
    df %>%
      gather(key, value, -Presence) %>%
      mutate(bin = cut(
        value,
        breaks = c(-Inf, -1, 2.5, Inf),
        labels = c("Negatives", "Neutral", "Positives"))) %>%
      select(-value) %>%
      unite(col, key, bin, sep = "_") %>%
      count(Presence, col) %>%
      spread(col, n)
  },
  data.table = {
    df %>% 
      setDT() %>%
      melt( id = 4 ) %>%
      .[, bin := cut( value, 
                      breaks = c(-Inf, -1, 2.5, Inf),
                      labels = c("Negatives", "Neutral", "Positives") )] %>%
      .[, value := NULL] %>%
      .[, .N, by = c("Presence", "variable", "bin")] %>% 
      dcast( Presence ~ variable + bin, value.var = "N")
  },
  times = 1000
)

Unit: milliseconds
       expr      min        lq     mean    median        uq      max neval
      dplyr 9.636224 10.083903 10.59597 10.267371 10.458524 26.38649  1000
 data.table 3.458208  3.647401  3.92219  3.835239  3.949568 15.05596  1000
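
A note for readers on a recent tidyr: gather() and spread() have since been superseded by pivot_longer() and pivot_wider(). The dplyr pipeline could be written roughly as follows. This is a sketch and was not part of the benchmark above; df is re-typed from the question, and values_fill is an optional addition so that missing combinations show 0 instead of NA.

```r
library(dplyr)
library(tidyr)

# Data re-typed from the question (assumed to match the original df)
df <- data.frame(
  Introvert = c(0, 1, 2, 1),
  Extrovert = c(-1, 3, 5, -2),
  Nature    = c(3, 2, 4, 0),
  Presence  = c("Yes", "No", "Yes", "No")
)

res <- df %>%
  # wide to long: one row per Presence / trait / score
  pivot_longer(-Presence, names_to = "key", values_to = "value") %>%
  # same breaks and labels as in the answers above
  mutate(bin = cut(value,
                   breaks = c(-Inf, -1, 2.5, Inf),
                   labels = c("Negatives", "Neutral", "Positives"))) %>%
  count(Presence, key, bin) %>%
  # long to wide: column names are built as key_bin, so no unite() is needed
  pivot_wider(names_from = c(key, bin), values_from = n,
              values_fill = list(n = 0L))

res
```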