Summarizing and aggregating column values into rows in R

r statistics

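My data frame consists mainly of categorical columns plus one numeric column. df looks like this (simplified):

For each categorical column, I want to list all of its distinct values, how often each occurs, and the sum of the rent associated with it.

The result should look like this: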

**Variable**     **Distinct_values**      **No_of-Occurences**     **SUM_RENT**
  Home_type        Vila                     2                        12000
  Home_type        Condo                    2                        3700
  Home_type        Appartment               2                        1300
  Garden_type      big                      1                        5000
  Garden_type      small                    1                        7000
  Garden_type      shared                   1                        2000 
  Garden_type      none                     3                        3000 
  Naighbourhood    brooklyn                 2                        5500
  Naighbourhood    Bronx                    2                        8700 
  Naighbourhood    Sillicon Valley          2                        2800

I'm new to R and tried to do this with melt from reshape2, without much success. Any help would be greatly appreciated.

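These days I tend to use tidyr rather than reshape2, mostly because its syntax is closer to dplyr. dplyr will also make this task easier, thanks to the magrittr pipe (%>%) that it loads and its data summarizing tools.

First, we gather (from tidyr) all of the non-Rent columns into long format (run just those first two lines to see the result). Then, group_by the columns you want to cluster together. Finally, summarise each group to get the metrics you want.
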
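library(dplyr)
library(tidyr)

df %>%
  # Stack every column except Rent into Variable / Distinct_Values pairs
  gather(Variable, Distinct_Values, -Rent) %>%
  # One group per (column name, value) combination
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n()   # number of rows in the group
    , SUM_RENT = sum(Rent)     # total rent in the group
  )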

This gives:
        Variable Distinct_Values `No_of-Occurences` SUM_RENT
           <chr>           <chr>              <int>    <int>
1    Garden_type             big                  1     5000
2    Garden_type            none                  3     3000
3    Garden_type          shared                  1     2000
4    Garden_type           small                  1     7000
5      Home_type      Appartment                  2     1300
6      Home_type           Condo                  2     3700
7      Home_type            Vila                  2    12000
8  NaighbourhoOd           bronx                  2     8700
9  NaighbourhoOd        brooklyn                  2     5500
10 NaighbourhoOd Sillicon valley                  1     2000
11 NaighbourhoOd Sillicon Valley                  1      800

(Note that your data has both a "V" and a "v" in "Sillicon Valley", so those two rows are kept separate.)

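If those two rows should be counted together, one option is to normalize the case before reshaping. The following is only a minimal sketch: the original df was not preserved, so the six rows below are a hypothetical reconstruction chosen to reproduce the totals shown above, and lower-casing with tolower is just one possible way to normalize. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (assumption): the original df is not available, so these
# rows were picked only because they reproduce the counts and sums shown above.
df <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Condo", "Appartment", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon valley", "bronx",
                    "brooklyn", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 1700, 500, 800),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Lower-case the neighbourhood so both spellings map to the same key
  mutate(NaighbourhoOd = tolower(NaighbourhoOd)) %>%
  # Same reshape and summary as in the answer
  gather(Variable, Distinct_Values, -Rent) %>%
  group_by(Variable, Distinct_Values) %>%
  summarise(
    `No_of-Occurences` = n(),
    SUM_RENT = sum(Rent),
    .groups = "drop"   # return an ungrouped tibble
  )

print(result)
```

With this input the two spellings collapse into a single "sillicon valley" row with 2 occurrences and a rent sum of 2800, leaving 10 rows in total.
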
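We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and melt it from 'wide' to 'long' format. Grouped by 'variable' and 'value' (the columns created by melt), we create two columns, 'No_of_occur' and 'SUM_RENT', as the number of rows (.N) and the sum of the 'Rent' column. Then, grouped by 'variable', 'No_of_occur' and 'SUM_RENT', we take the unique elements of the 'value' column as 'Distinct_values'.
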
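You may want to take a look, particularly at the part that offers convenient ways to make data easy to read in. Help is much easier to give if we don't have to struggle to get your data into R.

Thanks for pointing that out, Mark. I'll be more careful in future and will edit the post later.

Works great, thanks. I didn't know about tidyr and will definitely be using it more.
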
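library(data.table)
# Wide to long: every column except Rent becomes a (variable, value) pair
melt(setDT(df1), id.vars = "Rent")[
  # Add the row count and rent total of each (variable, value) group by reference
  , c("No_of_occur", "SUM_RENT") := .(.N, sum(Rent)), by = .(variable, value)][
  # Collapse to one row per group, keeping each distinct value
  , .(Distinct_values = unique(value)), by = .(variable, No_of_occur, SUM_RENT)]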
 #         variable No_of_occur SUM_RENT Distinct_values
 #1:     Home_type           2    12000            Vila
 #2:     Home_type           2     3700           Condo
 #3:     Home_type           2     1300      Appartment
 #4:   Garden_type           1     5000             big
 #5:   Garden_type           1     7000           small
 #6:   Garden_type           1     2000          shared
 #7:   Garden_type           3     3000            none
 #8: NaighbourhoOd           2     5500        brooklyn
 #9: NaighbourhoOd           2     8700           bronx
 #10:NaighbourhoOd           2     2800 Sillicon Valley
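
The count and the sum can also be computed in one grouped step instead of adding columns and then taking unique values. This is a sketch only, not the answerer's code: df1 below is a hypothetical reconstruction (the original data was not preserved) with a consistent "Sillicon Valley" spelling, and as.data.table is used in place of setDT so the input data frame is left unmodified.

```r
library(data.table)

# Hypothetical df1 (assumption): rows chosen only to match the totals above.
df1 <- data.frame(
  Home_type     = c("Vila", "Vila", "Condo", "Condo", "Appartment", "Appartment"),
  Garden_type   = c("big", "small", "shared", "none", "none", "none"),
  NaighbourhoOd = c("brooklyn", "bronx", "Sillicon Valley", "bronx",
                    "brooklyn", "Sillicon Valley"),
  Rent          = c(5000, 7000, 2000, 1700, 500, 800),
  stringsAsFactors = FALSE
)

# Wide to long, naming the melted columns directly
long <- melt(as.data.table(df1), id.vars = "Rent",
             variable.name = "Variable", value.name = "Distinct_values")

# One row per (Variable, Distinct_values) with its count and rent total
res <- long[, .(No_of_occur = .N, SUM_RENT = sum(Rent)),
            by = .(Variable, Distinct_values)]

print(res)
```

Aggregating directly in `j` returns one row per group, so no separate deduplication step is needed.
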