R: split columns, then aggregate counts of unique values


r dataframe


I have the following dataset:

  color  type
1 black chair
2 black chair
3 black  sofa
4 green table
5 green  sofa

I want to split it into the following dataset:

    arg value
1 color black
2 color black
3 color black
4 color green
5 color green
6  type chair
7  type chair
8  type  sofa
9  type table
10 type  sofa
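A minimal base R sketch of this reshaping step, assuming the wide data is in a data frame called wide (a hypothetical name). The columns are kept as character rather than factor on purpose: stack() ignores factor columns (with a warning), which is why stringsAsFactors = FALSE is spelled out.

```r
# Wide data from the question; character columns, because stack() ignores factor columns
wide <- data.frame(
  color = c("black", "black", "black", "green", "green"),
  type  = c("chair", "chair", "sofa", "table", "sofa"),
  stringsAsFactors = FALSE
)

# stack() returns two columns: values (the cells) and ind (the source column name, as a factor)
st <- stack(wide)

# Rename and reorder to the arg/value layout
long <- data.frame(arg = as.character(st$ind), value = st$values,
                   stringsAsFactors = FALSE)
print(long)
```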

Then I want to count how often each unique arg/value combination occurs:

    arg value count
1 color black     3
2 color green     2
3  type chair     2
4  type  sofa     2
5  type table     1
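One base R way to get these counts, sketched under the assumption that the long data is in a data frame named long (hypothetical) with the arg/value columns shown above: cross-tabulate arg against value, drop the combinations that never occur, then sort.

```r
# Long data from the question (rebuilt here so the block runs on its own)
long <- data.frame(
  arg   = rep(c("color", "type"), each = 5),
  value = c("black", "black", "black", "green", "green",
            "chair", "chair", "sofa", "table", "sofa"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: every arg/value pair appears, including ones with count 0
tab <- table(arg = long$arg, value = long$value)
counts <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)

# Keep only combinations that actually occur, then sort by arg and value
counts <- counts[counts$count > 0, ]
counts <- counts[order(counts$arg, counts$value), ]
rownames(counts) <- NULL
print(counts)
```

The sort is optional, since the order doesn't matter here; it only makes the result line up with the table above.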

It doesn't need to be sorted by count. Then print it in the following form:

    arg unique_count_values
1 color black(3) green(2)
2  type chair(2) sofa(2) table(1)
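Given the counts, the final layout is one paste() per arg. A base R sketch, assuming the counts are in a data frame named counts with columns arg, value and count (hypothetical names matching the table above):

```r
counts <- data.frame(
  arg   = c("color", "color", "type", "type", "type"),
  value = c("black", "green", "chair", "sofa", "table"),
  count = c(3L, 2L, 2L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Build one "value(count)" label per row
counts$label <- sprintf("%s(%d)", counts$value, counts$count)

# Collapse the labels of each arg into one space-separated string;
# collapse = " " is passed through aggregate() to paste()
out <- aggregate(label ~ arg, data = counts, FUN = paste, collapse = " ")
names(out)[names(out) == "label"] <- "unique_count_values"
print(out)
```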

I tried the following:

AttrList<-colnames(DataSet)
aggregate(.~ AttrList, DataSet, FUN=function(x) length(unique(x)) )

Then converting the result with table():

as.data.frame(table(x))

This gives me:

     x Freq
1    1    1
2    2    1
3    3    2
4    4    2
5    5    3
6    7    1
7  101    2
8  102    2
9  103    2
10 104    2
11 105    1
12 106    1

What should I do to get this instead:

    V Val Freq
1  V2   1    1
2  V2   2    1
3  V2   3    2
4  V2   4    2
5  V2   5    3
6  V2   7    1
7  V1 101    2
8  V1 102    2
9  V1 103    2
10 V1 104    2
11 V1 105    1
12 V1 106    1
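The V/Val/Freq layout is a separate frequency table for each column, with the column name attached to every row. The data behind the output above isn't shown, so this base R sketch applies the same idea to the color/type data from the question; wide is a hypothetical name.

```r
wide <- data.frame(
  color = c("black", "black", "black", "green", "green"),
  type  = c("chair", "chair", "sofa", "table", "sofa"),
  stringsAsFactors = FALSE
)

# Tabulate each column on its own and tag every row with the column name
per_column <- lapply(names(wide), function(v) {
  tab <- as.data.frame(table(Val = wide[[v]]),
                       responseName = "Freq", stringsAsFactors = FALSE)
  data.frame(V = v, tab, stringsAsFactors = FALSE)
})

# Stack the per-column tables into one V / Val / Freq data frame
freq <- do.call(rbind, per_column)
print(freq)
```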

Try this:

which gives:

#Source: local data frame [2 x 2]
#
#     arg         unique_count_values
#  (fctr)                       (chr)
#1  color          black(3), green(2)
#2   type chair(2), sofa(2), table(1)

Here's a base R approach. I've spread it out a bit so that I can add comments about what's going on.

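The basic idea is to use sapply to loop through the columns and tabulate the data in each column, then use sprintf to pull out the relevant parts of each table to get the desired output (the name, followed by the count in parentheses). The stack function then takes the final named vector and converts it to a data.frame.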

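mydf <- data.frame(color = c("black", "black", "black", "green", "green"),
                   type  = c("chair", "chair", "sofa", "table", "sofa"),
                   stringsAsFactors = TRUE)   ## example data from the question

stack(                        ## convert the final output to a data.frame
  sapply(                     ## cycle through each column
    mydf, function(x) {
      temp <- table(x)        ## count each unique value
      ## build "name (count)" strings and paste them together
      paste(sprintf("%s (%d)", names(temp), temp), collapse = " ")
    }))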
#                         values   ind
# 1          black (3) green (2) color
# 2 chair (2) sofa (2) table (1)  type
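To see what the anonymous function hands back to sapply(), here is the same table()/sprintf() step run on the color column alone (x is just that column typed in by hand):

```r
x <- c("black", "black", "black", "green", "green")   # the color column

temp <- table(x)                                 # named integer counts: black = 3, green = 2
labels <- sprintf("%s (%d)", names(temp), temp)  # one "name (count)" string per unique value
cell <- paste(labels, collapse = " ")            # the single string sapply() collects for this column
print(labels)
print(cell)
```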

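Can this be done with FNN, plyr, or base packages? I'd like to know of any alternative approaches, as this seems a bit complicated to me. I'd also like to know how to transform the dataset to get the arg and value columns.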
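As one base R alternative: if the data is already in the arg/value form, tapply() can do the grouping directly, with no need to loop over columns. A sketch, where long is a hypothetical name for that data frame:

```r
long <- data.frame(
  arg   = rep(c("color", "type"), each = 5),
  value = c("black", "black", "black", "green", "green",
            "chair", "chair", "sofa", "table", "sofa"),
  stringsAsFactors = FALSE
)

# For each arg, tabulate its values and paste the "value(count)" pairs together
res <- tapply(long$value, long$arg, function(v) {
  tab <- table(v)
  paste(sprintf("%s(%d)", names(tab), tab), collapse = " ")
})

# tapply() returns a named 1-d array; turn it into the two-column layout
out <- data.frame(arg = names(res), unique_count_values = as.vector(res),
                  stringsAsFactors = FALSE)
print(out)
```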

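## summary() lists value:count pairs only for factor columns (character columns just report Length/Class/Mode)
stack(apply(summary(mydf), 2, function(x) paste(na.omit(x), collapse = " ")))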
#                          values     ind
# 1           black:3   green:2     color
# 2 chair:2   sofa :2   table:1      type