R: spread unique values (across multiple columns) into separate columns and paste in aggregated values

r dataframe

I have a data frame that looks like this:

structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L, 
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L, 
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1", 
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

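I would like to use data.table to spread the unique values of each column into separate columns and paste the summed value (from the column Value) under each one. For example, col1 has two unique values, A1 and A2. The sum for A1 is 3 and the sum for A2 is 7. Similarly, col2 has two unique values, B1 and B2, and the sum for each of them is 5.

This operation should be performed for each of the columns col1, col2 and col3.

The expected output is shown below: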

structure(list(A1 = 3, A2 = 7, B1 = 5, B2 = 5, C1 = 1, C2 = 2, 
    C3 = 3, C4 = 4), class = "data.frame", row.names = c(NA, 
-1L))

How can I achieve this in R?
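As a quick sanity check of the sums described above, here is a minimal base R sketch using tapply. It rebuilds the sample data with data.frame() rather than structure(), which is assumed to be equivalent for this purpose, and it only verifies the arithmetic one column at a time; it does not produce the wide one-row result.

```r
# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  Value = c(1, 2, 3, 4),
  col1 = factor(c("A1", "A1", "A2", "A2")),
  col2 = factor(c("B1", "B2", "B2", "B1")),
  col3 = factor(c("C1", "C2", "C3", "C4"))
)

# Sum Value within each level of one grouping column at a time
sums_col1 <- tapply(df$Value, df$col1, sum)  # A1 = 1 + 2, A2 = 3 + 4
sums_col2 <- tapply(df$Value, df$col2, sum)  # B1 = 1 + 4, B2 = 2 + 3
sums_col3 <- tapply(df$Value, df$col3, sum)  # every level occurs exactly once

print(sums_col1)
print(sums_col2)
print(sums_col3)
```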

I'm not very comfortable with data.table, but a tidyverse solution could be:
library(dplyr)
library(tidyr)

df %>% 
 pivot_longer(starts_with('col')) %>% 
 group_by(value) %>% 
 summarise(res = sum(Value)) %>% 
 pivot_wider(names_from = value, values_from = res)

This gives:
# A tibble: 1 x 8
     A1    A2    B1    B2    C1    C2    C3    C4
1     3     7     5     5     1     2     3     4

The data.table version of @Sotos's answer is:
library(data.table)

dcast(melt(setDT(df), 'Value')[, .(Total = sum(Value)), value],
           rowid(value)~value, value.var = 'Total')

#   value A1 A2 B1 B2 C1 C2 C3 C4
#1:     1  3  7  5  5  1  2  3  4

You may not need the value column, in which case you can drop it by appending [, value := NULL] to the call.
A base R version (from another data.table wannabe):

Here is another base R solution:
dfout <- t(do.call(rbind,
                   lapply(seq_along(df)[-1], 
                          function(k) unstack(rev(aggregate(Value~.,df[c(1,k)],sum))))))

Data
df <- structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L, 
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L, 
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1", 
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

Here is another option:
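library(data.table)
setDT(df)  # convert to a data.table by reference so the df[, j, by] syntax works
# sum Value within each grouping column, then stack the per-column results by position
x <- rbindlist(lapply(paste0("col", 1:3), function(b) df[, sum(Value), b]), 
    use.names=FALSE)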

setDT(setNames(as.list(x$V1), x$col1))[]

You can also solve it as follows:
library(data.table)
melt(setDT(df), "Value")[, .(TOT = sum(Value)), value][, setNames(as.list(TOT), value)]

#       A1    A2    B1    B2    C1    C2    C3    C4
# 1:     3     7     5     5     1     2     3     4
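For readers without data.table installed, the same melt, aggregate, rename idea can be sketched in base R. This is only an illustration of the logic, not the answerer's code: the intermediate names long, tot and res are made up for the example, and the sample data is rebuilt with data.frame().

```r
# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  Value = c(1, 2, 3, 4),
  col1 = factor(c("A1", "A1", "A2", "A2")),
  col2 = factor(c("B1", "B2", "B2", "B1")),
  col3 = factor(c("C1", "C2", "C3", "C4"))
)

# "Melt" by hand: stack every grouping column into one character column
# and repeat Value once per stacked column so the rows stay aligned
long <- data.frame(
  Value = rep(df$Value, times = ncol(df) - 1),
  value = unlist(lapply(df[-1], as.character), use.names = FALSE)
)

# Sum Value for each distinct label across all stacked columns
tot <- aggregate(Value ~ value, data = long, FUN = sum)

# Turn the labels into column names of a one-row data frame
res <- as.data.frame(setNames(as.list(tot$Value), as.character(tot$value)))
print(res)
```

Note that this, like the melt-based answers, assumes a label never appears in more than one grouping column; if it did, its sums would be pooled.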

Heh... I had just done this in DT, but it came out more cumbersome than this one, so I won't bother adding it; there may be an even more concise version :P Then again, I use the tidyverse more than data.table. I think dcast/melt were merged into one, but I may be confusing that with reshape2... I'm not sure whether dcast has a fun.aggregate argument, but I guess it couldn't sum by group.

I like this solution, but what if there is a fourth column, col4, or more columns in general? How would you modify it?

@Edward Good question! For the general case you can use seq_along(df)[-1] instead of 2:4. See my update.
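To illustrate the point about not hard-coding 2:4, here is a hedged base R sketch that works for any number of grouping columns. The helper name spread_sums and its value_col parameter are hypothetical, introduced only for this example, and the col4 column is invented to show the extension.

```r
# Hypothetical helper: sums value_col within each level of every other column
# and returns the sums as a one-row data frame
spread_sums <- function(df, value_col = "Value") {
  group_cols <- setdiff(names(df), value_col)  # every column except the value column
  per_col <- lapply(group_cols, function(nm) {
    # split the values by the levels of this column and sum each piece
    vapply(split(df[[value_col]], df[[nm]]), sum, numeric(1))
  })
  as.data.frame(as.list(unlist(per_col)))
}

# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  Value = c(1, 2, 3, 4),
  col1 = factor(c("A1", "A1", "A2", "A2")),
  col2 = factor(c("B1", "B2", "B2", "B1")),
  col3 = factor(c("C1", "C2", "C3", "C4"))
)
out3 <- spread_sums(df)

# Adding an invented fourth column needs no change to the helper
df4 <- df
df4$col4 <- factor(c("D1", "D1", "D1", "D2"))
out4 <- spread_sums(df4)
print(out4)
```

Because the grouping columns are taken from names(df), the same call covers three columns, four, or more.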