R - Calculating indicators across multiple columns (like sumproduct in Excel)


I have the following data frame in R:

df <- data.frame(id=c('a','b','a','c','b','a'),
                 indicator1=c(1,0,0,0,1,1),
                 indicator2=c(0,0,0,1,0,1),
                 extra1=c(4,5,12,4,3,7),
                 extra2=c('z','z','x','y','x','x'))

id indicator1 indicator2 extra1 extra2
a          1          0      4      z
b          0          0      5      z
a          0          0     12      x
c          0          1      4      y
b          1          0      3      x
a          1          1      7      x

How can I do this?

There are a few ways to do this. Here is one using ave and within:

within(df, {
  ind1ind2 <- ave(as.character(interaction(indicator1, indicator2, drop=TRUE)), 
                  id, FUN = function(x) sum(x == "1.1"))
  ind2 <- ave(indicator2, id, FUN = function(x) sum(x == 1))
  ind1 <- ave(indicator1, id, FUN = function(x) sum(x == 1))
})
#   id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2
# 1  a          1          0      4      z    2    1        1
# 2  b          0          0      5      z    1    0        0
# 3  a          0          0     12      x    2    1        1
# 4  c          0          1      4      y    0    1        0
# 5  b          1          0      3      x    1    0        0
# 6  a          1          1      7      x    2    1        1

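Also, if you find it more explicit, you can use sum(indicator1 == 1 & indicator2 == 1) instead of sum(interaction(...) == "1.1"). I haven't benchmarked which of the two is more efficient; interaction was simply the first thing that came to mind.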
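
As a minimal sketch of the more explicit variant mentioned in this answer, the combined count can be built from a logical test and passed to ave. The helper vector both is a hypothetical name introduced only for this illustration; the data frame is the one from the question.

```r
# Sample data from the question.
df <- data.frame(id = c('a','b','a','c','b','a'),
                 indicator1 = c(1,0,0,0,1,1),
                 indicator2 = c(0,0,0,1,0,1),
                 extra1 = c(4,5,12,4,3,7),
                 extra2 = c('z','z','x','y','x','x'))

# 'both' (hypothetical helper): 1 on rows where both indicators equal 1.
both <- as.numeric(df$indicator1 == 1 & df$indicator2 == 1)

# ave() applies sum() within each id and repeats the result on every row.
df$ind1ind2 <- ave(both, df$id, FUN = sum)
```

This avoids the string comparison against "1.1", and the condition reads the same way as the question would be phrased.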


Or you could do this:

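# Sum an indicator over all rows that share the id of row i (column 1 is id)
get_freq1 <- function(i) sum(df[which(df$id == df[i, 1]), ]$indicator1)
get_freq2 <- function(i) sum(df[which(df$id == df[i, 1]), ]$indicator2)

df <- data.frame(df, countInd1 = sapply(1:nrow(df), get_freq1), countInd2 = sapply(1:nrow(df), get_freq2))
# Note: this is a 0/1 flag (both counts non-zero for the id), not a count of rows where both indicators are 1
df <- data.frame(df, countInd1Ind2 = ((df$countInd1 != 0) & (df$countInd2 != 0)) * 1)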

You will get:

 #  id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2
 #1  a          1          0      4      z         2         1             1
 #2  b          0          0      5      z         1         0             0
 #3  a          0          0     12      x         2         1             1
 #4  c          0          1      4      y         0         1             0
 #5  b          1          0      3      x         1         0             0
 #6  a          1          1      7      x         2         1             1
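
The row-by-row lookup above recomputes the same per-id sum for every row. A rough alternative sketch, not part of the original answer, is to compute each per-id total once with tapply and then map it back onto the rows by id. The names tot1, tot2, both, key and countBoth are hypothetical, chosen only for this example.

```r
# Sample data from the question.
df <- data.frame(id = c('a','b','a','c','b','a'),
                 indicator1 = c(1,0,0,0,1,1),
                 indicator2 = c(0,0,0,1,0,1),
                 extra1 = c(4,5,12,4,3,7),
                 extra2 = c('z','z','x','y','x','x'))

# Per-id totals, computed once per id; the results are named by id.
tot1 <- tapply(df$indicator1, df$id, sum)
tot2 <- tapply(df$indicator2, df$id, sum)
both <- tapply(df$indicator1 == 1 & df$indicator2 == 1, df$id, sum)

# Map the per-id totals back onto every row, looking them up by id name.
key <- as.character(df$id)
df$countInd1 <- as.vector(tot1[key])
df$countInd2 <- as.vector(tot2[key])
df$countBoth <- as.vector(both[key])   # count of rows where both are 1
```

Unlike a 0/1 flag, countBoth here is a true per-id count of rows where both indicators are set; on the sample data the two happen to coincide.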


If indicator1 and indicator2 are indicators (as they look and as their names suggest), then, with DT as a data.table,

DT[, c('ind1', 'ind2', 'ind1ind2') := list(sum(indicator1), sum(indicator2), sum((indicator1 + indicator2) > 1)), by = id]

should work (and be more efficient).

@mnel, that is what I had in mind too, but I didn't want to make any assumptions about whether other values might appear in the "indicator1" and "indicator2" columns.
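
To make the concern in the reply concrete, here is a small base R sketch of how the plain sums and the explicit == 1 tests diverge once a column holds a value other than 0 or 1. The vectors ind_a and ind_b are made-up data for illustration only, not taken from the question.

```r
# Hypothetical columns: the third element of ind_a is a stray 2, not 0/1.
ind_a <- c(1, 0, 2, 1)
ind_b <- c(1, 0, 0, 0)

sum_plain  <- sum(ind_a)        # adds the raw values, so the 2 counts twice
sum_strict <- sum(ind_a == 1)   # counts only elements that are exactly 1

both_plain  <- sum((ind_a + ind_b) > 1)       # the (2, 0) row also passes
both_strict <- sum(ind_a == 1 & ind_b == 1)   # only the (1, 1) row passes
```

With clean 0/1 columns both forms give the same result, which is why the shorter sum-based version is fine when that assumption holds.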