R - Computing indicators across multiple columns (like sumproduct in Excel)
I have the following data frame in R:
df <- data.frame(id=c('a','b','a','c','b','a'),
indicator1=c(1,0,0,0,1,1),
indicator2=c(0,0,0,1,0,1),
extra1=c(4,5,12,4,3,7),
extra2=c('z','z','x','y','x','x'))
id indicator1 indicator2 extra1 extra2
a 1 0 4 z
b 0 0 5 z
a 0 0 12 x
c 0 1 4 y
b 1 0 3 x
a 1 1 7 x
How can I do this?
There are several ways. Here is one using ave and within:
within(df, {
ind1ind2 <- ave(as.character(interaction(indicator1, indicator2, drop=TRUE)),
id, FUN = function(x) sum(x == "1.1"))
ind2 <- ave(indicator2, id, FUN = function(x) sum(x == 1))
ind1 <- ave(indicator1, id, FUN = function(x) sum(x == 1))
})
# id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2
# 1 a 1 0 4 z 2 1 1
# 2 b 0 0 5 z 1 0 0
# 3 a 0 0 12 x 2 1 1
# 4 c 0 1 4 y 0 1 0
# 5 b 1 0 3 x 1 0 0
# 6 a 1 1 7 x 2 1 1
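Also, if you find it more explicit, you can use sum(indicator1 == 1 & indicator2 == 1) instead of sum(interaction(…) == "1.1"). I have not benchmarked which is more efficient; interaction was simply the first thing that came to mind.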
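As a minimal sketch of that more explicit variant, assuming the same df as in the question: the two-column condition is evaluated row by row first, and ave then sums it within each id. Converting the condition to numeric before calling ave keeps the resulting column numeric.

```r
df <- data.frame(id = c('a','b','a','c','b','a'),
                 indicator1 = c(1,0,0,0,1,1),
                 indicator2 = c(0,0,0,1,0,1),
                 extra1 = c(4,5,12,4,3,7),
                 extra2 = c('z','z','x','y','x','x'))

# 1 where both indicators are set on the row, 0 otherwise
both <- as.numeric(df$indicator1 == 1 & df$indicator2 == 1)

# Per-id count of such rows, repeated on every row of that id
df$ind1ind2 <- ave(both, df$id, FUN = sum)

print(df)
```

This is the closest analogue to an Excel SUMPRODUCT over two condition columns: the elementwise & plays the role of the product, and ave supplies the per-id sum.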
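Or you could do this:
# For row i, sum an indicator over all rows that share that row's id
get_freq1 <- function(i) sum(df$indicator1[df$id == df$id[i]])
get_freq2 <- function(i) sum(df$indicator2[df$id == df$id[i]])
df <- data.frame(df,
                 countInd1 = sapply(seq_len(nrow(df)), get_freq1),
                 countInd2 = sapply(seq_len(nrow(df)), get_freq2))
# Note: this is a 0/1 flag (the id has both indicators somewhere), not a count of rows where both are 1
df <- data.frame(df, countInd1Ind2 = ((df$countInd1 != 0) & (df$countInd2 != 0)) * 1)
You will get: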
# id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2
#1 a 1 0 4 z 2 1 1
#2 b 0 0 5 z 1 0 0
#3 a 0 0 12 x 2 1 1
#4 c 0 1 4 y 0 1 0
#5 b 1 0 3 x 1 0 0
#6 a 1 1 7 x 2 1 1
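The row-by-row approach above recomputes a group sum for every row. A hedged sketch of the same three columns, assuming the df from the question (the extra columns are omitted for brevity), computes each per-id sum once with tapply and then looks it up by id. The names sums1 and sums2 are illustrative only.

```r
df <- data.frame(id = c('a','b','a','c','b','a'),
                 indicator1 = c(1,0,0,0,1,1),
                 indicator2 = c(0,0,0,1,0,1))

# One sum per id, returned as a vector named by id
sums1 <- tapply(df$indicator1, df$id, sum)
sums2 <- tapply(df$indicator2, df$id, sum)

# Look each row's id up in the per-id sums
df$countInd1 <- as.numeric(sums1[as.character(df$id)])
df$countInd2 <- as.numeric(sums2[as.character(df$id)])

# 0/1 flag: the id has at least one row with each indicator set
df$countInd1Ind2 <- as.numeric(df$countInd1 != 0 & df$countInd2 != 0)

print(df)
```

On this data the flag matches the per-id count of rows where both indicators are 1, but the two would differ for an id whose indicators are set on different rows only.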
If indicator1 and indicator2 really are indicators (as they look and as their names suggest), then DT[, c('ind1','ind2','ind1ind2') := list(sum(indicator1), sum(indicator2), sum((indicator1 + indicator2) > 1)), by = id] should work (and be more efficient).
@mnel, that is what I had in mind too, but I did not want to make any assumptions about whether the "indicator1" and "indicator2" columns might contain other values.
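For completeness, a minimal sketch of how the one-liner from the comment could be run. It assumes the data.table package is installed and that DT is simply the question's data frame converted with as.data.table; it also relies on both columns containing only 0 and 1, which is exactly the assumption the reply above was reluctant to make.

```r
library(data.table)  # assumption: the data.table package is available

df <- data.frame(id = c('a','b','a','c','b','a'),
                 indicator1 = c(1,0,0,0,1,1),
                 indicator2 = c(0,0,0,1,0,1))

# DT is assumed to be a data.table copy of the question's df
DT <- as.data.table(df)

# Add the three per-id columns by reference, grouped by id
DT[, c("ind1", "ind2", "ind1ind2") :=
       list(sum(indicator1),
            sum(indicator2),
            sum((indicator1 + indicator2) > 1)),
   by = id]

print(DT)
```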