R: count the rows that share a value in one column and have a positive binary value in another column


r vector dataframe


(Sorry for the odd title, but I couldn't think of a short way to put this.)

Because I oversimplified the problem in my previous question, this time I'm giving you the actual problem.

The data frame has the columns "usr", "usrMsgCnt" and "isRefound", where usr is a name, usrMsgCnt is a number and isRefound is binary.

A new column should be added, with its value computed as follows:

usrMsgCnt / (number of rows whose usr equals this row's usr and whose isRefound equals 1)
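The same rule written as a formula, with i and j indexing rows (a restatement of the sentence above, not notation from the original post). Because isRefound only takes the values 0 and 1, the count in the denominator equals the sum of isRefound over that user's rows, which is why a per-user sum(isRefound) can stand in for the count:

```latex
\text{newvar}_i
  = \frac{\text{usrMsgCnt}_i}{\left|\{\, j : \text{usr}_j = \text{usr}_i,\ \text{isRefound}_j = 1 \,\}\right|}
  = \frac{\text{usrMsgCnt}_i}{\sum_{j:\ \text{usr}_j = \text{usr}_i} \text{isRefound}_j}
```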

For the first row of the sample data, the new value would be:

9/5, where the 5 comes from length(data$usr[data$usr=="Jan.Schrader" & data$isRefound==1])
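A quick check of that worked example, as a sketch: it rebuilds the sample data from the dput below, but stores usr as a plain character vector instead of a factor (an assumption that does not change the counts), then evaluates the question's own length() expression for row 1.

```r
# The question's sample data (values from the dput), with usr as character instead of factor
DF <- data.frame(
  usr = c("Jan.Schrader", "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernd.Ludwig", "Bernhard.Schiemann",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Jan.Schrader",
          "Ian.Ruthven", "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Bernhard.Schiemann"),
  usrMsgCnt = c(9L, 9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L,
                0L, 9L, 5L, 0L, 9L, 9L, 9L, 37L, 37L, 37L),
  isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
                0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)
)

# Denominator for row 1: rows with the same usr as row 1 and isRefound == 1
denom <- length(DF$usr[DF$usr == DF$usr[1] & DF$isRefound == 1])
denom   # [1] 5

# New value for row 1: usrMsgCnt divided by that count
ratio <- DF$usrMsgCnt[1] / denom
ratio   # [1] 1.8
```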

Given the size of the original data set, doing this in a loop is not an option.
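One loop-free approach in base R, offered as a sketch rather than as any answerer's method: compute each user's isRefound total once with rowsum(), then look that total up for every row with match(). Usernames are stored as character here (an assumption; the dput uses a factor). Users with no isRefound == 1 rows end up with 0/0, which R returns as NaN.

```r
# Sample data from the question's dput (usr as character)
DF <- data.frame(
  usr = c("Jan.Schrader", "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernd.Ludwig", "Bernhard.Schiemann",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Jan.Schrader",
          "Ian.Ruthven", "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Bernhard.Schiemann"),
  usrMsgCnt = c(9L, 9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L,
                0L, 9L, 5L, 0L, 9L, 9L, 9L, 37L, 37L, 37L),
  isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
                0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)
)

# One pass over the data: per-user total of isRefound (= number of 1s, since it is binary).
# Result is a 1-column matrix with one row per user; the user names are its rownames.
hits <- rowsum(DF$isRefound, DF$usr)

# Look up each row's user total and divide, fully vectorized
DF$newvar <- DF$usrMsgCnt / hits[match(DF$usr, rownames(hits)), 1]
DF$newvar
```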

Here is a dput of a small chunk of the data:

DF <- structure(list(usr = structure(c(21L, 21L, 21L, 21L, 6L, 5L, 
6L, 6L, 6L, 21L, 20L, 21L, 6L, 20L, 21L, 21L, 21L, 6L, 6L, 6L
), .Label = c("alsmith", "Amanda.Coles", "Andrew.Coles", "babsimieth", 
"Bernd.Ludwig", "Bernhard.Schiemann", "bfueck", "Bram.Ridder", 
"brian.tripney", "carlosgardeazabal", "christine.elsweiler", 
"cmfinner", "daniel.goncalves", "david", "de56", "eko.ma", "freundlu", 
"gmcphail", "ian.ferguson", "Ian.Ruthven", "Jan.Schrader", "jearmour", 
"jyang", "Laura.Schnall", "Marc.Roper", "marek.maleika", "Martin.Hacker", 
"martin.scholz", "maziminke", "mclanger", "Michael.Cashmore", 
"morgan.harvey", "mrussell", "msherrif", "murray.wood", "Nadine.Mahrholz", 
"noam.ascher", "pburns", "Peter.Gregory", "raina", "robertnm", 
"ronald.teijeira", "ronaldtf", "sbenus", "starmstr", "steve.neely", 
"Sven.Friedemann", "tinchen"), class = "factor"), usrMsgCnt = c(9L, 
9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L, 0L, 9L, 5L, 0L, 9L, 9L, 9L, 
37L, 37L, 37L), isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)), .Names = c("usr", 
"usrMsgCnt", "isRefound"), row.names = c(NA, 20L), class = "data.frame")

Assuming isRefound really is binary:

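library(data.table)
# key = "usr" also sorts the table by usr
DT <- data.table(DF,key="usr")

# Within each usr group, divide usrMsgCnt by the number of rows with isRefound == 1
DT[,newvar:=usrMsgCnt/sum(isRefound),by=usr]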

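Maybe, to remove any ambiguity, you could post the output you expect for the subset of the data you shared here.

Yes, you're right, give me a minute.

+1. I had the same idea. The only change I would make is not to set a key when creating the data.table, so that the original row order is kept (if that matters).

The original row order does matter, but I don't really understand what "not setting a key when creating the data.table" means. Would you elaborate?

library(data.table)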
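DT <- data.table(DF)    # no key, so rows keep their original order
DT[,id:=.I]             # .I is the row number: records each row's original position
DT[,newvar:=usrMsgCnt/sum(isRefound),by=usr]
print(DT)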

#                    usr usrMsgCnt isRefound id newvar
#  1:       Jan.Schrader         9         0  1    1.8
#  2:       Jan.Schrader         9         1  2    1.8
#  3:       Jan.Schrader         9         1  3    1.8
#  4:       Jan.Schrader         9         1  4    1.8
#  5: Bernhard.Schiemann         5         1  5    1.0
#  6:       Bernd.Ludwig         0         0  6    NaN
#  7: Bernhard.Schiemann         5         0  7    1.0
#  8: Bernhard.Schiemann         5         1  8    1.0
#  9: Bernhard.Schiemann         5         1  9    1.0
# 10:       Jan.Schrader         9         1 10    1.8
# 11:        Ian.Ruthven         0         0 11    NaN
# 12:       Jan.Schrader         9         0 12    1.8
# 13: Bernhard.Schiemann         5         1 13    1.0
# 14:        Ian.Ruthven         0         0 14    NaN
# 15:       Jan.Schrader         9         0 15    1.8
# 16:       Jan.Schrader         9         0 16    1.8
# 17:       Jan.Schrader         9         1 17    1.8
# 18: Bernhard.Schiemann        37         0 18    7.4
# 19: Bernhard.Schiemann        37         1 19    7.4
# 20: Bernhard.Schiemann        37         0 20    7.4
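To illustrate the row-order point from the comments without needing data.table installed, here is a base R sketch. Setting key = "usr" physically sorts the rows by usr, roughly like order() does below (data.table sorts in C locale, base order() uses the current locale, so tie-breaking across case may differ). An id column holding the original row position, the counterpart of DT[,id:=.I], is enough to restore the original order afterwards.

```r
# Sample data from the question's dput (usr as character)
DF <- data.frame(
  usr = c("Jan.Schrader", "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernd.Ludwig", "Bernhard.Schiemann",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Jan.Schrader",
          "Ian.Ruthven", "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Bernhard.Schiemann"),
  usrMsgCnt = c(9L, 9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L,
                0L, 9L, 5L, 0L, 9L, 9L, 9L, 37L, 37L, 37L),
  isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
                0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)
)

DF$id <- seq_len(nrow(DF))               # original row position (like DT[,id:=.I])
sorted <- DF[order(DF$usr), ]            # rows regrouped by usr, as a key would do
head(sorted$id)                          # no longer 1, 2, 3, ...
restored <- sorted[order(sorted$id), ]   # sort by id to get the original order back
```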

A base R alternative with ave(), which keeps the original row order:

within(DF, {
  newvar <- usrMsgCnt/ave(isRefound, usr, FUN = sum)
})
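A variation on the ave() answer, as a sketch. within() returns a modified copy, so the result has to be assigned back to keep the new column. This version also returns NA rather than NaN for users with no isRefound == 1 rows; that choice is an assumption, since the question does not say what those rows should get. The helper name cnt is hypothetical and is dropped before within() returns.

```r
# Sample data from the question's dput (usr as character)
DF <- data.frame(
  usr = c("Jan.Schrader", "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernd.Ludwig", "Bernhard.Schiemann",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Jan.Schrader",
          "Ian.Ruthven", "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          "Jan.Schrader", "Jan.Schrader", "Jan.Schrader",
          "Bernhard.Schiemann", "Bernhard.Schiemann", "Bernhard.Schiemann"),
  usrMsgCnt = c(9L, 9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L,
                0L, 9L, 5L, 0L, 9L, 9L, 9L, 37L, 37L, 37L),
  isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L,
                0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)
)

DF <- within(DF, {
  cnt <- ave(isRefound, usr, FUN = sum)                  # per-user total, repeated on each row
  newvar <- ifelse(cnt > 0, usrMsgCnt / cnt, NA_real_)   # NA instead of NaN when a user has no 1s
  rm(cnt)                                                # helper is not kept as a column
})
```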

Or with plyr (ddply returns the rows grouped by usr rather than in their original order):

library(plyr)
ddply(DF, .(usr), transform,
      newvar = usrMsgCnt/sum(isRefound))