R: Get the number of rows that have the same value in one column and a positive binary value in another column

(Sorry for the odd title, but I couldn't think of a short way to put this.)

Because I oversimplified my problem in my previous question, this time I am giving you the actual problem.

The data frame has the columns "usr", "usrMsgCnt" and "isRefound", where usr is a name, usrMsgCnt is a number and isRefound is binary (0 or 1).

A new column should be added, whose value is computed as follows:

usrMsgCnt / (number of rows in which usr equals this row's usr and isRefound equals 1)
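Written as a formula for row i of an n-row data frame, calling the new column newvar as the answers below do, and using the indicator function 1[...] (1 when the condition holds, else 0; this notation is added here for clarity):

$$
\mathrm{newvar}_i = \frac{\mathrm{usrMsgCnt}_i}{\sum_{j=1}^{n} \mathbf{1}\left[\mathrm{usr}_j = \mathrm{usr}_i \,\wedge\, \mathrm{isRefound}_j = 1\right]}
$$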

For the first row of the sample data, the new value would be:

9/5, where the 5 is given by length(data$usr[data$usr == "Jan.Schrader" & data$isRefound == 1])
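A minimal base-R check of that arithmetic, assuming the 20 sample rows from the dput below are re-typed as plain vectors (the names data, n_len, n_sum and row1 are illustrative). Summing a logical vector counts its TRUE values, so sum() gives the same denominator as the length() expression:

```r
# The 20 sample rows from the question's dput, re-typed as plain vectors
data <- data.frame(
  usr = c(rep("Jan.Schrader", 4), "Bernhard.Schiemann", "Bernd.Ludwig",
          rep("Bernhard.Schiemann", 3), "Jan.Schrader", "Ian.Ruthven",
          "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          rep("Jan.Schrader", 3), rep("Bernhard.Schiemann", 3)),
  usrMsgCnt = c(9, 9, 9, 9, 5, 0, 5, 5, 5, 9, 0, 9, 5, 0, 9, 9, 9, 37, 37, 37),
  isRefound = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Denominator for row 1, written as in the question ...
n_len <- length(data$usr[data$usr == "Jan.Schrader" & data$isRefound == 1])
# ... and as a sum over a logical vector (each TRUE counts as 1)
n_sum <- sum(data$usr == data$usr[1] & data$isRefound == 1)

row1 <- data$usrMsgCnt[1] / n_sum
print(c(n_len = n_len, n_sum = n_sum, row1 = row1))
```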

Given the size of the original dataset, doing this in a loop is not an option.
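For reference, one loop-free way to do this in base R is to count the isRefound == 1 rows once per user with tapply() and then look those counts up by user name. This is a sketch on the re-typed sample rows, not one of the answers below; the refound_per_user name is illustrative:

```r
# The 20 sample rows from the question's dput, re-typed as plain vectors
data <- data.frame(
  usr = c(rep("Jan.Schrader", 4), "Bernhard.Schiemann", "Bernd.Ludwig",
          rep("Bernhard.Schiemann", 3), "Jan.Schrader", "Ian.Ruthven",
          "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          rep("Jan.Schrader", 3), rep("Bernhard.Schiemann", 3)),
  usrMsgCnt = c(9, 9, 9, 9, 5, 0, 5, 5, 5, 9, 0, 9, 5, 0, 9, 9, 9, 37, 37, 37),
  isRefound = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# One pass: number of isRefound == 1 rows per user, named by user
refound_per_user <- tapply(data$isRefound == 1, data$usr, sum)

# Look each row's count up by user name and divide; no explicit loop.
# as.character() matters if usr is a factor, as in the real data.
data$newvar <- data$usrMsgCnt /
  as.vector(refound_per_user[as.character(data$usr)])

print(data)
```

The counting happens once per user rather than once per row, which is what a row-by-row loop would repeat.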

Here is a dput of a small chunk of the data:

structure(list(usr = structure(c(21L, 21L, 21L, 21L, 6L, 5L, 
6L, 6L, 6L, 21L, 20L, 21L, 6L, 20L, 21L, 21L, 21L, 6L, 6L, 6L
), .Label = c("alsmith", "Amanda.Coles", "Andrew.Coles", "babsimieth", 
"Bernd.Ludwig", "Bernhard.Schiemann", "bfueck", "Bram.Ridder", 
"brian.tripney", "carlosgardeazabal", "christine.elsweiler", 
"cmfinner", "daniel.goncalves", "david", "de56", "eko.ma", "freundlu", 
"gmcphail", "ian.ferguson", "Ian.Ruthven", "Jan.Schrader", "jearmour", 
"jyang", "Laura.Schnall", "Marc.Roper", "marek.maleika", "Martin.Hacker", 
"martin.scholz", "maziminke", "mclanger", "Michael.Cashmore", 
"morgan.harvey", "mrussell", "msherrif", "murray.wood", "Nadine.Mahrholz", 
"noam.ascher", "pburns", "Peter.Gregory", "raina", "robertnm", 
"ronald.teijeira", "ronaldtf", "sbenus", "starmstr", "steve.neely", 
"Sven.Friedemann", "tinchen"), class = "factor"), usrMsgCnt = c(9L, 
9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L, 0L, 9L, 5L, 0L, 9L, 9L, 9L, 
37L, 37L, 37L), isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)), .Names = c("usr", 
"usrMsgCnt", "isRefound"), row.names = c(NA, 20L), class = "data.frame")

Assuming isRefound really is binary:

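library(data.table)
# DF is the sample data frame from the question's dput
DT <- data.table(DF, key = "usr")

# Within each usr group, divide usrMsgCnt by the number of isRefound == 1 rows
DT[, newvar := usrMsgCnt/sum(isRefound), by = usr]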
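DT

Perhaps, to remove any ambiguity, you could post the output you expect for the subset of data you shared here.

Yes, you're right, give me a minute.

+1. My thinking was the same. The only change I would make is not to set a key when creating the data.table, so that the original row order is kept (if needed).

The original row order matters, but I don't really understand what "not setting a key when creating the data.table" means. Would you elaborate?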
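DT <- data.table(DF)   # no key, so the rows keep their original order
DT[, id := .I]         # .I is the row number; added only to show the order is kept
DT[, newvar := usrMsgCnt/sum(isRefound), by = usr]
print(DT)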

#                    usr usrMsgCnt isRefound id newvar
#  1:       Jan.Schrader         9         0  1    1.8
#  2:       Jan.Schrader         9         1  2    1.8
#  3:       Jan.Schrader         9         1  3    1.8
#  4:       Jan.Schrader         9         1  4    1.8
#  5: Bernhard.Schiemann         5         1  5    1.0
#  6:       Bernd.Ludwig         0         0  6    NaN
#  7: Bernhard.Schiemann         5         0  7    1.0
#  8: Bernhard.Schiemann         5         1  8    1.0
#  9: Bernhard.Schiemann         5         1  9    1.0
# 10:       Jan.Schrader         9         1 10    1.8
# 11:        Ian.Ruthven         0         0 11    NaN
# 12:       Jan.Schrader         9         0 12    1.8
# 13: Bernhard.Schiemann         5         1 13    1.0
# 14:        Ian.Ruthven         0         0 14    NaN
# 15:       Jan.Schrader         9         0 15    1.8
# 16:       Jan.Schrader         9         0 16    1.8
# 17:       Jan.Schrader         9         1 17    1.8
# 18: Bernhard.Schiemann        37         0 18    7.4
# 19: Bernhard.Schiemann        37         1 19    7.4
# 20: Bernhard.Schiemann        37         0 20    7.4
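To illustrate the point about keys raised in the comments: passing key = "usr" to data.table() sorts the rows by usr, while leaving the key out keeps them in input order. A sketch on the re-typed sample rows (it assumes the data.table package is installed; the keyed, plain and restored names are illustrative):

```r
library(data.table)

# The 20 sample rows from the question's dput, re-typed as plain vectors
DF <- data.frame(
  usr = c(rep("Jan.Schrader", 4), "Bernhard.Schiemann", "Bernd.Ludwig",
          rep("Bernhard.Schiemann", 3), "Jan.Schrader", "Ian.Ruthven",
          "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          rep("Jan.Schrader", 3), rep("Bernhard.Schiemann", 3)),
  usrMsgCnt = c(9, 9, 9, 9, 5, 0, 5, 5, 5, 9, 0, 9, 5, 0, 9, 9, 9, 37, 37, 37),
  isRefound = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)
DF$id <- seq_len(nrow(DF))  # remember each row's original position

# With a key, data.table sorts the rows by usr
keyed <- data.table(DF, key = "usr")
keyed[, newvar := usrMsgCnt / sum(isRefound), by = usr]

# Without a key, the rows stay in their original order
plain <- data.table(DF)
plain[, newvar := usrMsgCnt / sum(isRefound), by = usr]

print(keyed$id)  # no longer in 1..20 order
print(plain$id)  # still 1..20

# The keyed result can be put back into the original order via id
restored <- keyed[order(id)]
```

The newvar values are the same either way; only the row order differs, and an id column makes the keyed result reversible.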
# Base R: ave() gives each row the sum of isRefound over its usr group
within(DF, {
  newvar <- usrMsgCnt/ave(isRefound, usr, FUN = sum)
})
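Summing isRefound equals counting the rows with isRefound == 1 only while the column is strictly 0/1. If that assumption might not hold, a sketch that counts the matches directly, again on the re-typed sample rows (n_refound is an illustrative helper column):

```r
# The 20 sample rows from the question's dput, re-typed as plain vectors
data <- data.frame(
  usr = c(rep("Jan.Schrader", 4), "Bernhard.Schiemann", "Bernd.Ludwig",
          rep("Bernhard.Schiemann", 3), "Jan.Schrader", "Ian.Ruthven",
          "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          rep("Jan.Schrader", 3), rep("Bernhard.Schiemann", 3)),
  usrMsgCnt = c(9, 9, 9, 9, 5, 0, 5, 5, 5, 9, 0, 9, 5, 0, 9, 9, 9, 37, 37, 37),
  isRefound = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# as.integer(isRefound == 1) is 1 exactly where isRefound equals 1, else 0,
# so the group sum is a count even if isRefound held other values
data <- within(data, {
  n_refound <- ave(as.integer(isRefound == 1), usr, FUN = sum)
  newvar <- usrMsgCnt / n_refound
})
print(data)
```

Users with no isRefound == 1 rows get a zero denominator, so their value is NaN (0/0) or Inf, as in the output above.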

# plyr: split DF by usr and add newvar within each piece
library(plyr)
ddply(DF, .(usr), transform,
      newvar = usrMsgCnt/sum(isRefound))
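ddply() splits the data by usr and binds the pieces back together group by group, so its result is ordered by usr rather than by the input rows. Since the original row order matters here, a sketch that restores it with a helper id column (assumes the plyr package is installed; id and out are illustrative names):

```r
library(plyr)

# The 20 sample rows from the question's dput, re-typed as plain vectors
DF <- data.frame(
  usr = c(rep("Jan.Schrader", 4), "Bernhard.Schiemann", "Bernd.Ludwig",
          rep("Bernhard.Schiemann", 3), "Jan.Schrader", "Ian.Ruthven",
          "Jan.Schrader", "Bernhard.Schiemann", "Ian.Ruthven",
          rep("Jan.Schrader", 3), rep("Bernhard.Schiemann", 3)),
  usrMsgCnt = c(9, 9, 9, 9, 5, 0, 5, 5, 5, 9, 0, 9, 5, 0, 9, 9, 9, 37, 37, 37),
  isRefound = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)
DF$id <- seq_len(nrow(DF))  # remember the original row order

# Rows come back grouped by usr
out <- ddply(DF, .(usr), transform, newvar = usrMsgCnt / sum(isRefound))

# Sort by id to return to the input order, then drop the helper column
out <- out[order(out$id), ]
out$id <- NULL
rownames(out) <- NULL
print(out)
```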