R: get the number of rows with the same value in one column and a positive binary value in another column
Tags: r, vector, dataframe

(Sorry for the odd title, but I couldn't think of a short way to put this.)

Because I oversimplified my problem in my previous question, this time I am giving you the actual problem.

The data frame provided has the columns "usr", "usrMsgCnt" and "isRefound", where usr is a name, usrMsgCnt is a number and isRefound is binary.

A new column is to be added, whose value is calculated as follows:

usrMsgCnt / (number of rows where usr equals this row's usr and isRefound equals 1)

For the first row of the sample data the new value would be 9/5, where the 5 is given by

length(data$usr[data$usr == "Jan.Schrader" & data$isRefound == 1])

Given the size of the original data set, doing this in a loop is not an option.

Here is the dput of a small chunk of the data:
structure(list(usr = structure(c(21L, 21L, 21L, 21L, 6L, 5L,
6L, 6L, 6L, 21L, 20L, 21L, 6L, 20L, 21L, 21L, 21L, 6L, 6L, 6L
), .Label = c("alsmith", "Amanda.Coles", "Andrew.Coles", "babsimieth",
"Bernd.Ludwig", "Bernhard.Schiemann", "bfueck", "Bram.Ridder",
"brian.tripney", "carlosgardeazabal", "christine.elsweiler",
"cmfinner", "daniel.goncalves", "david", "de56", "eko.ma", "freundlu",
"gmcphail", "ian.ferguson", "Ian.Ruthven", "Jan.Schrader", "jearmour",
"jyang", "Laura.Schnall", "Marc.Roper", "marek.maleika", "Martin.Hacker",
"martin.scholz", "maziminke", "mclanger", "Michael.Cashmore",
"morgan.harvey", "mrussell", "msherrif", "murray.wood", "Nadine.Mahrholz",
"noam.ascher", "pburns", "Peter.Gregory", "raina", "robertnm",
"ronald.teijeira", "ronaldtf", "sbenus", "starmstr", "steve.neely",
"Sven.Friedemann", "tinchen"), class = "factor"), usrMsgCnt = c(9L,
9L, 9L, 9L, 5L, 0L, 5L, 5L, 5L, 9L, 0L, 9L, 5L, 0L, 9L, 9L, 9L,
37L, 37L, 37L), isRefound = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L)), .Names = c("usr",
"usrMsgCnt", "isRefound"), row.names = c(NA, 20L), class = "data.frame")
Assuming isRefound is in fact binary:
library(data.table)
DT <- data.table(DF,key="usr")
DT[,newvar:=usrMsgCnt/sum(isRefound),by=usr]
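Comments:

Perhaps, to remove any ambiguity, you could post the output you expect for the subset of data you have shared here.

Yes, you are right, give me a minute.

+1. My thought was the same. The only change I would make is to not add the key when creating the data.table, so that the original row order is preserved (if that is needed).

The original row order is important, but I don't really understand what "not adding the key when creating the data.table" means. Care to elaborate?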
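library(data.table)
# No key here, so the rows are not sorted by usr and keep their original order
DT <- data.table(DF)
# Record the original row number of each row
DT[,id:=.I]
# Per user: usrMsgCnt divided by the number of rows with isRefound == 1
DT[,newvar:=usrMsgCnt/sum(isRefound),by=usr]
print(DT)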
#                    usr usrMsgCnt isRefound id newvar
#  1:       Jan.Schrader         9         0  1    1.8
#  2:       Jan.Schrader         9         1  2    1.8
#  3:       Jan.Schrader         9         1  3    1.8
#  4:       Jan.Schrader         9         1  4    1.8
#  5: Bernhard.Schiemann         5         1  5    1.0
#  6:       Bernd.Ludwig         0         0  6    NaN
#  7: Bernhard.Schiemann         5         0  7    1.0
#  8: Bernhard.Schiemann         5         1  8    1.0
#  9: Bernhard.Schiemann         5         1  9    1.0
# 10:       Jan.Schrader         9         1 10    1.8
# 11:        Ian.Ruthven         0         0 11    NaN
# 12:       Jan.Schrader         9         0 12    1.8
# 13: Bernhard.Schiemann         5         1 13    1.0
# 14:        Ian.Ruthven         0         0 14    NaN
# 15:       Jan.Schrader         9         0 15    1.8
# 16:       Jan.Schrader         9         0 16    1.8
# 17:       Jan.Schrader         9         1 17    1.8
# 18: Bernhard.Schiemann        37         0 18    7.4
# 19: Bernhard.Schiemann        37         1 19    7.4
# 20: Bernhard.Schiemann        37         0 20    7.4
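To illustrate the point about the key raised in the comments, here is a minimal sketch on a hypothetical four-row data frame (the values and the names keyed and unkeyed are made up for illustration, they are not the asker's data). It assumes the data.table package is installed. Creating the table with key = "usr" sorts the rows by usr, while creating it without a key leaves them in their original order; the computed values are the same either way.

```r
library(data.table)

# Hypothetical toy data, not the asker's data set
DF <- data.frame(
  usr       = c("b", "a", "b", "a"),
  usrMsgCnt = c(4L, 6L, 4L, 6L),
  isRefound = c(1L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Keyed: the rows are physically sorted by usr
keyed <- data.table(DF, key = "usr")
keyed[, newvar := usrMsgCnt / sum(isRefound), by = usr]

# Unkeyed: the rows stay in their original order
unkeyed <- data.table(DF)
unkeyed[, newvar := usrMsgCnt / sum(isRefound), by = usr]

print(keyed$usr)    # "a" "a" "b" "b"
print(unkeyed$usr)  # "b" "a" "b" "a"
```

If the row order matters, as it does for the asker, the unkeyed variant is probably the simpler choice.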
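# Base R alternative: ave() returns the per-usr sum of isRefound for every row
# and keeps the original row order
within(DF, {
newvar <- usrMsgCnt/ave(isRefound, usr, FUN = sum)
})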
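# plyr alternative: split by usr, add the column within each group, recombine
# Note: the result comes back grouped by usr, not in the original row order
library(plyr)
ddply(DF, .(usr), transform,
newvar = usrMsgCnt/sum(isRefound))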
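As a sanity check, the vectorised ave() approach can be compared against the explicit per-row definition from the question. This is only a sketch on a hypothetical five-row data frame (the values and the names explicit and vectorised are made up for illustration), using base R only. It also shows where the NaN values in the output above come from: a user with no row where isRefound is 1 gives a zero denominator, and with usrMsgCnt 0 that is 0/0.

```r
# Hypothetical toy data, not the asker's data set
DF <- data.frame(
  usr       = c("x", "x", "y", "x", "z"),
  usrMsgCnt = c(9L, 9L, 5L, 9L, 0L),
  isRefound = c(1L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Explicit per-row definition from the question (slow on large data)
explicit <- sapply(seq_len(nrow(DF)), function(i) {
  n <- length(DF$usr[DF$usr == DF$usr[i] & DF$isRefound == 1])
  DF$usrMsgCnt[i] / n
})

# Vectorised version: per-usr sum of isRefound, repeated for every row
vectorised <- DF$usrMsgCnt / ave(DF$isRefound, DF$usr, FUN = sum)

print(explicit)
print(vectorised)
```

Both vectors should agree row by row, since summing a 0/1 column within a user is the same as counting that user's rows where the column is 1.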