R: function to count repeated observations and factor combinations within a data frame and across time
Suppose I have the following kind of data:
df <- data.frame(student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
factor = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
year = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
count1 = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
count2 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0))
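df
Here is a series of commands that gets you there, using the student:year interaction to find each student's factor changes within the same year: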
# Add up the occurrences of a student having multiple factors in the same year,
# for each year
in.each.year <- aggregate(factor~student:year, data=df, FUN=function(x) length(x)-1)[c(1,3)]
# Total these up, for each student
in.year <- aggregate(factor~student, data=in.each.year, FUN=sum)
# The name was "factor". Set it to the desired name.
names(in.year)[2] <- 'count1'
# Find the occurrences of a student having multiple factors
both <- aggregate(factor~student, data=df, FUN=function(x) length(x)-1)
names(both)[2] <- 'both'
# Combine with 'merge'
m <- merge(in.year, both)
# Subtract to find "count2"
m$count2 <- m$both - m$count1
m$both <- NULL
m
##   student count1 count2
## 1      S1      0      1
## 2      S2      1      0
## 3      S3      0      0
## 4      S4      0      0
## 5      S5      0      0
## 6      S6      0      0
## 7      S7      0      0
## 8      S8      0      0
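As a cross-check on the steps above, the same two counts can be sketched from a student-by-year table of row counts. This is only an illustration, not the answer's method: the names `per.year`, `extra`, and `check` are made up here, and it assumes, as the aggregate() version does, that every row for a student is a distinct factor.

```r
# Sample data from the question (without the expected count columns)
df <- data.frame(
  student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
  factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
  year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# Rows per student (rows of the matrix) and year (columns)
per.year <- unclass(table(df$student, df$year))

# Extra rows within each year; student/year cells with no rows count as 0
extra <- per.year - 1
extra[extra < 0] <- 0

# count1: extra factors within the same year, summed over years
count1 <- rowSums(extra)
# Total extra factors per student, regardless of year
both <- rowSums(per.year) - 1

# count2: extra factors that come from a different year
check <- data.frame(student = rownames(per.year),
                    count1  = as.vector(count1),
                    count2  = as.vector(both - count1))
check
```

For the sample data this should agree with `m` above: only S2 has a nonzero `count1` and only S1 has a nonzero `count2`.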
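This question is hard to understand without sample data. Please add a reproducible example so that the good people here can help you.
See the edit, which shows code for simulated data.
Two comments. 1 (fairly minor): 10K observations is nowhere near the point where apply becomes too expensive. 2 (more important): it is still not entirely clear what you want. Change your example data so that some students actually get a 0, and give the desired result for the example.
See the additional count columns added to the data frame/example code above.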
merge(df, m)
##    student factor year count1 count2
## 1       S1      A    1      0      1
## 2       S1      C    2      0      1
## 3       S2      A    1      1      0
## 4       S2      B    1      1      0
## 5       S3      A    1      0      0
## 6       S4      A    1      0      0
## 7       S5      A    1      0      0
## 8       S6      B    1      0      0
## 9       S7      C    2      0      0
## 10      S8      D    2      0      0
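One caveat, offered as a sketch rather than as part of the original answer: the question's `df` already contains the expected `count1` and `count2` columns, and `merge()` without a `by` argument joins on every column name the two data frames share. Dropping those columns and joining on `student` explicitly keeps the result independent of the hand-entered values. The name `result` is hypothetical, and `m` is re-entered here from the output printed earlier so the block runs on its own.

```r
# Question data, including the hand-entered expected counts
df <- data.frame(
  student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
  factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
  year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  count1  = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
  count2  = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0)
)

# Per-student counts, copied from the printed value of m
m <- data.frame(
  student = paste0("S", 1:8),
  count1  = c(0, 1, 0, 0, 0, 0, 0, 0),
  count2  = c(1, 0, 0, 0, 0, 0, 0, 0)
)

# Keep only the raw columns, then join on student alone
result <- merge(df[c("student", "factor", "year")], m, by = "student")
result
```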