R: function for counting repeated observations and factor combinations within a data frame and across time

Suppose I have data of the following type:

df <- data.frame(student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"), 
              factor = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"), 
              year =  c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2), 
              count1 = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0), 
              count2 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0))

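Here is a series of commands that gets you there, using an interaction of factors (student:year) to find a student's factor changes within the same year: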

# Add up the occurrences of a student having multiple factors in the same year,
# for each year
in.each.year <- aggregate(factor~student:year, data=df, FUN=function(x) length(x)-1)[c(1,3)]

# Total these up, for each student
in.year <- aggregate(factor~student, data=in.each.year, FUN=sum)

# The name was "factor".  Set it to the desired name.
names(in.year)[2] <- 'count1'

# Find the occurrences of a student having multiple factors
both <- aggregate(factor~student, data=df, FUN=function(x) length(x)-1)
names(both)[2] <- 'both'

# Combine with 'merge'
m <- merge(in.year, both)

# Subtract to find "count2"
m$count2 <- m$both - m$count1
m$both <- NULL

m
##   student count1 count2
## 1      S1      0      1
## 2      S2      1      0
## 3      S3      0      0
## 4      S4      0      0
## 5      S5      0      0
## 6      S6      0      0
## 7      S7      0      0
## 8      S8      0      0
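
As a cross-check on the aggregate()/merge() steps above, the same counts can be sketched from a student-by-year contingency table. This is only an illustrative alternative, not the answer's method: the names `tab`, `extra`, `both`, and `counts` are made up here, and the sketch assumes each (student, factor, year) row appears at most once, so that counting rows is the same as counting factors.

```r
# Same sample data as in the question (key columns only)
df <- data.frame(student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
                 factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
                 year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2))

# Rows per student (rows of tab) and year (columns of tab), as a plain integer matrix
tab <- unclass(table(df$student, df$year))

# Extra rows within each year; cells with no observation would give -1, so clamp to 0
extra <- tab - 1L
extra[extra < 0L] <- 0L
count1 <- rowSums(extra)

# Extra rows per student overall, then the part not explained by same-year repeats
both   <- rowSums(tab) - 1
count2 <- both - count1

counts <- data.frame(student = rownames(tab), count1 = count1, count2 = count2,
                     row.names = NULL)
counts
```

Unlike aggregate(), table() also creates cells for student/year pairs that never occur, which is why the clamp to zero is needed before summing.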

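It is hard to understand this question without sample data. Please add a reproducible example so that the good people here can help you.

See the edit for code that generates simulated data.

Two comments. 1 (somewhat minor): 10K observations is nowhere near the point where apply becomes too expensive. 2 (somewhat major): it is still not entirely clear what you want. Change your sample data so that some students actually get a 0, and give the desired result for the example.

See the additional count columns added to the data frame/sample code above.
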
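# Attach the per-student counts to the original rows. Only the key columns of df
# are used, because df above already carries the expected count1/count2 columns.
merge(df[c("student", "factor", "year")], m)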
##    student factor year count1 count2
## 1       S1      A    1      0      1
## 2       S1      C    2      0      1
## 3       S2      A    1      1      0
## 4       S2      B    1      1      0
## 5       S3      A    1      0      0
## 6       S4      A    1      0      0
## 7       S5      A    1      0      0
## 8       S6      B    1      0      0
## 9       S7      C    2      0      0
## 10      S8      D    2      0      0
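
Because the sample data already contains the desired count1 and count2 columns, the computed values can be checked against them. The sketch below wraps the answer's steps in a helper so that it runs on its own. The function name `count_factor_changes` and the `.expected`/`.computed` suffixes are hypothetical, chosen only for this illustration, and `student + year` is used in place of `student:year` on the assumption that aggregate() groups by the same two variables either way.

```r
df <- data.frame(student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
                 factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
                 year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                 count1  = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
                 count2  = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0))

# Hypothetical helper: the aggregate()/merge() steps from the answer in one function
count_factor_changes <- function(d) {
  # Extra factors per student within each year
  per_year <- aggregate(factor ~ student + year, data = d,
                        FUN = function(x) length(x) - 1)
  in_year <- aggregate(factor ~ student, data = per_year, FUN = sum)
  names(in_year)[2] <- "count1"
  # Extra factors per student over all years
  both <- aggregate(factor ~ student, data = d, FUN = function(x) length(x) - 1)
  names(both)[2] <- "both"
  out <- merge(in_year, both)
  out$count2 <- out$both - out$count1
  out$both <- NULL
  out
}

computed <- count_factor_changes(df[c("student", "factor", "year")])
expected <- unique(df[c("student", "count1", "count2")])

# One row per student, with expected and computed counts side by side
check <- merge(expected, computed, by = "student",
               suffixes = c(".expected", ".computed"))
all(check$count1.expected == check$count1.computed)
all(check$count2.expected == check$count2.computed)
```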