R beginner: rearranging the data layout in a CSV file


r csv


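This is very basic, but I have got stuck in overcomplicated code. I have a CSV file with a column of tests, a column of marks and a column of students. I want to reshape the data so that I have one row of marks per student and one column per test.

I created a separate CSV called "students.csv" containing the students (as numeric codes), because that seemed easier for now.

I have 52 students and 50 tests.

I can get the following to work with a single student: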

matricNumbers <- read.csv("students.csv")
students <- as.vector(as.matrix(matricNumbers))
students
data <- read.csv("marks.csv")
studentSubset <- data[data[2] == 1150761,] 
marksSubset <- as.vector(as.matrix(studentSubset[5]))
ll <- list()
ll<-c(list(marksSubset), ll)
dd<-data.frame(matrix(nrow=50,ncol=50))
for(i in 1:length(ll)){
  dd[i,] <- ll[[i]]

}
dd

I get an error:

Error in `[<-.data.frame`(`*tmp*`, i, , value = logical(0)) : replacement has 0 items, need 50

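If I understand the question correctly, you can use the reshape package to achieve what you want. Since you did not provide sample data, it is hard to test. I suggest you paste the output of dput(head(matricNumbers)) into a code block in your question above.

However, you should be able to follow this simple example, which uses some dummy data. I think you may only need a single line, and you can forget about all the complicated loop stuff.
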
# These lines make some dummy data, similar to your marks data (hopefully).
# sample() is random, so your values will differ from the output shown below.
test <- sort(sample(c("Biology", "Maths", "Chemistry"), 10, replace = TRUE))
students <- unlist(sapply(table(test), function(x) { sample(letters[1:x], x) }))
names(students) <- NULL
scores <- data.frame(test, mark = sample(40:100, 10, replace = TRUE), students)
scores
        test mark students
1    Biology   50        c
2    Biology   93        a
3    Biology   83        b
4    Biology   83        d
5  Chemistry   71        b
6  Chemistry   54        c
7  Chemistry   54        a
8      Maths   97        c
9      Maths   93        b
10     Maths   72        a



# Then use reshape to cast your data into the format you require.
# I use 'mean' as the aggregation function. If you have one score for each
# student/test, then mean will just return the score.
# If there is no score for a particular student in a test, it will return NaN.
library(reshape)
bystudent <- cast(scores, students ~ test, value = "mark", fun.aggregate = mean)
bystudent
  students Biology Chemistry Maths
1        a      93        54    72
2        b      83        71    93
3        c      50        54    97
4        d      83       NaN   NaN
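
For comparison, here is a minimal sketch of the same long-to-wide step in base R, using tapply instead of the reshape package. This is a swapped-in technique, not what the answer above uses. The data frame is a hand-typed copy of the dummy output shown above so that the result is reproducible. The name wide is only illustrative.

```r
# Fixed copy of the dummy data printed above (no random sampling).
scores <- data.frame(
  test = c(rep("Biology", 4), rep("Chemistry", 3), rep("Maths", 3)),
  mark = c(50, 93, 83, 83, 71, 54, 54, 97, 93, 72),
  students = c("c", "a", "b", "d", "b", "c", "a", "c", "b", "a")
)

# One row per student, one column per test.
# mean() collapses duplicates; with one mark per student/test it returns that mark.
wide <- tapply(scores$mark, list(scores$students, scores$test), mean)
wide
```

Unlike cast, tapply returns a matrix and fills student/test combinations that have no mark with NA rather than NaN.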

What is the value of i when it stops? That should be what is causing the error. Can you show that subset? Also, have you tried replacing i with j in the nested loop for clarity?

Great, this is really simple! Thanks!
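
On the error itself: the value = logical(0) in the message suggests that the subset being assigned was empty, for example because no row matched the student number in the column being compared. The sketch below reproduces that situation with made-up data. The data frame marks and its column names are assumptions for illustration, not the layout of the real marks.csv.

```r
# Hypothetical stand-in for marks.csv; the column names are assumptions.
marks <- data.frame(
  student = c(1, 1, 2, 2),
  mark    = c(55, 70, 62, 81)
)

# No row has student == 1150761, so the subset has zero rows.
studentSubset <- marks[marks$student == 1150761, ]
marksSubset <- as.vector(as.matrix(studentSubset["mark"]))
length(marksSubset)  # 0

# Assigning a zero-length vector into a data frame row fails;
# tryCatch captures the error message instead of stopping the script.
dd <- data.frame(matrix(nrow = 2, ncol = 2))
msg <- tryCatch(
  {
    dd[1, ] <- marksSubset
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Checking nrow(studentSubset) before the loop is a quick way to see whether the filter matched anything.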