Correlating two columns, sorted by monitor ID #, and returning a vector listing the correlations in R

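I'm very new to R and have run into trouble. I know others have asked about this, but I'm trying to get my own code working and would like to understand what is going wrong.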

The prompt is as follows: Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows: 
        corr <- function(directory, threshold = 0) {
            ## 'directory' is a character vector of length 1 indicating the location of
            ## the CSV files

            ## 'threshold' is a numeric vector of length 1 indicating the number of
            ## completely observed observations (on all variables) required to compute
            ## the correlation between nitrate and sulfate; the default is 0

            ## Return a numeric vector of correlations
        }
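My corr attempt triggers this warning, raised by an if statement involving cr:

the condition has length > 1 and only the first element will be used

> head(cr)
           nitrate    sulfate
nitrate 1.00000000 0.06243369
sulfate 0.06243369 1.00000000

For the specific problem with a threshold of 150, the answer should be:

source("corr.R")
source("complete.R")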
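
The head(cr) output above is a 2 x 2 matrix, which suggests cr came from calling cor() on a whole data frame. The sketch below, using made-up numbers in place of a monitor file, shows the difference between that call and the two-vector form. The names monitor, cr_matrix and cr_scalar are illustrative assumptions, not taken from the asker's code.

```r
# Made-up data standing in for one monitor's file (values are invented).
monitor <- data.frame(nitrate = c(0.5, 1.2, 0.9, 2.0),
                      sulfate = c(3.1, 2.8, 4.0, 3.5))

# cor() on a data frame returns the full correlation matrix.
cr_matrix <- cor(monitor)
dim(cr_matrix)      # 2 2

# cor() on two vectors returns a single number.
cr_scalar <- cor(monitor$sulfate, monitor$nitrate)
length(cr_scalar)   # 1
```

Testing a four-element matrix inside if () is what produces the "condition has length > 1" message. In R 4.2.0 and later the same condition is an error rather than a warning.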

Please find a cleaner version of the code below. I'm happy to answer any questions.

corr <- function(directory, threshold = 0) {
  # list the CSV files with their full paths, so the working directory is left unchanged
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # for each file, read the sulfate and nitrate columns
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  # for each data frame that was read, remove rows that contain NA
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  # keep only monitors whose number of complete cases is greater than the threshold
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  # if any monitors remain after Filter (length of list > 0) then:
  if (length(L3) > 0) {
    # for each monitor, calculate the correlation between sulfate and nitrate
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    # change list output to a vector output
    unlist(Correlation)
  } else {
    # return a zero-length vector
    numeric(0)
  }
}

corr(directory = "C:/Users/Evan Friedland/Desktop/DIRECTORY", threshold = 100)
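
To try the idea without the real data set, the sketch below builds two tiny made-up monitor files in a temporary directory and runs a loop-based variant on them. The function name corr_sketch, the folder name toy_monitors and all values are hypothetical. Unlike the answer above, this variant checks complete cases across every column, which is one reading of "on all variables" in the prompt.

```r
# Hypothetical loop-based variant; column names follow the prompt.
corr_sketch <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)
  for (f in files) {
    monitor <- read.csv(f, header = TRUE)
    # keep rows that are complete on ALL columns of the file
    complete <- monitor[complete.cases(monitor), ]
    if (nrow(complete) > threshold) {
      result <- c(result, cor(complete$sulfate, complete$nitrate))
    }
  }
  result
}

# Toy data: two invented monitors written to a temporary directory.
toy_dir <- file.path(tempdir(), "toy_monitors")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, 4, NA), nitrate = c(2, 4, 6, 8, 1), ID = 1),
          file.path(toy_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(3, 2, NA), ID = 2),
          file.path(toy_dir, "002.csv"), row.names = FALSE)

few  <- corr_sketch(toy_dir, threshold = 2)   # only monitor 1 has more than 2 complete rows
none <- corr_sketch(toy_dir, threshold = 10)  # no monitor qualifies, so numeric(0)
```

Here few holds one correlation (monitor 1 only) and none is a zero-length numeric vector, which are the two outcomes the prompt describes.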

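It's hard to see what is going on. First, what happens when you remove rbind from combined?

The correlation between two vectors is a single scalar value. If you combine the data frames, you get the 0.0624 value. But the output you expect is a vector, so perhaps you shouldn't combine the csv files and should instead compute the correlation for each one separately?

Did you mean read.csv(spectdata[i]) instead of read.csv(directory[i])?

Ah, I see. I can't enter anything past the second if statement without hitting an "unexpected }" error. I think it makes sense not to merge the files into one big data frame with rbind, since each file corresponds to one monitor # (there are 332 of them), and I think we only include the monitors above the threshold. I'm not familiar with the syntax and functions in if (length(combined) > 0) { return(unlist(lapply(test, function(x) cor(x[, "sulfate"], x[, "nitrate"])))) }; I haven't seen most of these yet. Also, shouldn't combined be a data.frame()? Thanks Evan! :)

OK, if each csv is a monitor you're testing, then use my second code block. The line if (length(combined) > 0) { simply says: combined is a list of the csv files that meet the threshold, so carry on only if that list is non-empty. lapply(x, FUN) is a function that applies another function (FUN) to each element of a list. I know this syntax is a bit advanced, but it lets me supply a user-defined function as FUN. In this case I'm just saying: give me cor(sulfate, nitrate) for each list element (each csv file). Let me know if that makes sense. And unlist() just turns a list into a vector. Give it a try.

Ah, OK, thanks for the explanation. I'm trying to run the first set of code you commented (the one using lapply) and keep running into bracket errors. Did you get it to run on your machine?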
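
The points in the comments about lapply(), unlist() and pooling with rbind can be seen on a small example. This is only a sketch with invented numbers: the list monitors stands in for the csv files after reading, and names such as per_monitor and r_pooled are assumptions made for illustration.

```r
# Two invented monitors, already reduced to complete cases.
monitors <- list(
  data.frame(sulfate = c(1, 2, 3, 4), nitrate = c(1, 3, 2, 5)),
  data.frame(sulfate = c(2, 4, 6, 8), nitrate = c(9, 7, 4, 1))
)

# lapply() applies the function to each list element and returns a list.
per_monitor <- lapply(monitors, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
class(per_monitor)   # "list"

# unlist() flattens that list into a plain numeric vector, one value per monitor.
r_vec <- unlist(per_monitor)
length(r_vec)        # 2

# Pooling the monitors with rbind first gives one correlation for everything.
pooled <- do.call(rbind, monitors)
r_pooled <- cor(pooled$sulfate, pooled$nitrate)
length(r_pooled)     # 1
```

The per-monitor route yields a vector with one entry per file, while the pooled route collapses everything into a single number, which matches the single 0.0624 value mentioned in the comments.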