In R, how do you use a second metadata file to factorise and add value labels to specific data.table columns?

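This is part of a project to switch from SPSS to R. There are good tools for importing SPSS files into R (expss), but part of this question is about getting the benefits of SPSS-style labelling when the data come from a CSV source. This helps bridge the staff training gap between SPSS and R by giving data.tables a common format regardless of the file format they came from.

CSV does a good job of storing data, but a poor job of making the data meaningful. This inevitably means that variable labels, factor levels and value labels have to come from somewhere else. In most short examples, such as those in documentation, simply hard-coding the metadata works. For larger projects it makes more sense to store the metadata in a second CSV file.

Example data file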

ID,varone,vartwo,varthree,varfour,varfive,varsix,varseven,vareight,varnine,varten
1,1,34,1,,1,,1,1,4,
2,1,21,0,1,,1,3,14,3,2
3,1,54,1,,,1,3,6,4,4
4,2,32,1,1,1,,3,7,4,
5,3,66,0,,,1,3,9,3,3
6,2,43,1,,1,,1,12,2,1
7,2,26,0,,,1,2,11,1,
8,3,,1,1,,,2,15,1,4
9,1,34,1,,1,,1,12,3,4
10,2,46,0,,,,3,13,2,
11,3,39,1,1,1,,3,7,1,2
12,1,28,0,,,1,1,6,5,1
13,2,64,0,,1,,2,11,,3
14,3,34,1,1,,,3,10,1,1
15,1,52,1,1,1,1,1,8,6

Example metadata file

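rowlabel,ID,varone,vartwo,varthree,varfour,varfive,varsix,varseven,vareight,varnine,varten
varlabel,Question one,Question two,Question three,Question four,Question five,Question six,Question seven,Question eight,Question nine,Question ten
varrole,unique,attitude,unique,filter,filter,filter,filter,attitude,filter,attitude,attitude,attitude
missing,error,error,ignore,error,unchecked,unchecked,unchecked,error,error,error,ignore
valuelabel,one,no,checked,checked,checked,x,one,A,support
valuelabel,two,yes,y,two,B,neutral
valuelabel,three,z,three,C,oppose
don't know
valuelabel,valuelabel,valuelabel,
valuelabel,valuelabel,valuelabel,
valuelabel,valuelabel,valuelabel,
valuelabel,valuelabel,valuelabel,,
valuelabel,valuelabel,valuelabel,,
valuelabel,valuelabel,valuelabel,,
valuelabel,valuelabel,,
valuelabel,valuelabel,valuelabel,,
valuelabel,valuelabel,valuelabel,,
valuelabel,valuelabel,,
valuelabel,valuelabel,valuelabel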

So the common element is the column names, which act as the key between the two files.
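To make the keying idea concrete, here is a minimal sketch that attaches variable labels only to the columns both tables share. The two toy tables, the `rowlabel` column name and the use of a plain `"label"` attribute are assumptions made for illustration; they are not the files from the question, and expss may store its labels differently.

```r
library(data.table)

# Hypothetical toy tables (not the files from the question).
rdt <- data.table(ID = 1:3, varone = c(1, 2, 3), extra = c("a", "b", "c"))
mdt <- data.table(
  rowlabel = c("varlabel", "varrole"),
  ID       = c(NA, "unique"),
  varone   = c("Question one", "attitude")
)

# Column names present in both tables act as the key; ID is skipped.
commonnames <- setdiff(intersect(names(mdt), names(rdt)), "ID")

for (nm in commonnames) {
  # Look up the variable label for this column in the metadata
  lab <- mdt[rowlabel == "varlabel"][[nm]]
  # Store it as a plain "label" attribute (an assumption of this sketch)
  set(rdt, j = nm, value = structure(rdt[[nm]], label = lab))
}

attr(rdt$varone, "label")
```

Columns that appear in only one of the two tables (`extra` here) are simply left alone, which is the behaviour you want when the metadata file is incomplete.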

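The first column of the metadata file describes the role of each row in relation to the data file. So: varlabel provides the variable label for each column; varrole describes the analytical purpose of the variable; missing describes how to handle missing data; valuelabel gives the label for one factor level, and there can be from one to many valuelabel rows.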
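One way to use those row roles, sketched under assumptions: select metadata rows by the value in the first column instead of by hard-coded row numbers. The inline metadata below is a hypothetical, cut-down example with only two variables, and the helper name `labels_for` is made up for this sketch.

```r
library(data.table)

# Hypothetical, reduced metadata (NOT the file from the question).
meta_txt <- "rowlabel,ID,varone,varthree
varlabel,,Question one,Question three
varrole,unique,attitude,filter
missing,error,error,error
valuelabel,,one,no
valuelabel,,two,yes
valuelabel,,three,
"
mdt <- fread(text = meta_txt, header = TRUE, na.strings = "")

# Pick rows by role rather than by position.
value_rows <- mdt[rowlabel == "valuelabel"]

# Value labels for one column as a plain character vector.
# Both NA and empty strings are dropped, so the result does not depend
# on how blank fields were read in.
labels_for <- function(col) {
  x <- value_rows[[col]]
  x[!is.na(x) & nzchar(x)]
}

labels_for("varone")
labels_for("varthree")
```

Selecting by role means the number of valuelabel rows can grow or shrink without touching the code.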

Right! Here is the code that works:

```
#Libraries
library(expss)
library(data.table)
library(magrittr)

readcsvdata <- function(dfile)
{
  # TESTED - Working
  print("OK Lets read some comma separated values")
  rdata <- fread(file = dfile, sep = ",", quote = "\"", header = TRUE, stringsAsFactors = FALSE,
                 na.strings = getOption("datatable.na.strings", ""))
  return(rdata)
}
rawdatafilename <- "testdata.csv"
rawmetadata <- "metadata.csv"

mdt <- readcsvdata(rawmetadata)
rdt <- readcsvdata(rawdatafilename)
names(rdt)[names(rdt) == "ï..ID"] <- "ID" # correct minor data error
commonnames <- intersect(names(mdt), names(rdt)) # find common variable names so metadata applies
commonnames <- commonnames[-(1)] # remove ID
qlabels <- as.list(mdt[1, commonnames, with = FALSE])
# set var names to columns
for (each_name in commonnames) # loop through commonnames and qlabels
{
  expss::var_lab(tdt[[each_name]]) <- qlabels[[each_name]]
}
factorcols <- as.vector(commonnames) # create a vector of column names (for later use)
for (col in factorcols)
{
  print(is.na(mdt[4, ..col])) # print first row of value labels (as test)
  if (is.na(mdt[4, ..col])) factorcols <- factorcols[factorcols != col]
  # if not a factor column, remove it from the factorcol list and dont try to factor it
  else { # if it is a vector factorise
    print(paste("working on", col)) # I have had a lot of problem with unrecognised ..col variables
    tlabels <- as.vector(na.omit(mdt[4:18, ..col])) # get list of labels from the data column
    validrange <- seq(1, lengths(tlabels), 1) # range of valid values is 1 to the length of labels list
    print(as.character(tlabels)) # for testing
    print(validrange) # for testing
    tdt[[col]] <- factor(tdt[[col]], levels = validrange, ordered = is.ordered(validrange), labels = as.character(tlabels))
    # expss::val_lab(tdt[, ..col]) <- tlabels
    tlabels = c() # flush loop variable
    validrange = c() # flush loop variable
  }
}
# test using column name
tlabels <- c("one", "two", "three")
validrange <- c(1, 2, 3)
factor(tdt[, varone], levels = validrange, ordered = is.ordered(validrange), labels = tlabels)
```

It looks like the problem is in the line that builds `tlabels`. Subsetting with `..col` returns a data.table, not a vector, so `as.character()` collapses the whole column into a single string. Extracting the column with `[[col]]` gives a plain vector:

```
library(data.table)
mdt = as.data.table(mtcars)
col = "am"
tlabels <- as.vector(na.omit(mdt[3:6, ..col])) # ! tlabels is data.table
str(tlabels)
# Classes ‘data.table’ and 'data.frame':    4 obs. of  1 variable:
#     $ am: num  1 0 0 0
# - attr(*, ".internal.selfref")=<externalptr>

as.character(tlabels) # character vector of length 1
# [1] "c(1, 0, 0, 0)"

tlabels <- na.omit(mdt[[col]][3:6]) # vector
str(tlabels)
# num [1:4] 1 0 0 0
as.character(tlabels) # character vector of length 4
# [1] "1" "0" "0" "0"
```
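Putting that fix back into the factor loop might look like the sketch below. It is only an illustration: the toy `tdt` and `mdt` tables are made up, value-label rows are found through an assumed `rowlabel` column rather than the fixed `4:18` range, and codes are assumed to run from 1 to the number of labels, as in the question.

```r
library(data.table)

# Hypothetical toy data and metadata (not the files from the question).
tdt <- data.table(ID = 1:4, varone = c(1, 3, 2, NA), vartwo = c(34, 21, 54, 32))
mdt <- data.table(
  rowlabel = c("varlabel", "varrole", "missing", "valuelabel", "valuelabel", "valuelabel"),
  varone   = c("Question one", "attitude", "error", "one", "two", "three"),
  vartwo   = c("Question two", "unique", "ignore", NA, NA, NA)
)

# Positions of the value-label rows in the metadata
value_rows <- which(mdt$rowlabel == "valuelabel")

for (col in intersect(names(mdt), names(tdt))) {
  # [[col]] extracts a plain vector, so na.omit() and as.character() work per element
  tlabels <- as.character(na.omit(mdt[[col]][value_rows]))
  if (length(tlabels) == 0L) next # no value labels: leave the column unchanged
  # Replace the column with a factor whose codes 1..n map to the labels
  set(tdt, j = col, value = factor(tdt[[col]], levels = seq_along(tlabels), labels = tlabels))
}

str(tdt)
```

Columns without any value labels (`vartwo` here) stay numeric, which replaces the bookkeeping with `factorcols` in the original loop.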