How to convert untidy survey tables into a data frame in R

Tags: r, tidyr, data-cleaning, readxl

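At work I regularly deal with survey data that arrives in badly formatted Excel files, designed for readability rather than for any kind of data analysis. I am looking for a way to clean this data in R and turn it into a data frame of variables and observations.

I know there are plenty of tutorials on data cleaning in R, but in my experience they mostly deal with data that is already in a machine-readable format, so any help with this would be greatly appreciated.

Here is a dummy example of a raw survey with this shape: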

Are you male or female?

           Variable1 Variable2 Variable3 Variable4
Male       n%        n%        n%        n%
Female     n%        n%        n%        n%


How old are you?

           Variable1 Variable2 Variable3 Variable4
18-34      n%        n%        n%        n%
35+        n%        n%        n%        n%
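And so on. The blank areas are empty cells/rows, each survey question sits as a whole in column A two rows above its data table, and all questions and data tables are on a single worksheet.

Is there a way to convert this with R code into the following?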

Question                Response Variable1 Variable2 Variable3 Variable4
Are you male or female? Male     n%        n%        n%        n%
Are you male or female? Female   n%        n%        n%        n%
How old are you?        18-34    n%        n%        n%        n%
How old are you?        35+      n%        n%        n%        n%
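At the moment I do this in Excel with some VBA code and then read the result into R for further analysis/visualisation, but it would be great to skip the Excel stage and go straight to R.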


Thanks

Here is a rough way of dealing with badly organised data. I made an example file in CSV format and hosted it in a miscellaneous repo:

library(dplyr)  # provides %>% and mutate(), which the later steps use
file <- "https://raw.githubusercontent.com/minerva79/woodpecker/master/data/example.csv"
survey <- readLines(file)
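
Check out the new readxl package; it could help.

Thanks guys, I'll keep reading the documentation.

Sorry Adam, I should have been clearer: "Q1" and "response1" are actually text strings of the questions and responses, so I need to find a way to grab each question string and fill it in as a new variable alongside the responses. I have edited my post to reflect this. Thanks for your help.

Great, thanks Adam. I guess I will have to convert the xlsx file to csv first before I can read it in as "character" with readLines and work with it in that format, but I'll give it a try. Thanks again.

Yes. Give it a try and let me know if you run into any problems.
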
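
On the point about converting the xlsx file first: a minimal sketch of going from worksheet cells to the comma-separated lines that readLines would return, without an intermediate CSV file. It assumes the sheet could be read with readxl::read_excel(path, col_names = FALSE, col_types = "text"); the helper name cells_to_lines and the stand-in data frame are hypothetical, and cell values that themselves contain commas are not handled.

```r
# Hypothetical helper: turn a data frame of raw worksheet cells into CSV-like lines.
cells_to_lines <- function(cells) {
  cells <- as.data.frame(cells, stringsAsFactors = FALSE)
  # Blank cells arrive as NA; treat them as empty strings.
  cells[] <- lapply(cells, function(col) {
    col <- as.character(col)
    col[is.na(col)] <- ""
    col
  })
  # Paste the columns together row by row.
  lines <- do.call(paste, c(unname(as.list(cells)), sep = ","))
  # Drop rows in which every cell was blank.
  lines[gsub(",", "", lines) != ""]
}

# Stand-in for (assumption):
# cells <- readxl::read_excel("survey.xlsx", col_names = FALSE, col_types = "text")
cells <- data.frame(
  A = c("Are you male or female?", NA, NA, "Male", "Female"),
  B = c(NA, NA, "Variable1", "0.5", "0.5"),
  C = c(NA, NA, "Variable2", "0.6", "0.4"),
  stringsAsFactors = FALSE
)
survey <- cells_to_lines(cells)
```

Because the blank rows are dropped, each question ends up on the line directly above its header row, which is the layout the line-based steps in the answer rely on.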
# Header rows begin with a comma because their first cell is empty
headers <- substring(survey, 1, 1) == ","
survey[headers]

[1] ",Variable1,Variable2,Variable3,Variable4" ",Variable1,Variable2,Variable3,Variable4"
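# Positions of the header rows; each question sits on the line directly above its header
header_pos <- which(headers)
qn_pos <- header_pos - 1

# Strip the commas left by the empty cells (this also removes any comma inside a question)
qn <- survey[qn_pos] %>% gsub(",", "", .)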
qn

[1] "Are you male or female?" "How old are you?" 
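# Each table runs from its header row to the line before the next question (or the end of the file)
tab_end <- c(header_pos[-1] - 2, length(survey))
tabs <- lapply(seq_along(qn), function(x) survey[header_pos[x]:tab_end[x]])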
tabs

[[1]]
[1] ",Variable1,Variable2,Variable3,Variable4" "Male,0.5,0.6,0.7,0.8"                     "Female,0.5,0.4,0.3,0.2"                  

[[2]]
[1] ",Variable1,Variable2,Variable3,Variable4" "18-34,0.4,0.5,0.7,0.1"                    "35+,0.6,0.5,0.3,0.9" 
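
An alternative to computing start and end positions by hand is to group the lines with cumsum(). This is a sketch rather than the answerer's method: the sample lines are an assumed reconstruction of the example file, and a question row is assumed to be any line whose only non-empty field is the first one, so a question containing a comma would not be detected.

```r
# Assumed reconstruction of the example file's contents.
lines <- c(
  "Are you male or female?,,,,",
  ",Variable1,Variable2,Variable3,Variable4",
  "Male,0.5,0.6,0.7,0.8",
  "Female,0.5,0.4,0.3,0.2",
  "How old are you?,,,,",
  ",Variable1,Variable2,Variable3,Variable4",
  "18-34,0.4,0.5,0.7,0.1",
  "35+,0.6,0.5,0.3,0.9"
)

# TRUE for lines that hold text in the first field and nothing but commas after it.
is_question <- grepl("^[^,]+,*$", lines)

# The running count of question rows gives every line the id of its block.
blocks <- split(lines, cumsum(is_question))

# Drop the question row from each block and use the question text as the block name.
blocks <- lapply(blocks, function(b) b[-1])
names(blocks) <- sub(",+$", "", lines[is_question])
```

Each element of blocks then holds one header row plus its data rows, the same shape as the elements of tabs.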
# Read each list element as a table, using the first column as row names
tabs <- lapply(tabs, function(x) read.table(text = x, sep = ",", header = TRUE, row.names = 1))
tabs

[[1]]
       Variable1 Variable2 Variable3 Variable4
Male         0.5       0.6       0.7       0.8
Female       0.5       0.4       0.3       0.2

[[2]]
      Variable1 Variable2 Variable3 Variable4
18-34       0.4       0.5       0.7       0.1
35+         0.6       0.5       0.3       0.9
# Add the question text and the response labels as columns, then stack the tables
tabs <- lapply(seq_along(tabs), function(x) tabs[[x]] %>% mutate(Question = qn[x], Response = row.names(.)))
do.call(rbind, tabs)

  Variable1 Variable2 Variable3 Variable4                Question Response
1       0.5       0.6       0.7       0.8 Are you male or female?     Male
2       0.5       0.4       0.3       0.2 Are you male or female?   Female
3       0.4       0.5       0.7       0.1        How old are you?    18-34
4       0.6       0.5       0.3       0.9        How old are you?      35+
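
The steps above can be wrapped into a single function. The following is a minimal base-R sketch, not the answerer's code: the function name tidy_survey and the inline sample lines are assumptions. Blank rows are dropped first so that each question lands directly above its header row, which also covers the layout in the question where empty rows separate the question from its table.

```r
# Hypothetical wrapper around the steps above, written without dplyr.
tidy_survey <- function(lines) {
  # Drop rows that contain nothing but commas (empty spreadsheet rows).
  lines <- lines[gsub(",", "", lines) != ""]
  # Header rows start with a comma; the question is on the line just above.
  header_pos <- which(substring(lines, 1, 1) == ",")
  qn_pos <- header_pos - 1
  qn <- sub(",+$", "", lines[qn_pos])
  # Each table ends on the line before the next question, or at the last line.
  tab_end <- c(qn_pos[-1] - 1, length(lines))
  pieces <- lapply(seq_along(qn), function(i) {
    tab <- read.table(text = lines[header_pos[i]:tab_end[i]], sep = ",",
                      header = TRUE, row.names = 1)
    # row.names = NULL resets the row names to 1, 2, ...
    data.frame(Question = qn[i], Response = rownames(tab), tab,
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Assumed sample input, including the empty rows described in the question.
lines <- c(
  "Are you male or female?,,,,",
  ",,,,",
  ",Variable1,Variable2,Variable3,Variable4",
  "Male,0.5,0.6,0.7,0.8",
  "Female,0.5,0.4,0.3,0.2",
  ",,,,",
  ",,,,",
  "How old are you?,,,,",
  ",,,,",
  ",Variable1,Variable2,Variable3,Variable4",
  "18-34,0.4,0.5,0.7,0.1",
  "35+,0.6,0.5,0.3,0.9"
)
out <- tidy_survey(lines)
```

Here out has the Question and Response columns first, followed by Variable1 to Variable4, matching the column order requested in the question.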
# Dummy data: two survey questions, each written to its own CSV file
set.seed(4)
sq_1 <- data.frame(V1 = rnorm(2, .5, .1), V2 = rnorm(2, .5, .1), V3 = rnorm(2, .5, .1), V4 = rnorm(2, .5, .1), row.names = paste0("response", 1:2))
sq_2 <- data.frame(V1 = rnorm(2, .5, .1), V2 = rnorm(2, .5, .1), V3 = rnorm(2, .5, .1), V4 = rnorm(2, .5, .1), row.names = paste0("response", 1:2))
write.csv(sq_1, "survey_question_1.csv")
write.csv(sq_2, "survey_question_2.csv")
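# Read the survey files back in (the pattern skips any other CSV in the working directory)
files <- list.files(pattern = "^survey_question_[0-9]+\\.csv$")
survey <- lapply(files, read.csv, header = TRUE, row.names = 1)

# Tag each table with its question number and response labels, then stack the tables
library(dplyr)
survey <- lapply(seq_along(survey), function(x) survey[[x]] %>%
               mutate(Question = paste0("Q", x), Response = rownames(.)))
do.call(rbind, survey)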


         V1        V2        V3        V4 Question  Response
1 0.5216755 0.5891145 0.6635618 0.3718753       Q1 response1
2 0.4457507 0.5595981 0.5689275 0.4786855       Q1 response2
3 0.6896540 0.5566604 0.5383057 0.5034352       Q2 response1
4 0.6776863 0.5015719 0.4954863 0.5169027       Q2 response2
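
When the question labels are real text rather than "Q1" and "Q2", one option is to carry the label in the file name. This is a base-R sketch under assumptions: the file names gender.csv and age.csv, the temporary directory and the values are all made up for illustration.

```r
# Write two small example tables into a fresh temporary directory (hypothetical files).
survey_dir <- tempfile("survey_")
dir.create(survey_dir)
write.csv(data.frame(V1 = c(0.5, 0.5), V2 = c(0.6, 0.4),
                     row.names = c("Male", "Female")),
          file.path(survey_dir, "gender.csv"))
write.csv(data.frame(V1 = c(0.4, 0.6), V2 = c(0.5, 0.5),
                     row.names = c("18-34", "35+")),
          file.path(survey_dir, "age.csv"))

# list.files() returns the paths in alphabetical order.
files <- list.files(survey_dir, pattern = "\\.csv$", full.names = TRUE)

pieces <- lapply(files, function(f) {
  tab <- read.csv(f, row.names = 1)
  # Use the file name without its extension as the question label.
  data.frame(Question = tools::file_path_sans_ext(basename(f)),
             Response = rownames(tab), tab,
             row.names = NULL, stringsAsFactors = FALSE)
})
combined <- do.call(rbind, pieces)
```

Passing row.names = NULL to data.frame() resets the row names, so the stacked result is numbered consecutively and the labels live only in the Response column.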