R read_excel won't trim whitespace


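I am using the readxl package to load an Excel file. By default it should strip whitespace (read_excel() has trim_ws = TRUE), but it does not.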

The file can be downloaded directly from the link below, or through the website, where it is Appendix B.

require(readxl); require(tidyverse)
test <- read_excel("ETYS 2016 Appendix B.xlsx", skip = 1, sheet = 22)
test$`MVAr Generation` %>% table    # all values are numeric
test$`MVAr Generation` %>% class    # but the class is character
test$`MVAr Generation` %>% str_count(pattern = "\\s") %>%
  sum(na.rm = TRUE)                 # it should be 0, but it is 2

This causes problems in the analysis; as this example shows, a column of numbers is read as character. Any help would be appreciated.
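A quick way to see what the stray characters are is to inspect their Unicode code points. The sketch below assumes, without confirmation from the question, that each affected cell starts with a non-breaking space (U+00A0) rather than an ordinary space; the values in x are made up. Under that assumption the symptoms match the question: a whitespace count above zero, while the default trimming leaves the text untouched.

```r
library(stringr)

# Made-up values (assumption): a non-breaking space (U+00A0) precedes the
# first two numbers instead of an ordinary space.
x <- c("\u00a012.5", "\u00a0-3", "40")

# stringr's ICU-based "\\s" does count the non-breaking spaces ...
sum(str_count(x, pattern = "\\s"))   # 2

# ... and the code point of the first character identifies them (160 = U+00A0).
utf8ToInt(enc2utf8(x[1]))[1]         # 160

# Base trimws() strips only [ \t\r\n] by default, so nothing changes.
all(trimws(x) == x)                  # TRUE

# Widening the class to PCRE's \h and \v (R >= 3.6.0) removes them.
as.numeric(trimws(x, whitespace = "[\\h\\v]"))   # 12.5 -3.0 40.0
```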

library(readxl)

# Confirm that sheet 22 is the intended one
readxl::excel_sheets('ETYS 2016 Appendix B.xlsx')[22]
# Read without trimming, then strip the leading whitespace manually
test <- read_excel("ETYS 2016 Appendix B.xlsx", skip = 1, sheet = 22, 
                   trim_ws = FALSE)
test$`MVAr Generation` <- as.numeric(gsub('^\\s', "", test$`MVAr Generation`))
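If several columns are affected, the same cleanup can be applied to every character column at once. Below is a base R sketch; it swaps gsub('^\\s', ...) for trimws() with whitespace = "[\\h\\v]" (R >= 3.6.0), on the assumption that the stray character is a non-breaking space, which PCRE's \h matches. The helper clean_nbsp() and the toy data frame are hypothetical.

```r
# Hypothetical helper: strip leading/trailing Unicode horizontal and vertical
# whitespace (including U+00A0) from every character column, then let
# type.convert() turn numeric-looking columns into numbers.
clean_nbsp <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      type.convert(trimws(col, whitespace = "[\\h\\v]"), as.is = TRUE)
    } else {
      col
    }
  })
  df
}

# Toy data standing in for the sheet.
test <- data.frame(
  Site = c("\u00a0Alpha", "Beta"),
  `MVAr Generation` = c("\u00a012.5", "-3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

test <- clean_nbsp(test)
sapply(test, class)   # Site "character", MVAr Generation "numeric"
```

Passing as.is = TRUE keeps non-numeric columns as character instead of converting them to factors.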

You can avoid this manually by using gsub to replace the leading whitespace; maybe this is what you want:

library(xlsx)
test <- read.xlsx("ETYS 2016 Appendix B.xlsx", sheetIndex = 22, 
                  colIndex = 1:7, startRow = 2, header = TRUE, 
                  stringsAsFactors = FALSE)

# remove leading whitespace and the stray "Â" characters from every column
test <- data.frame(lapply(test, function(y) {
  y <- gsub("^\\s+", "", y)
  y <- gsub("Â", "", y)
  gsub("^\\s+", "", y)
}), stringsAsFactors = FALSE)

# set the tidied columns to numeric
cols <- c(3, 4, 5, 7)
test[cols] <- lapply(test[cols], as.numeric)

# test
class(test$Unit.Number)
test$MVAr.Absorption
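The gsub("Â", "", y) step hints at an encoding mix-up: a UTF-8 non-breaking space is stored as the two bytes C2 A0, and if those bytes are read as Latin-1 the first one displays as "Â". The sketch below reproduces that with a made-up value (assuming this is how the "Â" arose when reading the sheet) and strips both characters with a single Perl-style regex.

```r
# A made-up cell value: a non-breaking space (U+00A0) followed by "12".
good <- enc2utf8("\u00a012")
charToRaw(good)                # c2 a0 31 32

# Reinterpret the same bytes as Latin-1 to mimic the mis-decoding.
garbled <- rawToChar(charToRaw(good))
Encoding(garbled) <- "latin1"
utf8ToInt(enc2utf8(garbled))   # 194 160 49 50, i.e. "Â", NBSP, "1", "2"

# Remove any leading run of "Â" (U+00C2) and horizontal whitespace.
cleaned <- sub("^[\\x{C2}\\h]+", "", enc2utf8(garbled), perl = TRUE)
as.numeric(cleaned)            # 12
```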

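@troh's insight into the character encoding got me thinking about using regex, and @jaySF's application across the whole data frame is a good way to handle all the columns at once. Together, these two suggestions led me to the answer below.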
require(dplyr); require(purrr); require(readr)

# Strip any one leading character other than A-Z or 0-9,
# then let readr guess each column's type.
RemoveSymbols <- function(df) {
  df %>%
    mutate_all(~ gsub("^[^A-Z0-9]", "", .x, ignore.case = FALSE)) %>%
    map_df(parse_guess)
}

test2 <- RemoveSymbols(test)

sapply(test2, class)
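One caveat with ^[^A-Z0-9]: it also matches a leading minus sign (or lower-case letter), so a value such as "-3" would silently become 3. Below is a sketch of the same pipeline in current dplyr syntax (across() and where(), dplyr >= 1.0) that trims only whitespace, assuming the stray character is a non-breaking space. The function name remove_nbsp() and the sample tibble are hypothetical.

```r
library(dplyr)
library(readr)

# Hypothetical helper: trim Unicode whitespace (incl. U+00A0) from character
# columns, then let readr::parse_guess() pick a type for each one.
remove_nbsp <- function(df) {
  df %>%
    mutate(across(where(is.character),
                  ~ parse_guess(trimws(.x, whitespace = "[\\h\\v]"))))
}

# Toy data standing in for the sheet; note the negative value.
test <- tibble(
  Site = c("\u00a0Alpha", "Beta"),
  `MVAr Absorption` = c("\u00a0-3", "7.5")
)

test2 <- remove_nbsp(test)
sapply(test2, class)   # Site "character", MVAr Absorption "numeric"
```

parse_guess() is kept from the answer; only the cleaning step changes, so the sign of negative values survives.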

… is a good idea for unrecognised symbols; however, it also removes numbers that contain spaces, so it does not work.