Problem loading a file in R


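I cannot load the data contained in the linked file Qld+20-34+Age+Groups.zip. I opened the file in a text editor and deleted the header and footer lines I did not need. I have tried various combinations of read_csv and read.csv to import it, but every attempt adds an extra column, filled with NAs, at the end of the data set. I also tried converting it to a text file and using read_delim and read.table, but I still get the same problem.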

df <- read_delim("C:/Qld 20-34 Age Groups Clean.txt", col_names = FALSE, quote = "\"", na = c("", "NA"), delim = ",")
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character(),
  X3 = col_integer(),
  X4 = col_integer(),
  X5 = col_integer(),
  X6 = col_integer(),
  X7 = col_character()
)
Warning: 1 parsing failure.
# A tibble: 1 x 5
      row   col  expected    actual
    <int> <chr>     <chr>     <chr>
1 1423530  <NA> 7 columns 6 columns
# ... with 1 more variables: file <chr>

df <- read_delim("C:/Qld 20-34 Age Groups Clean.txt", delim = ",", col_names = FALSE, quote = "\"", na = c("", "NA"))
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character(),
  X3 = col_integer(),
  X4 = col_integer(),
  X5 = col_integer(),
  X6 = col_integer(),
  X7 = col_character()
)
|========================================================| 100%   29 MB

df <- read_csv("C:/qldtest.csv", col_names = TRUE)
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character(),
  X6 = col_integer()
)

Then I run

transform(df, X1 = na.locf(Suburb))

...to fill the first column with the last known value, so that it becomes

X1    | X2 | X6
----------|----------------|------
Abbotsbury|4032,QLD        |0
Abbotsbury|4033,QLD        |0
Abbotsbury|4034,QLD        |10
Abbotsbury|4035,QLD        |0
Smith Town|4032,QLD        |0
Smith Town|4033,QLD        |220
Smith Town|4034,QLD        |0
Smith Town|4035,QLD        |0

This works, but with the following warnings

+ transform(df, X1 = na.locf(df))
Warning messages:
1: In is.na(object) :
  is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(object[1L]) :
  is.na() applied to non-(list or vector) of type 'NULL'

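That said, the data looks correct.

However, when I run the following command to keep only the records where column X6 is greater than 0, R appears to add four more columns, even though the variable count shown in the global environment is still 3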

df1 <- df %>%
        filter(X6 > 0)

What am I doing wrong? Thanks for your help.

The first few lines of the file look like the attached image.

How about skipping the first 9 lines and using the file's own header row?

Something like this:

jnk <- 
  read.csv('~/Downloads/Qld 20-34 Age Groups.csv', skip=9, stringsAsFactors=FALSE)

For example, the equivalent of your df %>% filter(X6 > 0) command then looks like this:

head(jnk %>% filter(Total > 0))

Or am I missing some point of the question?

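If you open the file in a text editor such as Sublime, you will see that every line ends with a comma. That trailing comma is why there is an extra column.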
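The effect of a trailing comma can be reproduced without the real file. The sketch below uses a made-up three-row sample (the values are illustrative, not taken from the data set) and base R's read.csv, then drops any column that is entirely NA. The names sample_lines, demo and demo_clean are hypothetical.

```r
# Hypothetical sample: four real fields per line, each line ending in a comma.
sample_lines <- c(
  "Abbeywood,\"4000, QLD\",0,0,",
  ",\"4005, QLD\",0,10,",
  ",\"4006, QLD\",5,0,"
)

# The trailing comma creates a fifth, empty field on every line.
demo <- read.csv(text = sample_lines, header = FALSE,
                 na.strings = c("", "NA"), stringsAsFactors = FALSE)
ncol(demo)             # 5
all(is.na(demo[[5]]))  # TRUE

# Keep only the columns that contain at least one non-missing value.
demo_clean <- demo[, colSums(!is.na(demo)) > 0, drop = FALSE]
ncol(demo_clean)       # 4
```

Dropping all-NA columns this way avoids hard-coding the name of the extra column, which may be handy if the number of real columns is not known in advance.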

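I assume you don't need the information above the data, so I suggest reading the data with skip = 11. Because there is a disclaimer below the data, you can use n_max to exclude it by limiting the number of rows read.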
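Counting header and footer lines by hand is easy to get wrong, so here is a rough base R sketch of deriving both numbers from the file itself. It assumes that data rows are contiguous and are the only lines containing a quoted postcode field such as "4000, QLD"; the file contents and the names tmp, raw, is_data, skip_n, n_rows and dat are all made up for illustration. In base R, the nrows argument of read.csv plays the role that n_max plays in readr.

```r
# Hypothetical file: 2 metadata lines, 3 data rows, 1 disclaimer line.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Some report title",
  "Generated by a hypothetical export tool",
  "Abbeywood,\"4000, QLD\",0,0",
  ",\"4005, QLD\",0,10",
  ",\"4006, QLD\",5,0",
  "Disclaimer: figures are illustrative only"
), tmp)

raw <- readLines(tmp)

# Assumption: only data rows contain a quoted "NNNN, QLD" field.
is_data <- grepl("\"[0-9]{4}, QLD\"", raw)
skip_n <- which(is_data)[1] - 1  # lines before the first data row
n_rows <- sum(is_data)           # number of data rows to read

dat <- read.csv(tmp, header = FALSE, skip = skip_n, nrows = n_rows,
                na.strings = c("", "NA"), stringsAsFactors = FALSE)
dim(dat)
```

The same skip_n and n_rows values could be passed to read_delim as skip and n_max.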

library(readr)
file <- "C:/Qld 20-34 Age Groups Clean.txt"
df <- read_delim(file, col_names = FALSE, quote = "\"", na = c("", "NA"), 
                 delim = ",", skip = 11, n_max = 1423540)
df$X7 <- NULL
head(df, n = 5)
# A tibble: 5 x 6
     X1        X2    X3    X4    X5    X6
      <chr>     <chr> <int> <int> <int> <int>
1 Abbeywood 4000, QLD     0     0     0     0
2      <NA> 4005, QLD     0     0     0     0
3      <NA> 4006, QLD     0     0     0     0
4      <NA> 4007, QLD     0     0     0     0
5      <NA> 4008, QLD     0     0     0     0

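I think we definitely need to see the first few lines of this file. Can you edit your question to show them clearly?

@TimBiegeleisen, I added a screen grab of the file header.

Is your source file Excel or some other spreadsheet file? You can't use read.csv on that.

The original source file is a Microsoft Excel comma-separated .csv file. I opened it in a text editor and converted it to text. I have tried loading both the .csv and .txt versions. Both load all the records, but then I run into the problems described above. I have tried versions of read_csv, read_delim, read.csv and read.table, using different variables/criteria in each.

This is really just a cleanup job on the Excel side. If you can't do it there, forget about doing it in R. The read.csv function needs a fairly strict structure.

I can get the data in, just like the tibble above. The problem appears when I run df.

That is because you are passing the entire data.frame (tibble) to the function na.locf(). If you use transform(df, X1 = na.locf(df$X1)) instead, it will work. However, that is a very slow operation, so I suggest doing something like: df %>% mutate(X1 = na.locf(df$X1))

Doh! You nailed it 100%. Everything works like a dream now. I'm sure I used na.locf that way in earlier versions, but I forgot! Many thanks. :)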
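For intuition about what na.locf does to a single column, here is a base R sketch of last-observation-carried-forward. The helper locf_base is hypothetical and is not the zoo implementation; it only illustrates the idea on a small made-up vector shaped like the suburb column in the question.

```r
# Hypothetical helper: carry the last non-NA value forward.
locf_base <- function(x) {
  idx <- cumsum(!is.na(x))  # position of the most recent non-NA value
  idx[idx == 0] <- NA       # leading NAs have nothing to carry forward
  x[!is.na(x)][idx]
}

suburb <- c("Abbotsbury", NA, NA, NA, "Smith Town", NA, NA, NA)
filled <- locf_base(suburb)
filled
```

The function takes one vector and returns one vector of the same length, which is why it has to be given a single column such as df$X1 rather than the whole data frame.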
To inspect the columns that read.csv produced in the first answer:

summary(jnk)

To apply the na.locf fix from the comments to the tibble read with read_delim:

library(dplyr)  # %>% and mutate()
library(zoo)    # na.locf()
df <- df %>% 
    mutate(X1 = na.locf(df$X1))

head(df, n = 5)
# A tibble: 5 x 6
         X1        X2    X3    X4    X5    X6
      <chr>     <chr> <int> <int> <int> <int>
1 Abbeywood 4000, QLD     0     0     0     0
2 Abbeywood 4005, QLD     0     0     0     0
3 Abbeywood 4006, QLD     0     0     0     0
4 Abbeywood 4007, QLD     0     0     0     0
5 Abbeywood 4008, QLD     0     0     0     0