Problem importing a CSV file in R / converting from integer to double


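Today I finally decided to start climbing R's steep learning curve. After a few hours I managed to import my dataset and do some other basic things, but I have run into a problem with data types: a column that contains decimals is imported as integer, and converting it to double changes the values.
While trying to put together a small CSV file to post here as an example, I found that the problem only appears once the data file is large enough. My original file is a 1048418 x 12 matrix, but I get the same problem with "only" 5000 rows. With 100, 1000, or even 2000 rows, the column is correctly imported as double.
The example is a smaller dataset (still 500 KB; as noted, the problem does not reproduce if the dataset is any smaller). The code is: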

> ex <- read.csv("exampleshort.csv",header=TRUE)
> typeof(ex$RET)
[1] "integer"

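The columns, in order, should be: integer, date, string, integer, double, double, integer, double, double, double (the types may be wrong, but hopefully you get what I mean).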
See the help for read.csv (?read.csv). Here is the relevant section:

colClasses: character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be ‘NA’. Possible values are ‘NA’ (the default, when ‘type.convert’ is used), ‘"NULL"’ (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’. Otherwise there needs to be an ‘as’ method (from package ‘methods’) for conversion from ‘"character"’ to the specified formal class. Note that ‘colClasses’ is specified per column (not per variable) and so includes the column of row names (if any).
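
To make the quoted help text concrete, here is a minimal sketch of the two ways colClasses can be supplied: positionally (one class per column) and as a named vector (only some columns pinned, the rest left to type.convert). The inline data and the column names ID, DATE, TICKER, and RET are made up for illustration and are not the poster's file.

```r
# Hypothetical four-column sample; the real file has 12 columns.
csv_text <- "ID,DATE,TICKER,RET
1,20100104,ABC,0.0125
2,20100105,ABC,0
3,20100106,XYZ,-0.0300"

# Positional form: one class per column, in file order.
by_position <- read.csv(text = csv_text, header = TRUE,
                        colClasses = c("integer", "integer",
                                       "character", "numeric"))

# Named form: only RET is pinned; the unnamed columns are treated as NA
# in colClasses, so read.csv guesses their types via type.convert.
by_name <- read.csv(text = csv_text, header = TRUE,
                    colClasses = c(RET = "numeric"))

print(sapply(by_position, class))
print(sapply(by_name, class))
```

The named form is handy when only one or two columns are misread and the rest of the guesses are fine.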
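Good luck on your journey learning R. It is hard, but it becomes fun once you get past the first few stages (which, I admit, does take some time).
Try this and fix the other columns accordingly: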

ex <- read.csv("exampleshort.csv",header=TRUE,colClasses=c("integer","integer","factor","integer","numeric","factor","factor","integer","numeric","numeric","numeric","numeric"), na.strings=c("."))
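
The effect of that call can be reproduced on a tiny inline file. This is only a sketch under an assumption: that the real file marks missing values with a lone ".", as the na.strings=c(".") argument suggests. The two-column data below is invented. stringsAsFactors = TRUE is passed explicitly because it was the default before R 4.0.0 but is not in later versions.

```r
# Hypothetical sample: one "." entry hidden among otherwise numeric values.
csv_text <- "ID,RET
1,0.0125
2,.
3,0
4,-0.0300"

# Without na.strings, the "." cannot be parsed as a number, so the whole
# RET column is read as text and (with stringsAsFactors = TRUE) becomes a factor.
raw <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
print(class(raw$RET))   # "factor"
print(typeof(raw$RET))  # "integer": the factor's internal storage type

# Declaring the column classes and the missing-value marker fixes the import.
fixed <- read.csv(text = csv_text, header = TRUE,
                  colClasses = c("integer", "numeric"),
                  na.strings = c("."))
print(class(fixed$RET))  # "numeric"
print(fixed$RET)
```

This would also explain why a short extract of the file imports correctly: a column is only read as text if at least one unparseable value appears among the rows that were read.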
For reference (this should not be used as the solution, since the best solution is to import the data correctly in one step): RET was not imported as an integer. It was imported as a factor. For future reference, if you want to convert a factor to numeric, use new_RET … (the conversion has to go through the factor's labels rather than its internal integer codes, which is why a direct conversion to double changes the values).

@徐旺: The first half doesn't work. Cutting it down to the first 5000 observations, less than 1% of my data, already causes the problem…

Sorry, I didn't finish my comment because I went off to read the read.csv help. What I meant to say is that I think some odd value may be confusing R. So I don't think it is a matter of the file being big or small, but rather that the larger dataset contains a confusing character or value. Does that make sense? If not, that's OK. I think the solution is to use the colClasses argument.

@徐旺 I see what you mean, but I am still not sure how to solve my problem. How do I use the colClasses argument? Could you give me a one-line command that imports this file correctly using colClasses?

Of course we can solve this! Please see my comments in the answer. I need some more information from you.

I have read that part of the help, but I really don't understand what it all means (I only started using R today). The column contains only 0 or double values, and there are no missing values.

Ah, OK. What should the other columns in the dataset be? Do they import correctly? Can you post the output of sapply(ex, class)?

I added the information you asked for to the end of my question.

Great, I'm glad! This is actually a good (if painful) lesson in R. Also, I know the help files can be confusing, but they really are good. Try reading through them, and when you get stuck, feel free to ask questions such as "What does a factor mean in R?". There are also great books and free introductions; I suggest you work through one. Good luck!

I think typeof is what confused you. class(ex$RET) would probably have led you to the answer faster...

I don't think you even need colClasses, just the na.strings argument: ex…

na.strings: a character vector of strings which are to be interpreted as ‘NA’ values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
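
Two points from the thread can be sketched in code: why converting a factor straight to a number changes the values, and the suggestion that na.strings alone may be enough. The conversion shown is the common as.numeric(as.character(...)) idiom, not necessarily the exact line from the original answer (which is truncated above), and the sample values are invented.

```r
# Hypothetical RET values as they might look after being read as a factor.
f <- factor(c("0.0125", ".", "0", "-0.0300"))

# Direct conversion returns the integer codes of the factor levels,
# not the numbers that the labels spell out.
codes <- as.numeric(f)
print(codes)

# Going through the character labels recovers the values; "." becomes NA
# (the coercion warning is suppressed here because that NA is expected).
new_RET <- suppressWarnings(as.numeric(as.character(f)))
print(new_RET)

# The na.strings-only variant: once "." is read as NA, type.convert
# can recognise the column as numeric without any colClasses.
csv_text <- "ID,RET
1,0.0125
2,.
3,0
4,-0.0300"
ex2 <- read.csv(text = csv_text, header = TRUE, na.strings = ".")
print(class(ex2$RET))  # "numeric"
```

Importing correctly in one step, as the answer recommends, avoids the detour through a factor altogether.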