R: Convert the type of multiple columns of a data frame at once
I seem to spend a lot of time creating a data frame from a file, a database or something else, and then converting each column into the type I want it to be (numeric, factor, character, etc.). Is there a way to do this in one step, possibly by giving a vector of types?
foo<-data.frame(x=c(1:10),
y=c("red", "red", "red", "blue", "blue",
"blue", "yellow", "yellow", "yellow",
"green"),
z=Sys.Date()+c(1:10))
foo$x<-as.character(foo$x)
foo$y<-as.character(foo$y)
foo$z<-as.numeric(foo$z)
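I find I run into this a lot as well. It comes down to how you import the data. All of the read...() functions have some kind of option for not converting character strings to factors, so text strings stay character and things that look like numbers stay numeric. A problem arises when you have elements that are empty rather than NA, but na.strings = c("", ...) should take care of that as well. I would start by taking a hard look at your import process and adjusting it accordingly.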
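As a rough illustration of fixing the types at import time, the sketch below reads a small inline CSV with read.csv. The data and the column names (id, colour, score) are made up for this example; colClasses declares the type of each column and na.strings turns empty fields into NA.

```r
# Hypothetical inline CSV standing in for a real file; the colour field in
# the second data row is empty and should become NA, not "".
csv_text <- "id,colour,score
1,red,3.5
2,,4.0
3,blue,2.25"

dat <- read.csv(
  text = csv_text,
  # one declared type per column, matched by column name
  colClasses = c(id = "character", colour = "character", score = "numeric"),
  # treat empty fields (and the literal string NA) as missing
  na.strings = c("", "NA"),
  stringsAsFactors = FALSE
)

str(dat)
```

With the types declared up front, no conversion step is needed after the import.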
But you could always write a function and pass it a vector of type names:
convert.magic <- function(x, y) {
  # x: data frame; y: character vector with one target type per column
  for (i in seq_along(y)) {
    if (y[i] == "numeric") {
      x[[i]] <- as.numeric(x[[i]])
    }
    if (y[i] == "character") {
      x[[i]] <- as.character(x[[i]])
    }
  }
  return(x)
}
foo <- convert.magic(foo, c("character", "character", "numeric"))
> str(foo)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
Edit: see the related question for some simplifications and extensions of this basic idea.
My comment on Brandon's answer, using switch:
convert.magic <- function(obj,types){
for (i in 1:length(obj)){
FUN <- switch(types[i],character = as.character,
numeric = as.numeric,
factor = as.factor)
obj[,i] <- FUN(obj[,i])
}
obj
}
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
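When doing this, beware of some of the intricacies of coercing data in R. For example, converting from factor to numeric usually involves as.numeric(as.character(...)). Also be aware that data.frame() and as.data.frame() convert character to factor by default (this was the default before R 4.0.0).
I just came across something like this with the RSQLite fetch methods... the results come back as atomic data types. In my case it was a date-time stamp that was causing me frustration.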
I found the setAs function very useful for making as work as expected. Here is my small example:
##data.frame conversion function
convert.magic2 <- function(df,classes){
out <- lapply(1:length(classes),
FUN = function(classIndex){as(df[,classIndex],classes[classIndex])})
names(out) <- colnames(df)
return(data.frame(out))
}
##small example case
tmp.df <- data.frame('dt'=c("2013-09-02 09:35:06", "2013-09-02 09:38:24", "2013-09-02 09:38:42", "2013-09-02 09:38:42"),
'v'=c('1','2','3','4'),
stringsAsFactors=FALSE)
classes=c('POSIXct','numeric')
str(tmp.df)
#confirm that it has character datatype columns
## 'data.frame': 4 obs. of 2 variables:
## $ dt: chr "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
## $ v : chr "1" "2" "3" "4"
##is the dt column coerceable to POSIXct?
canCoerce(tmp.df$dt,"POSIXct")
## [1] FALSE
##and the convert.magic2 function fails also:
tmp.df.n <- convert.magic2(tmp.df,classes)
## Error in as(df[, classIndex], classes[classIndex]) :
## no method or default for coercing “character” to “POSIXct”
##a little reading reveals the setAs function
setAs('character', 'POSIXct', function(from){return(as.POSIXct(from))})
##better answer for canCoerce
canCoerce(tmp.df$dt,"POSIXct")
## [1] TRUE
##better answer from convert.magic2
tmp.df.n <- convert.magic2(tmp.df,classes)
##column datatypes converted as I would like them!
str(tmp.df.n)
## 'data.frame': 4 obs. of 2 variables:
## $ dt: POSIXct, format: "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
## $ v : num 1 2 3 4
I know I am quite late with this answer, but using a loop together with the attributes function is a simple solution to the problem:
names <- c("x", "y", "z")
chclass <- c("character", "character", "numeric")
for (i in (1:length(names))) {
attributes(foo[, names[i]])$class <- chclass[i]
}
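If you want to detect the column data types automatically rather than specify them manually (e.g. after data tidying), the function type.convert() may help.
type.convert() takes a character vector and attempts to determine the best type for all of its elements, which means it has to be applied once per column.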
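A minimal sketch of the per-column application described above, using base R only. The data frame df and its columns are invented for illustration; as.is = TRUE is passed so that strings stay character instead of becoming factors.

```r
# All-character data frame, e.g. as it might look after some text wrangling.
df <- data.frame(
  n = c("1", "2", "3"),
  d = c("1.5", "2.5", "NA"),
  s = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Apply type.convert() once per column; df[] <- ... keeps df a data frame
# while replacing every column with its converted version.
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

Here the string "NA" in column d is read as a missing value because it matches the default na.strings of type.convert().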
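An addition to @joran's answer: the convert.magic below, which follows the same approach, does not preserve the numeric values in a factor-to-numeric conversion: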
convert.magic <- function(obj,types){
out <- lapply(1:length(obj),FUN = function(i){FUN1 <- switch(types[i],
character = as.character,numeric = as.numeric,factor = as.factor); FUN1(obj[,i])})
names(out) <- colnames(obj)
as.data.frame(out,stringsAsFactors = FALSE)
}
foo<-data.frame(x=c(1:10),
y=c("red", "red", "red", "blue", "blue",
"blue", "yellow", "yellow", "yellow",
"green"),
z=Sys.Date()+c(1:10))
foo$x<-as.character(foo$x)
foo$y<-as.character(foo$y)
foo$z<-as.numeric(foo$z)
str(foo)
# 'data.frame': 10 obs. of 3 variables:
# $ x: chr "1" "2" "3" "4" ...
# $ y: chr "red" "red" "red" "blue" ...
# $ z: num 16777 16778 16779 16780 16781 ...
foo.factors <- convert.magic(foo, rep("factor", 3))
str(foo.factors) # all factors
foo.numeric.not.preserved <- convert.magic(foo.factors, c("numeric", "character", "numeric"))
str(foo.numeric.not.preserved)
# 'data.frame': 10 obs. of 3 variables:
# $ x: num 1 3 4 5 6 7 8 9 10 2
# $ y: chr "red" "red" "red" "blue" ...
# $ z: num 1 2 3 4 5 6 7 8 9 10
# z comes out as 1 2 3...
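One possible way to keep the numeric values, sketched here under the assumption that the columns to be made numeric are factors or character vectors holding number-like labels, is to go through as.character() first, as mentioned earlier in the thread. The function name convert.magic.preserve and the demo data are hypothetical and not part of the original answers.

```r
# Variant of convert.magic that routes "numeric" through as.character(),
# so a factor yields its labels rather than its internal level codes.
convert.magic.preserve <- function(obj, types) {
  to_numeric <- function(x) as.numeric(as.character(x))
  out <- lapply(seq_along(obj), function(i) {
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric   = to_numeric,
                   factor    = as.factor,
                   stop("unsupported type: ", types[i]))
    FUN1(obj[[i]])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Small demo: the levels of x sort as "1", "10", "2", so the level codes
# would differ from the labels.
demo <- data.frame(x = factor(c("1", "2", "10")),
                   y = factor(c("red", "blue", "red")))
res <- convert.magic.preserve(demo, c("numeric", "character"))
str(res)
```

Note that this route is only meant for factor or character input: a Date column passed through as.character() and then as.numeric() would come out as NA.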
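Here is a fairly simple data.table solution, though it will take a few steps if you are changing to many different column types:
library(data.table)
# all columns must have the same length (10 rows here)
dt <- data.table( x=c(1:10), y=c(11:20), z=c(11:20), name=letters[1:10])
dt <- dt[, lapply(.SD, as.numeric), by= name]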
transform does what you seem to describe:
foo <- transform(foo, x=as.character(x), y=as.character(y), z=as.numeric(z))
Similar to type.convert(foo, as.is = TRUE), there is also readr::type_convert, which converts a data frame to appropriate classes without specifying them:
readr::type_convert(foo)
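If all of your columns are character, you can also use readr::parse_guess, which automatically converts each column to the correct class. Consider this modified data frame: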
foo <- data.frame(x = as.character(1:10),
y = c("red", "red", "red", "blue", "blue", "blue", "yellow",
"yellow", "yellow", "green"),
z = as.character(Sys.Date()+c(1:10)), stringsAsFactors = FALSE)
str(foo)
#'data.frame': 10 obs. of 3 variables:
# $ x: chr "1" "2" "3" "4" ...
# $ y: chr "red" "red" "red" "blue" ...
# $ z: chr "2019-08-12" "2019-08-13" "2019-08-14" "2019-08-15" ...
Using purrr and base:
foo<-data.frame(x=c(1:10),
y=c("red", "red", "red", "blue", "blue",
"blue", "yellow", "yellow", "yellow",
"green"),
z=Sys.Date()+c(1:10))
types <- c("character", "character", "numeric")
types<-paste0("as.",types)
purrr::map2_df(foo,types,function(x,y) do.call(y,list(x)))
# A tibble: 10 x 3
x y z
<chr> <chr> <dbl>
1 1 red 18127
2 2 red 18128
3 3 red 18129
4 4 blue 18130
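For comparison, the same idea can be sketched without purrr. This is an assumed base-R equivalent, not part of the original answer: Map pairs each column with a type name, and match.fun looks up the corresponding as.* function.

```r
foo <- data.frame(x = 1:10,
                  y = c("red", "red", "red", "blue", "blue",
                        "blue", "yellow", "yellow", "yellow", "green"),
                  z = Sys.Date() + 1:10)

types <- c("character", "character", "numeric")

# Look up as.character / as.numeric by name and apply each to its column.
# foo[] <- ... keeps foo a data.frame while replacing every column.
foo[] <- Map(function(col, type) match.fun(paste0("as.", type))(col),
             foo, types)

str(foo)
```

Unlike the purrr version, the result stays a plain data.frame rather than a tibble.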
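There is a simple solution in the hablar package.
Code
library(hablar)
library(dplyr)
df <- data.frame(x = "1", y = "2", z = "4")
df %>%
  convert(int(x, z),
          chr(y))
Result
# A tibble: 1 x 3
      x y         z
  <int> <chr> <int>
1     1 2         4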
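You can simply list several column names to convert multiple columns (e.g. x and z) to integer, as in the example above.
Use the colClasses argument to read.table.
You could also simply use: for(n in names(foo)[1:2]{foo[[n]]
Be aware that if you convert multiple fields from factor to numeric, you also need to call as.character or levels. See:
Try replacing the if statements with a call to switch, which can actually return the appropriate function: switch(expr, character = as.character, numeric = as.numeric, ...).
Meh, write it up as an answer so you can get the extra points :) I just came up with something quickly.
+1 for posterity, although I don't understand what the difference is.
+1 for recommending lapply. I have struggled with optimizing this kind of problem in the past, and the result was [
Does this function convert numeric factors to numeric (i.e. 3.6 = 3.6, not the factor level number)? How would I incorporate that into the function? I tried as.numeric(as.character), which did not work.
@MatthewDowle: mind posting a data.table solution? I have not done much with it yet, so this is not necessarily a simple problem for me. Sounds interesting though.
@Mattbanert Hi. I would do it like the latest edit, with a loop over set. Replace - with a call to as(...), or similar.
Your first option could be df[]
Thanks for the formatting tips. It took me a long time to find something like type.convert, so I thought putting it on a similar question that comes up more often would help people like me.
That is fair, but it may be worth a look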
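Applied to a data frame whose columns are all character, such as the modified foo with x, y and z stored as strings, readr::parse_guess converts each column to a guessed class:
foo[] <- lapply(foo, readr::parse_guess)
str(foo)
#'data.frame': 10 obs. of 3 variables:
# $ x: num 1 2 3 4 5 6 7 8 9 10
# $ y: chr "red" "red" "red" "blue" ...
# $ z: Date, format: "2019-08-12" "2019-08-13" "2019-08-14" "2019-08-15" ...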