Recasting the types of all columns in a SparkR data frame in a loop and/or apply function

r apache-spark

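I am working with a SparkR data frame that has the following structure (the second schema is the desired result):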

printSchema(dta)
root
 |-- date: timestamp (nullable = true)
 |-- valA: float (nullable = true)
 |-- valB: float (nullable = true)
 |-- ...

printSchema(desiredDta)
root
 |-- date: string (nullable = true)
 |-- valA: string(nullable = true)
 |-- valB: string(nullable = true)
 |-- ...

I would like to convert all columns of the existing data frame to strings without explicitly referring to each column by name.

The desired approach would loop over all columns:

# Quickly creating new data frame
dtaTmp <- select(dta, "date")

# Looping through each column of old data frame and adding string equivalent
# to a newly created data frame
for (i in seq_along(columns(dtaTmp))) {
    print(i)
    x  <- cast(eval(parse(text = paste(sep = "$", "dtaTmp", columns(dtaTmp)[i]))), 
           "string")
    dtaTmp <- withColumn(dtaTmp, columns(dtaTmp)[i], x)
}

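You have hit a bug in the 1.4 branch: withColumn keeps duplicate column names instead of replacing the existing column. The simplest solution is to use a single select with a list of columns: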

select(df, lapply(columns(df), function(x) alias(cast(df[[x]], "string"), x)))
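The one-liner above can be wrapped in a small helper and checked against the schema. The following is only a sketch under stated assumptions: it uses the SparkR 2.x session API (sparkR.session and createDataFrame without a sqlContext argument), which differs from the 1.4 API in the question; the sample rows and the helper name castAllToString are hypothetical and not part of the original post.

```r
library(SparkR)

# Assumption: SparkR 2.0 or later, running against a local Spark installation
sparkR.session(master = "local[1]")

# Hypothetical sample data standing in for dta
# (R numerics become double columns here, not float)
dta <- createDataFrame(data.frame(
  date = as.POSIXct(c("2015-01-01", "2015-01-02"), tz = "UTC"),
  valA = c(1.5, 2.5),
  valB = c(3.5, 4.5)
))

# Hypothetical helper: cast every column to string in one select,
# using alias() so that each column keeps its original name
castAllToString <- function(df) {
  select(df, lapply(columns(df), function(x) alias(cast(df[[x]], "string"), x)))
}

desiredDta <- castAllToString(dta)

# Every column should now be reported as a string
printSchema(desiredDta)
```

Because all casts happen inside one select, no intermediate data frame ever holds two columns with the same name, so the withColumn behaviour described above is never triggered.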