SparkR: How to create a dummy column from a character vector?
Consider the following simple example:
df <- data.frame(id=c(1:4), climate=c("cold_rainy","coldSunny","rainywarm","sunny_warm"))
head(df)
id climate
1 cold_rainy
2 coldSunny
3 rainywarm
4 sunny_warm
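How can this be achieved on a SparkDataFrame in SparkR?

You can first convert the string values to lowercase and then use rlike() to look for "sunny" in $climate. The boolean result is then converted to type integer with cast():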
ddf <- createDataFrame(sqlContext, df) # Data (Spark 1.x signature; in Spark 2.0+ use createDataFrame(df))
ddf$climate <- lower(ddf$climate) # Convert to lowercase
ddf$sunny <- cast(rlike(ddf$climate, "sunny"), "integer") # Create integer column
> head(ddf)
id climate sunny
1 1 cold_rainy 0
2 2 coldsunny 1
3 3 rainywarm 0
4 4 sunny_warm 1
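
For a local cross-check, the same dummy column can be built on the plain R data.frame with base R's grepl(), which the question's tags point to. This is only a minimal sketch, assuming the goal is a case-insensitive match on "sunny"; the column name sunny simply mirrors the SparkR answer above.

```r
# Local data.frame from the question (strings kept as character, not factor)
df <- data.frame(id = 1:4,
                 climate = c("cold_rainy", "coldSunny", "rainywarm", "sunny_warm"),
                 stringsAsFactors = FALSE)

# grepl() returns a logical vector; ignore.case = TRUE stands in for lower()
# as.integer() maps TRUE/FALSE to 1/0, like cast(..., "integer") in SparkR
df$sunny <- as.integer(grepl("sunny", df$climate, ignore.case = TRUE))

print(df)
```

Unlike the SparkR version, this leaves the original casing of climate untouched, because the case folding happens inside the match rather than on the column itself.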
This is the perfect solution for me, very nice!