R: if null then replace with 0, else keep the default value in the same column


In the SparkR shell 1.5.0, I created a sample dataset:

df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))
df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1,2,3,4,5,6,7,8)))
df_test2 <- join(df_test1, df_test, joinExpr = df_test1$mon1 == df_test$mon, joinType = "left_outer")
Question: if the column df_test2$year contains null, how can I replace it with 0, and otherwise keep the existing (default) value?

The output should look like this:

+----+----+------+
|mon1| mon|  year|
+----+----+------+
| 7.0|null|  0   |
| 1.0| 1.0|2011.0|
| 6.0|null|  0   |
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null|  0   |
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+
I tried otherwise/when, but it did not work:

df_test2$year <- otherwise(when(isNull(df_test2$year), 0 ), df_test2$year)

I used a raw SQL case when expression to get the answer:

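# The query reads from a temp table named "temp", so df_test2 has to be registered under that name first.
registerTempTable(df_test2, "temp")
df_test3 <- sql(sqlContext, "select mon1, mon, case when year is null then 0 else year end year FROM temp")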

showDF(df_test3)
+----+----+------+
|mon1| mon|  year|
+----+----+------+
| 7.0|null|   0.0|
| 1.0| 1.0|2011.0|
| 6.0|null|   0.0|
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null|   0.0|
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+

df_test3
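df_test2$year
What exactly is the type of your "null" values? (I don't know SparkR, sorry.) Try this:
df_test2$year[is.null(df_test2$year)]
Are you looking for a SparkR command? With @Jimbou's solution or setDT(df_test2)[is.null(year),year:=0], this problem is easy to solve in R, but do those work for you in the Spark environment?
I am looking for a SparkR command.
Maybe: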
Error in rep(yes, length.out = length(ans)) :
  attempt to replicate an object of type 'environment'
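
A note on the base R suggestions in the comments above: in plain R, is.null() tests whether the whole object is NULL and is not element-wise, so it cannot locate missing entries inside a vector; is.na() is the element-wise test. Below is a minimal base R sketch, assuming a local numeric vector in which NA stands in for Spark's null. The names year_local, attempt, and fixed are made up for illustration. It does not use SparkR and makes no claim about how SparkR columns behave.

```r
# Base R only (no SparkR): a plain numeric vector with missing entries as NA.
year_local <- c(2011, NA, 2013, NA)

# is.null() looks at the object as a whole, so it yields a single FALSE here,
# not one value per element.
whole_object_check <- is.null(year_local)

# Indexing with that single FALSE selects nothing, so this assignment
# leaves the vector unchanged.
attempt <- year_local
attempt[is.null(attempt)] <- 0

# is.na() is element-wise and marks exactly the missing entries.
fixed <- year_local
fixed[is.na(fixed)] <- 0

print(attempt)
print(fixed)
```

On a local vector or data.frame column, is.na() is therefore the test to combine with indexing or ifelse().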
df_test2$year <- ifelse(isNull(df_test2$year), 0, df_test2$year)
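
To check the expected result without a Spark session, the same left outer join and replacement can be approximated on local data frames. This is only a base R analog under stated assumptions: NA plays the role of null, merge() stands in for SparkR's join(), and local_joined is a made-up name. Unlike the SparkR join, merge() keeps a single key column, so the result has mon1 and year but no separate mon column.

```r
# Local base R data frames mirroring the question's data (no Spark session needed).
df_test  <- data.frame(mon = c(1, 2, 3, 4, 5),
                       year = c(2011, 2012, 2013, 2014, 2015))
df_test1 <- data.frame(mon1 = c(1, 2, 3, 4, 5, 6, 7, 8))

# Left outer join: all.x = TRUE keeps every row of df_test1,
# filling year with NA where df_test has no matching mon.
local_joined <- merge(df_test1, df_test,
                      by.x = "mon1", by.y = "mon", all.x = TRUE)

# Replace the missing years with 0 and keep every other value as it is.
local_joined$year <- ifelse(is.na(local_joined$year), 0, local_joined$year)

print(local_joined)
```

The ifelse() call has the same shape as the SparkR line above, with is.na() on a local column in place of isNull() on a SparkR column.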