
SparkR: How to extract rows containing null values in a specific column

Disclaimer: I have almost no experience with SparkR.

Take the following data frame:

ID          Date1       Date2
58844880    04/11/16    NaN
59745846    04/12/16    04/14/16
59743311    04/13/16    NaN
59745848    04/14/16    04/11/16
59598413    NaN         NaN
59745921    04/14/16    04/14/16
59561199    04/15/16    04/15/16
NaN         04/16/16    04/16/16
59561198    NaN         04/17/16
I only want to get the rows where the Date2 column contains NaN.


In R, I would use new_DF ...

You can use filter with the condition isNull, like:

DF2 <- SparkR::filter(DF,  isNull(DF$Date2))

Here is an option using sparklyr:

library(sparklyr)
library(dplyr)

con <- spark_connect(master = "local")
DF1 = copy_to(con, DF)


DF1 %>%
   mutate_at(vars(matches("Date")), 
          funs(to_date(from_unixtime(unix_timestamp(., "MM/dd/yy"))))) %>%
   filter(is.na(Date2)) %>%
   collect()
# A tibble: 3 x 3
#        ID Date1      Date2     
#     <dbl> <date>     <date>    
#1 58844880 2016-04-10 NA        
#2 59743311 2016-04-12 NA        
#3 59598413 NA         NA      

spark_disconnect(con)
Data
DF <- structure(list(ID = c(58844880, 59745846, 59743311, 59745848, 
 59598413, 59745921, 59561199, NaN, 59561198), Date1 = c("04/11/16", 
 "04/12/16", "04/13/16", "04/14/16", "NaN", "04/14/16", "04/15/16", 
 "04/16/16", "NaN"), Date2 = c("NaN", "04/14/16", "NaN", "04/11/16", 
 "NaN", "04/14/16", "04/15/16", "04/16/16", "04/17/16")), .Names = c("ID", 
 "Date1", "Date2"), class = "data.frame", row.names = c(NA, -9L
 ))
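
One detail of this sample data is worth checking locally before moving to Spark: Date1 and Date2 are character columns, so the missing entries are the literal string "NaN" rather than a real NA. The base R sketch below is only an illustration under that assumption (the names nan_rows, date2_parsed and na_rows are made up here, and this is not necessarily the expression the asker had in mind). It shows two ways to pick out those rows: comparing against the string directly, or parsing the column to Date first so that unparsable values become NA, which is roughly what the date conversion in the sparklyr pipeline relies on.

```r
# Rebuild the sample data so the block is self-contained
DF <- data.frame(
  ID = c(58844880, 59745846, 59743311, 59745848, 59598413,
         59745921, 59561199, NaN, 59561198),
  Date1 = c("04/11/16", "04/12/16", "04/13/16", "04/14/16", "NaN",
            "04/14/16", "04/15/16", "04/16/16", "NaN"),
  Date2 = c("NaN", "04/14/16", "NaN", "04/11/16", "NaN",
            "04/14/16", "04/15/16", "04/16/16", "04/17/16"),
  stringsAsFactors = FALSE
)

# is.na() sees an ordinary string here, so it finds nothing
sum(is.na(DF$Date2))

# Option 1: compare against the literal string "NaN"
nan_rows <- DF[DF$Date2 == "NaN", ]

# Option 2: parse to Date first; values that do not match the
# format (such as "NaN") become NA and can be found with is.na()
date2_parsed <- as.Date(DF$Date2, format = "%m/%d/%y")
na_rows <- DF[is.na(date2_parsed), ]

print(nan_rows)
```

Both options select the same three IDs as the sparklyr result shown earlier (58844880, 59743311, 59598413).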