Apache Spark: Problem filtering a PySpark DataFrame if it contains ">" or "<"

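My DataFrame has a value column, some of whose values contain ">" or "<".

This is certainly caused by null values in the value column. Null rows are counted by df.count(), but when you filter with contains, they are skipped. contains on a null value returns null rather than False, negating null with ~ still gives null, and filter keeps only rows where the condition is true, so a null row ends up in neither df1 nor df2.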
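To see why the null row falls out of both filters, here is a minimal pure-Python model of SQL three-valued logic, assuming None stands in for SQL NULL. The helpers sql_contains, sql_or, sql_not and sql_filter are hypothetical illustrations, not PySpark APIs; they only mimic how the predicate is evaluated, with a filter keeping a row only when the predicate is exactly True.

```python
from typing import Optional

# Hypothetical model of SQL three-valued logic: None plays the role of NULL.
def sql_contains(value: Optional[str], sub: str) -> Optional[bool]:
    # NULL input gives a NULL result
    return None if value is None else sub in value

def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # TRUE OR anything is TRUE; otherwise any NULL operand makes the result NULL
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def sql_not(a: Optional[bool]) -> Optional[bool]:
    # NOT NULL is still NULL
    return None if a is None else not a

def sql_filter(rows, predicate):
    # Keep a row only when the predicate is exactly True; NULL is dropped
    return [r for r in rows if predicate(r) is True]

values = ["value1_>", "value2_>", "value3_<", "value4", None]

def has_bracket(v):
    return sql_or(sql_contains(v, ">"), sql_contains(v, "<"))

matching = sql_filter(values, has_bracket)
non_matching = sql_filter(values, lambda v: sql_not(has_bracket(v)))

print(len(values), len(matching), len(non_matching))  # 5 3 1
```

The None entry evaluates to None under both the condition and its negation, which is exactly the row that goes missing from the two counts.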

Example:

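from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

data = [("value1_>",), ("value2_>",), ("value3_<",), ("value4",), (None,)]
df = spark.createDataFrame(data, ['value'])

# Rows whose value contains '>' or '<'
df1 = df.filter(col("value").contains('>') | col("value").contains('<'))
# Negated condition: the None row evaluates to null here as well, so it is dropped
df2 = df.filter(~(col("value").contains('>') | col("value").contains('<')))
print(df.count())
print(df1.count())
print(df2.count())

# Output:
# 5
# 3
# 1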
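
On the original DataFrame, df.count(), df1.count() and df2.count() returned:

3900000
202
3600000

I expected df.count() = df1.count() + df2.count(), but 202 + 3600000 falls well short of 3900000.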