Apache Spark: comparing values in PySpark within a specified margin of error


Is there a way to compare two double values in PySpark within a specified margin of error? Basically similar to this, but in PySpark.

For example:

df = ...  # some dataframe with 2 columns, RESULT1 and RESULT2

df = df.withColumn('compare', when(col('RESULT1') == col('RESULT2') +/- 0.05*col('RESULT2'), lit("match")).otherwise(lit("no match")))

But in a more elegant way?

You can use between as the condition:

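from pyspark.sql.functions import col, lit, when

# Label a row "match" when RESULT1 lies between 0.95*RESULT2 and 1.05*RESULT2 (inclusive).
df2 = df.withColumn(
    'compare',
    when(
        col('RESULT1').between(0.95*col('RESULT2'), 1.05*col('RESULT2')),
        lit("match")
    ).otherwise(
        lit("no match")
    )
)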
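One thing that may be worth checking with this range form is the sign of RESULT2. Column.between is inclusive and behaves like (col >= lower) & (col <= upper), so for a negative RESULT2 the 0.95 and 1.05 bounds swap order and the range is empty. The plain-Python sketch below only mirrors that arithmetic on single values. The names between_like and matches_range_form are hypothetical helpers for illustration, not PySpark functions.

```python
def between_like(value, lower, upper):
    """Mirror the inclusive range check that Column.between performs."""
    return lower <= value <= upper


def matches_range_form(result1, result2, rel_tol=0.05):
    """Apply the range condition from the answer to two plain floats."""
    lower = (1 - rel_tol) * result2
    upper = (1 + rel_tol) * result2
    return between_like(result1, lower, upper)


print(matches_range_form(103.0, 100.0))    # 103 lies inside the 95..105 range
print(matches_range_form(110.0, 100.0))    # 110 lies outside the range
print(matches_range_form(-100.0, -100.0))  # lower bound -95 is above upper bound -105
```

If RESULT2 is known to be non-negative, the range form above should behave as intended.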

You can also write it in terms of the absolute difference | RESULT1 - RESULT2 |.