Python: counting the true and false results of a condition in a Spark DataFrame

I come from a MATLAB background, where I can do this in a single line:

age_sum_error = sum(age > prediction - 4 & age < prediction + 4);

I would like my result to look like this:

+------+----------+
|false | positive |
+------+----------+
|2     | 2        |
+------+----------+

It takes a lot more code than MATLAB, but this is how I would do it:

import numpy as np

# use NumPy arrays so that pred - tolerance is an element-wise operation
# (subtracting an int from a plain Python list raises a TypeError)
ages = np.array([35, 40, 45, 26])
pred = np.array([30, 42, 38, 29])
tolerance = 4

# boolean arrays: is each age above the lower limit / below the upper limit?
is_older = np.greater(ages, pred - tolerance)
is_younger = np.less(ages, pred + tolerance)

# convert the boolean arrays to ints, then multiply. True = 1, False = 0,
# so an entry stays 1 only when both conditions hold
in_range = is_older.astype(int) * is_younger.astype(int)

# add up the entries that are still 1
in_range_count = np.sum(in_range)
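
The two comparisons can also be collapsed into one absolute-difference test, which makes it easy to get the count of False cases that the question asks for as well. This is only a minimal sketch using the same sample values; the names `within`, `positive`, and `negative` are illustrative and do not come from the original answer.

```python
import numpy as np

ages = np.array([35, 40, 45, 26])
pred = np.array([30, 42, 38, 29])
tolerance = 4

# True where the age lies strictly within +/- tolerance of the prediction;
# equivalent to (ages > pred - tolerance) & (ages < pred + tolerance)
within = np.abs(ages - pred) < tolerance

positive = int(np.count_nonzero(within))  # entries where the condition holds
negative = int(within.size - positive)    # entries where it does not

print(positive, negative)
```

Counting the True entries directly avoids the cast-and-multiply step, and the False count follows from the array size.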

Hope this helps.

First evaluate the condition as an integer column, then sum the 1s and 0s to get the result:

df.selectExpr(
    'cast(abs(age - prediction) < 4 as int) as condition'
).selectExpr(
    'sum(condition) as positive', 
    'sum(1-condition) as negative'
).show()
+--------+--------+
|positive|negative|
+--------+--------+
|       2|       2|
+--------+--------+