
Python dailyHosts=(dayHostCount.sortByKey()) error

I am running this code with PySpark (Spark 3.0.0) on Python 3, and it fails at the line dailyHosts = (dayHostCount.sortByKey()).

At that line I get the error "Py4JJavaError Traceback (most recent call last)":

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 52.0 failed 1 times, most recent failure: Lost task 0.0 in stage 52.0 (TID 213, LAPTOP-I236OH25, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 605, in main
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 595, in process
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\pyspark\rdd.py", line 2596, in pipeline_func
    return func(split, prev_func(split, iterator))
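Why the failure shows up on the sortByKey line: map and groupByKey are lazy transformations, so the lambda passed to map only runs once a Spark job executes. In PySpark, sortByKey with more than one output partition typically runs such a job itself, counting and sampling the keys to choose range boundaries, which matches the collectAndServe call in the trace. The Python part of the traceback is cut off, so the actual exception is not visible; a NameError from Set, which is not a Python 3 builtin (the old sets module was removed in Python 3.0), is an assumption here. Python 3's built-in map is lazy in the same way, so the deferred failure can be sketched without Spark; the sample data below is made up:

```python
# Spark-free analogy: Python 3's built-in map is lazy, like an RDD transformation.
# Hypothetical (day, hosts) groups, shaped like the output of groupByKey.
grouped = [(1, ["host-a", "host-b", "host-a"]), (2, ["host-c"])]

# Building the lazy pipeline does not call the lambda, so the undefined name
# Set is not looked up yet (compare defining dayHostCount, which did not fail).
lazy_counts = map(lambda xs: (xs[0], len(Set(xs[1]))), grouped)

# Consuming the iterator plays the role of the Spark job: only now does it fail.
error_message = None
try:
    sorted(lazy_counts)
except NameError as exc:
    error_message = str(exc)
print("Deferred failure:", error_message)

# The same pipeline with the builtin set succeeds.
fixed_counts = sorted(map(lambda xs: (xs[0], len(set(xs[1]))), grouped))
print(fixed_counts)  # [(1, 2), (2, 1)]
```

In Spark the same NameError is raised inside a Python worker, so the driver reports it wrapped in a Py4JJavaError from whichever call first forced the job.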

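# access_logs is an RDD of parsed log records created earlier (not shown)
# Pair each request's day of the month with the requesting host
dayToHostPairTuple = access_logs.map(lambda log: (log.date_time.day, log.host))

# Gather all hosts seen on each day
dayGroupedHosts = dayToHostPairTuple.groupByKey()

# Count distinct hosts per day; Python 3 has no built-in Set (the sets module
# was removed in Python 3.0), so the built-in set is used instead
dayHostCount = dayGroupedHosts.map(lambda xs: (xs[0], len(set(xs[1]))))

# Sort by day and fetch the first 30 results
dailyHosts = (dayHostCount.sortByKey())
dailyHostsList = dailyHosts.cache().take(30)
print('Unique hosts per day: %s' % dailyHostsList)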
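To check what the pipeline should return, a small local reference can help. The sketch below assumes the records are already reduced to (day, host) tuples; the function name unique_hosts_per_day and the sample hosts are hypothetical. Rather than grouping every host list per day, it deduplicates the pairs first and then counts per day, a common alternative plan for distinct counts:

```python
from collections import Counter


def unique_hosts_per_day(pairs, limit=30):
    """Return sorted (day, distinct_host_count) tuples for (day, host) pairs.

    Local stand-in for distinct -> count per key -> sortByKey().take(limit).
    """
    distinct_pairs = set(pairs)  # drop repeated (day, host) requests
    counts = Counter(day for day, _host in distinct_pairs)  # distinct hosts per day
    return sorted(counts.items())[:limit]  # ascending by day, first `limit` entries


# Hypothetical sample data: host-a appears twice on day 1.
sample = [(1, "host-a"), (1, "host-b"), (1, "host-a"), (3, "host-c"), (2, "host-a")]
result = unique_hosts_per_day(sample)
print("Unique hosts per day: %s" % result)  # Unique hosts per day: [(1, 2), (2, 1), (3, 1)]
```

An untested PySpark counterpart of the same plan would be along the lines of dayToHostPairTuple.distinct().map(lambda kv: (kv[0], 1)).reduceByKey(operator.add).sortByKey().take(30), which avoids building a full host list per day the way groupByKey does.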