Python: error at dailyHosts = (dayHostCount.sortByKey())
I am running this code with PySpark (spark-3.0.0-bin-hadoop2.7) on Python 3. At the line dailyHosts = (dayHostCount.sortByKey()) I get a Py4JJavaError ("Traceback (most recent call last)"):

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 52.0 failed 1 times, most recent failure: Lost task 0.0 in stage 52.0 (TID 213, LAPTOP-I236OH25, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 605, in main
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 595, in process
  File "C:\spark\spark-3.0.0-bin-hadoop2.7\python\pyspark\rdd.py", line 2596, in pipeline_func
    return func(split, prev_func(split, iterator))
dayToHostPairTuple = access_logs.map(lambda log: (log.date_time.day, log.host))
dayGroupedHosts = dayToHostPairTuple.groupByKey()
dayHostCount = dayGroupedHosts.map(lambda xs: (xs[0], len(Set(xs[1]))))
dailyHosts = (dayHostCount.sortByKey())
dailyHostsList = dailyHosts.cache().take(30)
print ('Unique hosts per day: %s' % dailyHostsList)
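A likely cause: in Python 3 there is no built-in `Set`; the built-in constructor is lowercase `set`, so the lambda `lambda xs: (xs[0], len(Set(xs[1])))` raises a `NameError` inside the Spark Python worker. Because RDD transformations are lazy, the failure only surfaces when an action such as `take(30)` runs, and Py4J wraps the worker's traceback in a `Py4JJavaError`. The fix would be `dayHostCount = dayGroupedHosts.map(lambda xs: (xs[0], len(set(xs[1]))))`. A minimal plain-Python sketch of the same grouping and distinct-count logic (the sample pairs are hypothetical stand-ins for `dayToHostPairTuple`):

```python
from collections import defaultdict

# Hypothetical (day, host) pairs standing in for dayToHostPairTuple
pairs = [(1, "a.com"), (1, "a.com"), (1, "b.com"), (2, "a.com")]

# Equivalent of groupByKey(): collect all hosts seen on each day
grouped = defaultdict(list)
for day, host in pairs:
    grouped[day].append(host)

# Equivalent of the map + sortByKey(): count DISTINCT hosts per day.
# len(set(...)) works; len(Set(...)) raises NameError in Python 3.
daily_hosts = sorted((day, len(set(hosts))) for day, hosts in grouped.items())
print(daily_hosts)  # [(1, 2), (2, 1)]
```

As a side note, since only the distinct count per day is needed, applying `.distinct()` to the (day, host) pairs before grouping may reduce the amount of data shuffled by `groupByKey()`.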