PySpark error: Py4JJavaError: An error occurred while calling o469.count


I have a Python Spark program that behaves inconsistently in places, sometimes ending in an error.

I usually run it on a small EMR cluster made up of two c3.2xlarge hosts and an m1.large, and there it runs fine and completes successfully.

However, when I run the exact same program on a larger cluster (I tried four c3.2xlarge with one m1.large master), it ends with an error. I'll paste the errors below, but even they are not entirely consistent: neither the error trace itself nor the stage at which it happens. For example, in one case it occurred after about 26 minutes, on a call to .count(), while in another case it actually got past the .count() successfully but failed at a different stage, about an hour in, on a call to .write.jdbc().

So I'm assuming this is some kind of race condition, but I'm not even sure whether it's my fault for using Spark incorrectly or a bug in Spark itself.

Most of the functionality I use in this case comes from spark.sql.
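I can't post the real code, but to give a sense of its shape, here is a minimal, hypothetical sketch (toy data and made-up values, not my actual job) of the kind of spark.sql pipeline involved. The real job chains several left outer joins, Python UDFs and percent_rank windows like this one, which is what produces the Window / BatchPythonEvaluation / TungstenExchange operators in the plans below:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType
    from pyspark.sql.window import Window

    sc = SparkContext()
    sqlContext = HiveContext(sc)  # window functions require HiveContext on 1.5

    # Toy stand-ins for the real tables
    companies = sqlContext.createDataFrame(
        [(1, 100, 1), (2, 50, 1)], ['id', 'emp_count', 'ref_company_size_id'])
    churn_counts = sqlContext.createDataFrame(
        [(1, 7)], ['company_id', 'count'])

    # A Python UDF: this is what shows up as BatchPythonEvaluation in the plans
    divide = F.udf(lambda a, b: float(a) / b if a is not None and b else None,
                   DoubleType())

    # Left outer join, a UDF-derived column, then a percent_rank window
    # partitioned by the company-size bucket (camelCase percentRank on 1.5.x)
    w = Window.partitionBy('ref_company_size_id').orderBy('churn_rate')
    companies = (companies
                 .join(churn_counts, companies.id == churn_counts.company_id,
                       'left_outer')
                 .withColumn('churn_rate', 100.0 * divide('count', 'emp_count'))
                 .withColumn('churn_rate_percentile',
                             F.percentRank().over(w) * 100.0))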

Environment: Spark 1.5.2 on EMR (Elastic MapReduce on AWS)

The stack traces are pretty long, so I can't paste them here in full; hopefully this is just enough to give the context.
As for the code itself: well, there's a lot of it, and I haven't found a simple repro test case that I could easily post here... (race conditions, you know...)
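For context, here is roughly what the out() helper that both failures go through looks like, reconstructed from the tracebacks below (args.jdbc_url is a placeholder, and the DDL step done via pymysql, which causes the "database exists" warnings in example two, is omitted):

    # Roughly output.py's out(), reconstructed from the tracebacks below;
    # args.jdbc_url is a placeholder, not the real connection string
    def out(sc, args, table, sql_create_table, data):
        # ... create the target database/table via pymysql here (omitted) ...
        if data.count() > 0:                    # output.py:35, where o469.count fails
            data.write.jdbc(args.jdbc_url, table,
                            mode='append')      # output.py:48, where o464.jdbc fails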

As mentioned, this is only part of the stack trace, which gets quite long (example one).
Note that in the two cases the error occurs at different places in the code.

Any help or direction on how to solve this?

Cheers

Traceback (most recent call last):
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/spark_normalize_data.py", line 102, in <module>
    run_spark(args)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/spark_normalize_data.py", line 62, in run_spark
    company_aliases_broadcast, experiences, args)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/companies.py", line 50, in get_companies
    out(sc, args, 'companies', sql_create_table, companies)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/output.py", line 48, in out
    mode='append')
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 455, in jdbc
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o464.jdbc.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(ref_company_size_id#96)
 TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210]
  SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
   TungstenSort [id#85 ASC], false, 0
    TungstenExchange hashpartitioning(id#85)
     TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,(_we0#203 * 100.0) AS churn_rate_percentile#202]
      Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(churn_rate#200) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#203], [ref_company_size_id#96], [churn_rate#200 ASC]
       TungstenSort [ref_company_size_id#96 ASC,churn_rate#200 ASC], false, 0
        TungstenExchange hashpartitioning(ref_company_size_id#96)
         TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,(100.0 * cast(pythonUDF#201 as double)) AS churn_rate#200]
          !BatchPythonEvaluation PythonUDF#divide(count#199L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L,pythonUDF#201]
           ConvertToSafe
            TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L]
             SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
              TungstenSort [id#85 ASC], false, 0
               TungstenExchange hashpartitioning(id#85)
                TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,(_we0#198 * 100.0) AS avg_tenure_percentile#197]
                 Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(avg_tenure#196) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#198], [ref_company_size_id#96], [avg_tenure#196 ASC]
                  TungstenSort [ref_company_size_id#96 ASC,avg_tenure#196 ASC], false, 0
                   TungstenExchange hashpartitioning(ref_company_size_id#96)
                    TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg(duration_days)#195 AS avg_tenure#196]
                     SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                      TungstenSort [id#85 ASC], false, 0
                       TungstenExchange hashpartitioning(id#85)
                        TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,(_we0#194 * 100.0) AS growth_rate_percentile#193]
                         Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(growth_rate#191) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#194], [ref_company_size_id#96], [growth_rate#191 ASC]
                          TungstenSort [ref_company_size_id#96 ASC,growth_rate#191 ASC], false, 0
                           TungstenExchange hashpartitioning(ref_company_size_id#96)
                            TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,(100.0 * cast(pythonUDF#192 as double)) AS growth_rate#191]
                             !BatchPythonEvaluation PythonUDF#divide(count#190L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,count#190L,pythonUDF#192]
                              ConvertToSafe
                               TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,count#190L]
                                SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                                 TungstenSort [id#85 ASC], false, 0
                                  TungstenExchange hashpartitioning(id#85)
                                   TungstenProject [id#85,href#86,name#87,emp_count#94L,(_we0#176 * 100.0) AS emp_count_percentile#175,ref_company_size_id#96]
                                    Window [id#85,href#86,name#87,emp_count#94L,ref_company_size_id#96], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(emp_count#94L) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#176], [ref_company_size_id#96], [emp_count#94L ASC]
                                     TungstenSort [ref_company_size_id#96 ASC,emp_count#94L ASC], false, 0

And part of the second trace (example two), this time failing in the .count() call:

  /home/hadoop/rantav.spark_normalize_data.py.093920/pymysql/cursors.py:146: Warning: Can't create database 'v2'; database exists
    result = self._query(query)
  /home/hadoop/rantav.spark_normalize_data.py.093920/pymysql/cursors.py:146: Warning: Table 'oplog' already exists
    result = self._query(query)
  Traceback (most recent call last):
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/spark_normalize_data.py", line 102, in <module>
      run_spark(args)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/spark_normalize_data.py", line 62, in run_spark
      company_aliases_broadcast, experiences, args)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/companies.py", line 50, in get_companies
      out(sc, args, 'companies', sql_create_table, companies)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/output.py", line 35, in out
      if data.count() > 0:
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 268, in count
    File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
    File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
  py4j.protocol.Py4JJavaError: An error occurred while calling o469.count.
  : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
  TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#216L])
   TungstenExchange SinglePartition
    TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#219L])
     TungstenProject
      Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(retention_rate_2y#210) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#213], [ref_company_size_id#96], [retention_rate_2y#210 ASC]
       TungstenSort [ref_company_size_id#96 ASC,retention_rate_2y#210 ASC], false, 0
        TungstenExchange hashpartitioning(ref_company_size_id#96)
         TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210]
          SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
           TungstenSort [id#85 ASC], false, 0
            TungstenExchange hashpartitioning(id#85)
             TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,(_we0#203 * 100.0) AS churn_rate_percentile#202]
              Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(churn_rate#200) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#203], [ref_company_size_id#96], [churn_rate#200 ASC]
               TungstenSort [ref_company_size_id#96 ASC,churn_rate#200 ASC], false, 0
                TungstenExchange hashpartitioning(ref_company_size_id#96)
                 TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,(100.0 * cast(pythonUDF#201 as double)) AS churn_rate#200]
                  !BatchPythonEvaluation PythonUDF#divide(count#199L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L,pythonUDF#201]
                   ConvertToSafe
                    TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L]
                     SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                      TungstenSort [id#85 ASC], false, 0
                       TungstenExchange hashpartitioning(id#85)