
PySpark Timeout Exception

I'm running PySpark on Google Dataproc and am trying to work with a network graph at scale.

Here is my configuration:

import pyspark
from pyspark.sql import SparkSession

# Pull in the BigQuery connector jar and the GraphFrames package
conf = pyspark.SparkConf().setAll([('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar'),
                                   ('spark.jars.packages', 'graphframes:graphframes:0.7.0-spark2.3-s_2.11')])

spark = SparkSession.builder \
  .appName('testing bq') \
  .config(conf=conf) \
  .getOrCreate()
However, when I run the GraphFrames label propagation algorithm on the graph, it consistently fails with a Py4JJavaError caused by a timeout:

result = g_df.labelPropagation(maxIter=5)
The error:

Py4JJavaError: An error occurred while calling o287.run.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 197.0 failed 4 times, most recent failure: Lost task 0.3 in stage 197.0 (TID 7247, cluster-network-graph-w-7.c.geotab-bi.internal, executor 50): ExecutorLostFailure (executor 50 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 127971 ms

How can I change this timeout parameter from PySpark, and what does it affect?

I think this is related to spark.network.timeout, which defaults to 120 seconds. The Spark documentation describes it as:

The default timeout for all network interactions. This config will be used in place of spark.core.connection.ack.wait.timeout, spark.storage.blockManagerSlaveTimeoutMs, spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout if they are not configured.
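
A minimal sketch of how you might raise it in your existing SparkConf. The 300s and 30s values below are illustrative assumptions, not tuned recommendations; spark.executor.heartbeatInterval must stay well below spark.network.timeout:

import pyspark
from pyspark.sql import SparkSession

conf = pyspark.SparkConf().setAll([('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar'),
                                   ('spark.jars.packages', 'graphframes:graphframes:0.7.0-spark2.3-s_2.11'),
                                   # Raise the default network timeout from 120s (example value, adjust to your workload)
                                   ('spark.network.timeout', '300s'),
                                   # Executor heartbeat interval; keep it well under spark.network.timeout
                                   ('spark.executor.heartbeatInterval', '30s')])

spark = SparkSession.builder \
  .appName('testing bq') \
  .config(conf=conf) \
  .getOrCreate()

The same properties can also be set individually with SparkSession.builder.config(key, value), or at submission time on Dataproc via gcloud dataproc jobs submit pyspark --properties spark.network.timeout=300s.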