PySpark exception with GraphFrames


I'm building a simple network graph with PySpark and GraphFrames, running on Google Cloud Dataproc.

Then I try to run label propagation:

result = g.labelPropagation(maxIter=5)
But I get the following error:

Py4JJavaError: An error occurred while calling o164.run.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 19.0 failed 4 times, most recent failure: Lost task 0.3 in stage 19.0 (TID 829, cluster-network-graph-w-12.c.myproject-bi.internal, executor 2): java.lang.ClassNotFoundException: org.graphframes.GraphFrame$$anonfun$5

It looks as though the GraphFrame package is not available, but only when I run label propagation. How can I fix it?
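For context, what labelPropagation computes is itself simple. A minimal pure-Python sketch of synchronous label propagation on a toy adjacency dict (independent of Spark; the graph and tie-breaking rule are illustrative assumptions, not GraphFrames' exact semantics):

```python
from collections import Counter

def label_propagation(adj, max_iter=5):
    """Synchronous label propagation on an adjacency dict {node: [neighbors]}.

    Each node starts with its own id as its label; at every iteration a node
    adopts the most frequent label among its neighbors (ties broken by the
    smallest label). Stops early when labels no longer change.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in adj.items():
            if not nbrs:
                new_labels[v] = labels[v]
                continue
            counts = Counter(labels[n] for n in nbrs)
            # Most frequent label wins; ties go to the smallest label.
            new_labels[v] = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Two triangles joined by a single bridge edge (2-3).
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(label_propagation(adj))
```

On this toy graph the labels converge to two communities, one per triangle, which is the same idea GraphFrames applies at scale.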

This seems to be a known GraphFrames issue on Google Cloud Dataproc.

Create a Python file with the following lines, then run it:

from setuptools import setup

setup(
    name='graphframes',
    version='0.5.10',
    packages=['graphframes', 'graphframes.lib'],
)
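With that setup script saved next to the graphframes Python sources, you could package them and ship the result with the job so the workers can import the package. A hedged sketch; the script, job, and cluster names are placeholders:

```shell
# Build an egg from the setup script above (output lands in dist/;
# the exact egg filename depends on your Python version).
python setup.py bdist_egg

# Submit the PySpark job with the egg on the workers' Python path.
gcloud dataproc jobs submit pyspark my_graph_job.py \
    --cluster=my-cluster \
    --py-files=dist/graphframes-0.5.10-py2.7.egg
```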

I solved it with the following configuration:

import pyspark
from pyspark.sql import SparkSession

conf = pyspark.SparkConf().setAll([('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar'),
                                   ('spark.jars.packages', 'graphframes:graphframes:0.7.0-spark2.3-s_2.11')])

spark = SparkSession.builder \
  .appName('testing bq')\
  .config(conf=conf) \
  .getOrCreate()
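Equivalently (an assumption about how the job is launched), the same packages property can be supplied at submit time instead of in code, so the JVM starts with the graphframes jar already resolved on every executor. Job and cluster names are placeholders:

```shell
# Mirrors the spark.jars.packages setting from the SparkConf above.
gcloud dataproc jobs submit pyspark my_graph_job.py \
    --cluster=my-cluster \
    --properties=spark.jars.packages=graphframes:graphframes:0.7.0-spark2.3-s_2.11
```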

Is the graphframes jar present on the runtime path?