PySpark ImportError: cannot import name 'SparkContext'


I am trying to run one of the pyspark examples on Windows and have done things like setting the PYTHONPATH environment variable, but I still get this error.

My PYTHONPATH looks like this:

C:\spark\spark-2.3.4-bin-hadoop2.7\python;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip;
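
For reference, a minimal sketch (not from the original post) of setting the equivalent configuration from Python itself before importing pyspark. The paths are the ones shown above; PYSPARK_PYTHON is an extra Spark environment variable included here as an assumption, since it controls which interpreter the Python workers use:

import os
import sys

# Spark installation directory from the question; adjust to your setup.
spark_home = r"C:\spark\spark-2.3.4-bin-hadoop2.7"
os.environ["SPARK_HOME"] = spark_home
# Assumption: make the Python workers use the same interpreter as the driver.
os.environ["PYSPARK_PYTHON"] = sys.executable

# Put the PySpark sources and the bundled py4j on sys.path (mirroring PYTHONPATH).
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.10.7-src.zip"))

from pyspark import SparkContext, SparkConf  # the import attempted in the question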

Traceback (most recent call last):
  File "C:\Users\Chris\Anaconda3\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\Chris\Anaconda3\lib\runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\Chris\Projekte\pyspark\pyspark.py", line 25, in <module>
    from pyspark import SparkContext, SparkConf
ImportError: cannot import name 'SparkContext'
[Stage 53:>                                                       (0 + 1) / 200]2019-10-11 18:25:36 ERROR Executor:91 - Exception in task 0.0 in stage 53.0 (TID 451)
org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:148)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:87)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:142)
        ... 16 more
2019-10-11 18:25:36 WARN  TaskSetManager:66 - Lost task 0.0 in stage 53.0 (TID 451, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:148)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:87)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:142)
        ... 16 more

2019-10-11 18:25:36 ERROR TaskSetManager:70 - Task 0 in stage 53.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "pyspark.py", line 237, in <module>
    label='label_index');
  File "pyspark.py", line 211, in dl_pipeline_fit_score_results
    fit_dl_pipeline = dl_pipeline.fit(train_data)
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\pyspark\ml\base.py", line 132, in fit
    return self._fit(dataset)
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\pyspark\ml\pipeline.py", line 109, in _fit
    model = stage.fit(dataset)
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\pyspark\ml\base.py", line 132, in fit
    return self._fit(dataset)
  File "C:\Users\Chris\Anaconda3\lib\site-packages\elephas\ml_model.py", line 92, in _fit
    validation_split=self.get_validation_split())
  File "C:\Users\Chris\Anaconda3\lib\site-packages\elephas\spark_model.py", line 151, in fit
    self._fit(rdd, epochs, batch_size, verbose, validation_split)
  File "C:\Users\Chris\Anaconda3\lib\site-packages\elephas\spark_model.py", line 188, in _fit
    gradients = rdd.mapPartitions(worker.train).collect()
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\pyspark\rdd.py", line 814, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 1 times, most recent failure: Lost task 0.0 in stage 53.0 (TID 451, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:148)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:87)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:142)
        ... 16 more

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1661)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1649)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1648)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1648)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1882)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1820)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:165)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:148)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:76)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:87)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        ... 1 more
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.PlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:142)
        ... 16 more

C:\Java\Java8
C:\spark\spark-2.3.4-bin-hadoop2.7
C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\pyspark.zip;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip;/C:/spark/spark-2.3.4-bin-hadoop2.7/jars/spark-core_2.11-2.3.4.jar;C:\spark\spark-2.3.4-bin-hadoop2.7\python;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip;
Traceback (most recent call last):
  File "C:\Users\Chris\Anaconda3\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\Chris\Anaconda3\lib\runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\Chris\Projekte\pyspark\pyspark.py", line 25, in <module>
    from pyspark import SparkContext, SparkConf
ImportError: cannot import name 'SparkContext'
SUCCESS: The process with PID 3972 (child process of PID 2768) has been terminated.
SUCCESS: The process with PID 2768 (child process of PID 14128) has been terminated.
SUCCESS: The process with PID 14128 (child process of PID 7464) has been terminated.
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1152, in send_command
  File "C:\Users\Chris\Anaconda3\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 985, in send_command
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1164, in send_command
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1145, in send_command
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 985, in send_command
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1149, in send_command
py4j.protocol.Py4JNetworkError: Error while sending

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:52283)
Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 929, in _get_connection
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1067, in start
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
C:\spark\spark-2.3.4-bin-hadoop2.7\python;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib;C:\spark\spark-2.3.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip;
One suggested fix is to let findspark locate the Spark installation before importing pyspark:

import findspark
# Point findspark at the Spark installation directory (adjust to your own path).
findspark.init('/home/ubuntu/spark-2.4.4-bin-hadoop2.7')

import pyspark
from pyspark.sql import SparkSession

# Building a SparkSession also creates the underlying SparkContext.
spark = SparkSession.builder.appName('my_example').getOrCreate()

print(pyspark.SparkContext)
# <class 'pyspark.context.SparkContext'>
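
On the Windows setup from the question, the same approach would presumably point findspark at the local Spark directory instead. The path below is taken from the question; the rest is a hedged sketch to confirm the import, not part of the original answer:

import findspark
findspark.init(r'C:\spark\spark-2.3.4-bin-hadoop2.7')  # Spark directory from the question

from pyspark import SparkContext, SparkConf  # the import that previously failed

# Minimal local context to check that the import and Spark itself work.
conf = SparkConf().setAppName('my_example').setMaster('local[*]')
sc = SparkContext.getOrCreate(conf)
print(sc.version)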