Spark 1.6.0 on YARN with Python 2.6.6 in IPython 1.2.0 - crashing

Tags: python, hadoop, ipython, pyspark, jupyter

I am trying to set up a JupyterHub instance that connects to a Hadoop YARN cluster to run PySpark jobs. Unfortunately, the Hadoop cluster only has Python 2.6.6, so I have to run Python 2.6 as well, which means I cannot use the latest IPython, only version 1.2 (newer IPython releases require Python 2.7+).

I have everything set up, but the IPython kernel keeps crashing and restarting.

Here is the setup. The IPython kernel spec:

{
 "display_name": "PySpark-1.6.0 - Python2.6.6",
 "language": "python",
 "argv": [
  "/usr/local/bin/python2.6",
  "-m",
  "IPython.__main__",
  "--config={connection_file}"
 ],
 "env": {
  "SPARK_HOME": "/usr/lib/spark-1.6.0-bin-without-hadoop",
  "SCALA_HOME": "/usr/lib/scala",
  "HADOOP_USER_NAME": "myuser",
  "HADOOP_CONF_DIR": "/usr/lib/spark-1.6.0-bin-without-hadoop/yarn-conf",
  "HADOOP_HOME": "/usr/bin/hadoop",
  "YARN_HOME": "",
  "SPARK_DIST_CLASSPATH": "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*",
  "SPARK_LIBRARY_PATH": "/usr/lib/hadoop/lib",
  "SPARK_CLASSPATH": "/usr/lib/hadoop/lib",
  "PATH": "/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/apps/home/zoltan.fedor/.local/bin:/apps/home/zoltan.fedor/bin:/usr/bin/hadoop/bin",
  "PYTHONPATH": "/usr/lib/spark-1.6.0-bin-without-hadoop/python/:/usr/lib/spark-1.6.0-bin-without-hadoop/python/lib/py4j-0.8.2.1-src.zip",
  "PYTHONSTARTUP": "/usr/lib/spark-1.6.0-bin-without-hadoop/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client --jars /usr/lib/avro/avro-mapred.jar,/usr/lib/spark-1.6.0-bin-without-hadoop/lib/spark-examples-1.6.0-hadoop2.2.0.jar pyspark-shell",
  "SPARK_YARN_USER_ENV": "PYTHONPATH=/usr/lib/spark-1.6.0-bin-without-hadoop/python/:/usr/lib/spark-1.6.0-bin-without-hadoop/python/lib/py4j-0.8.2.1-src.zip"
 }
}
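
For reference, a small sanity check can be run against the spec file the notebook server launches the kernel from (the kernelspec path below is hypothetical; adjust it to wherever the spec is actually installed):

# Sanity-check sketch for the kernel spec above (the kernelspec directory
# is hypothetical; Jupyter searches e.g. /usr/local/share/jupyter/kernels/).
import json
import os

spec_path = "/usr/local/share/jupyter/kernels/pyspark-1.6.0/kernel.json"

with open(spec_path) as f:
    spec = json.load(f)

# The server substitutes {connection_file} into argv and execs it with env.
print("launch command: %s" % " ".join(spec["argv"]))

# Quick existence checks on the interpreter and SPARK_HOME the spec points at.
assert os.path.exists(spec["argv"][0]), "python2.6 interpreter missing"
assert os.path.isdir(spec["env"]["SPARK_HOME"]), "SPARK_HOME missing"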
Below is the ipykernel log for at least one iteration; it just keeps repeating as the kernel crashes and restarts:

[I 2016-01-20 20:20:34.336 zoltan.fedor restarter:103] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 216d7905-1d93-4052-b5b6-3aee9d3628b1 restarted
/usr/local/lib/python2.6/site-packages/path.py:1719: DeprecationWarning: path is deprecated. Use Path instead.
  warnings.warn(msg, DeprecationWarning)
Python 2.6.6 (r266:84292, Jan 20 2016, 18:10:40) 
Type "copyright", "credits" or "license" for more information.

IPython 1.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
16/01/20 20:20:35 INFO spark.SparkContext: Running Spark version 1.6.0
16/01/20 20:20:35 WARN spark.SparkConf: 
SPARK_CLASSPATH was detected (set to '/usr/lib/hadoop/lib').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

16/01/20 20:20:35 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/usr/lib/hadoop/lib' as a work-around.
16/01/20 20:20:35 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/usr/lib/hadoop/lib' as a work-around.
16/01/20 20:20:35 INFO spark.SecurityManager: Changing view acls to: zoltan.fedor,myuser
16/01/20 20:20:35 INFO spark.SecurityManager: Changing modify acls to: zoltan.fedor,myuser
16/01/20 20:20:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zoltan.fedor, myuser); users with modify permissions: Set(zoltan.fedor, myuser)
16/01/20 20:20:36 INFO util.Utils: Successfully started service 'sparkDriver' on port 41092.
16/01/20 20:20:36 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/20 20:20:36 INFO Remoting: Starting remoting
16/01/20 20:20:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.xxx.xxx.xxx:41725]
16/01/20 20:20:36 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 41725.
16/01/20 20:20:36 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/20 20:20:36 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/20 20:20:36 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b68b3c8c-c43b-4e3d-9efd-6534f7eddcaf
16/01/20 20:20:36 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/20 20:20:36 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/20 20:20:36 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/20 20:20:36 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/20 20:20:36 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/20 20:20:36 INFO ui.SparkUI: Started SparkUI at http://10.xx.xx.xx:4040
16/01/20 20:20:36 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-ba928a28-64a6-4c70-b4cf-8c348140e89e/httpd-7892e967-247d-469e-96fb-79ad9269e569
16/01/20 20:20:36 INFO spark.HttpServer: Starting HTTP Server
16/01/20 20:20:36 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/20 20:20:36 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:43883
16/01/20 20:20:36 INFO util.Utils: Successfully started service 'HTTP file server' on port 43883.
16/01/20 20:20:36 INFO spark.SparkContext: Added JAR file:/usr/lib/avro/avro-mapred.jar at http://10.xx.xx.xx:43883/jars/avro-mapred.jar with timestamp 1453321236776
16/01/20 20:20:36 INFO spark.SparkContext: Added JAR file:/usr/lib/spark-1.6.0-bin-without-hadoop/lib/spark-examples-1.6.0-hadoop2.2.0.jar at http://10.xxx.xxx.xxx:43883/jars/spark-examples-1.6.0-hadoop2.2.0.jar with timestamp 1453321236887
16/01/20 20:20:37 INFO yarn.Client: Requesting a new application from cluster with 106 NodeManagers
16/01/20 20:20:37 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (192512 MB per container)
16/01/20 20:20:37 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/01/20 20:20:37 INFO yarn.Client: Setting up container launch context for our AM
16/01/20 20:20:37 INFO yarn.Client: Setting up the launch environment for our AM container
16/01/20 20:20:37 INFO yarn.Client: Preparing resources for our AM container
16/01/20 20:20:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/20 20:20:37 INFO yarn.Client: Uploading resource file:/usr/lib/spark-1.6.0-bin-without-hadoop/lib/spark-assembly-1.6.0-hadoop2.2.0.jar -> hdfs://Cluster/user/myuser/.sparkStaging/application_1453104838332_22373/spark-assembly-1.6.0-hadoop2.2.0.jar
16/01/20 20:20:38 INFO yarn.Client: Uploading resource file:/usr/lib/spark-1.6.0-bin-without-hadoop/python/lib/pyspark.zip -> hdfs://Cluster/user/myuser/.sparkStaging/application_1453104838332_22373/pyspark.zip
16/01/20 20:20:38 INFO yarn.Client: Uploading resource file:/usr/lib/spark-1.6.0-bin-without-hadoop/python/lib/py4j-0.9-src.zip -> hdfs://Cluster/user/myuser/.sparkStaging/application_1453104838332_22373/py4j-0.9-src.zip
16/01/20 20:20:38 INFO yarn.Client: Uploading resource file:/tmp/spark-ba928a28-64a6-4c70-b4cf-8c348140e89e/__spark_conf__4853045545116393897.zip -> hdfs://Cluster/user/myuser/.sparkStaging/application_1453104838332_22373/__spark_conf__4853045545116393897.zip
16/01/20 20:20:38 INFO spark.SecurityManager: Changing view acls to: zoltan.fedor,myuser
16/01/20 20:20:38 INFO spark.SecurityManager: Changing modify acls to: zoltan.fedor,myuser
16/01/20 20:20:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zoltan.fedor, myuser); users with modify permissions: Set(zoltan.fedor, myuser)
16/01/20 20:20:38 INFO yarn.Client: Submitting application 22373 to ResourceManager
16/01/20 20:20:38 INFO impl.YarnClientImpl: Submitted application application_1453104838332_22373
16/01/20 20:20:39 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:39 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.myuser
     start time: 1453321238567
     final status: UNDEFINED
     tracking URL: http://c14xx.xx.xxx.com:8088/proxy/application_1453104838332_22373/
     user: myuser
16/01/20 20:20:40 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:41 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:42 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:43 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:44 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:45 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:46 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:47 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:48 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/01/20 20:20:48 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> c14xx.xxx.xxx.com,c141xx.xxx.xxx.com, PROXY_URI_BASES -> http://c14xx.xxx.xxx.com:8088/proxy/application_1453104838332_22373,http://c141xx.xxx.xxx.com:8088/proxy/application_1453104838332_22373), /proxy/application_1453104838332_22373
16/01/20 20:20:48 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/01/20 20:20:48 INFO yarn.Client: Application report for application_1453104838332_22373 (state: ACCEPTED)
16/01/20 20:20:49 INFO yarn.Client: Application report for application_1453104838332_22373 (state: RUNNING)
16/01/20 20:20:49 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.xx.xx.xx
     ApplicationMaster RPC port: 0
     queue: root.myuser
     start time: 1453321238567
     final status: UNDEFINED
     tracking URL: http://c14xx.xxx.xxx.com:8088/proxy/application_1453104838332_22373/
     user: myuser
16/01/20 20:20:49 INFO cluster.YarnClientSchedulerBackend: Application application_1453104838332_22373 has started running.
16/01/20 20:20:49 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38421.
16/01/20 20:20:49 INFO netty.NettyBlockTransferService: Server created on 38421
16/01/20 20:20:49 INFO storage.BlockManagerMaster: Trying to register BlockManagerxx.xx.xx, 38421)
16/01/20 20:20:49 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/20 20:20:58 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (c25xx.xx.xx.com:43637) with ID 1
16/01/20 20:20:58 INFO storage.BlockManagerMasterEndpoint: Registering block manager c25xx.xx.xx.com:42471 with 511.1 MB RAM, BlockManagerId(1, c25xx.xx.xx.com, 42471)
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (c214xx.xxx.xxx.com:59623) with ID 2
16/01/20 20:20:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager c214xx.xxx.xxx.com.com:45327 with 511.1 MB RAM, BlockManagerId(2, c214xx.xxx.xxx.com.com, 45327)
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.6.6 (r266:84292, Jan 20 2016 18:10:40)
SparkContext available as sc, SQLContext available as sqlContext.

In [1]: 
Do you really want to exit ([y]/n)? 
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/01/20 20:20:59 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/01/20 20:20:59 INFO ui.SparkUI: Stopped Spark web UI at http://10.xx.xx.xx:4040
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/01/20 20:20:59 INFO cluster.YarnClientSchedulerBackend: Stopped
16/01/20 20:20:59 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/20 20:20:59 INFO storage.MemoryStore: MemoryStore cleared
16/01/20 20:20:59 INFO storage.BlockManager: BlockManager stopped
16/01/20 20:20:59 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/01/20 20:20:59 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/20 20:20:59 INFO spark.SparkContext: Successfully stopped SparkContext
16/01/20 20:20:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/20 20:20:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/20 20:20:59 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/01/20 20:20:59 INFO util.ShutdownHookManager: Shutdown hook called
16/01/20 20:20:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ba928a28-64a6-4c70-b4cf-8c348140e89e/httpd-7892e967-247d-469e-96fb-79ad9269e569
16/01/20 20:20:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ba928a28-64a6-4c70-b4cf-8c348140e89e
16/01/20 20:20:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ba928a28-64a6-4c70-b4cf-8c348140e89e/pyspark-a2f9af7d-c218-4a0c-ac8a-bfe446fbc754
[I 2016-01-20 20:21:01.344 zoltan.fedor restarter:103] KernelRestarter: restarting kernel (1/5)
Then the above starts repeating. Also, when it crashes (right after printing the Spark version banner), it prints some non-displayable characters that do not show up in the log above.
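
To help isolate whether the PYTHONSTARTUP script (pyspark/shell.py) behaves differently under the kernel's ZeroMQ pipes than under a real terminal, the same interpreter and environment can be launched outside Jupyter. A minimal repro sketch (the env values below are abbreviated from the kernel spec above; fill in the rest as needed):

# Minimal repro sketch outside Jupyter (assumption: the env values mirror
# the kernel spec above; abbreviated here, extend as needed).
import os
import subprocess

env = dict(os.environ)
env.update({
    "SPARK_HOME": "/usr/lib/spark-1.6.0-bin-without-hadoop",
    "PYTHONPATH": ("/usr/lib/spark-1.6.0-bin-without-hadoop/python/:"
                   "/usr/lib/spark-1.6.0-bin-without-hadoop/python/lib/py4j-0.8.2.1-src.zip"),
    "PYTHONSTARTUP": "/usr/lib/spark-1.6.0-bin-without-hadoop/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell",
})

# Launch IPython the same way the kernel spec does, minus the connection
# file, so stdin/stdout are a real tty instead of the kernel's pipes.
subprocess.call(["/usr/local/bin/python2.6", "-m", "IPython.__main__"], env=env)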


Any idea why it is crashing?

Why do you need JupyterHub to connect to the Hadoop cluster?

I want our data scientists to use PySpark to run Spark code on the cluster, pull the data Spark returns back into the notebook, load additional data from local databases, and so on. JupyterHub is a great fit for on-demand data analysis, which is exactly where Spark shines, so we want to use JupyterHub for data analysis against the Spark cluster. It is a valid use case that others have highlighted as well.