Error in Pyspark when using 'textFile.count()'

I don't know where the problem is or how to fix it. The Spark version is 2.1.0 and the Python version is 3.4.6.

   / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 3.4.6 (default, Mar  9 2017 19:57:54)
SparkSession available as 'spark'.
  • This is the code exactly as given in the official documentation.

  • But it does not work. Here is what I ran and the resulting traceback:
    
    >>> input_data = sc.textFile('my python')
    >>> input_data.count()
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/spark/python/pyspark/rdd.py", line 1041, in count
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/usr/local/spark/python/pyspark/rdd.py", line 1032, in sum
        return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
      File "/usr/local/spark/python/pyspark/rdd.py", line 906, in fold
        vals = self.mapPartitions(func).collect()
      File "/usr/local/spark/python/pyspark/rdd.py", line 809, in collect
        port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
      File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
      File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : java.net.ConnectException: Call From hadoop/192.168.81.129 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 56 more
    
    >>>
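
Judging from the last lines of the trace, the call fails while connecting to localhost:9000, so I suspect that sc.textFile('my python') is being resolved against HDFS (fs.defaultFS apparently points at hdfs://localhost:9000) and that no NameNode is listening on that port. A minimal sketch of what I would try next, assuming the file actually lives on the local filesystem (the path below is only a placeholder, not my real path):

    >>> # An explicit file:// scheme keeps the path from being resolved against fs.defaultFS
    >>> input_data = sc.textFile('file:///home/hadoop/mydata.txt')
    >>> input_data.count()

If the file really is supposed to be read from HDFS, the alternative would presumably be to start the HDFS daemons first (e.g. with start-dfs.sh) and upload the file there, but I am not sure which of the two applies in my setup.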