Zeppelin/Spark: org.apache.spark.SparkException: Cannot run program "/usr/bin/": error=13, Permission denied

apache-spark, pyspark, apache-zeppelin

I am trying to run a basic regression with Zeppelin 0.7.2 and Spark 2.1.1 on Debian 9. Both are "installed" under /usr/local/, i.e. /usr/local/zeppelin/ and /usr/local/spark, and Zeppelin also knows the correct SPARK_HOME. First, I load the data:

%spark.pyspark
from sqlalchemy import create_engine #sql query
import pandas as pd #sql query
from pyspark import SparkContext #Spark DataFrame
from pyspark.sql import SQLContext #Spark DataFrame

# database connection and sql query
pdf = pd.read_sql("select col1, col2, col3 from table", create_engine('mysql+mysqldb://user:pass@host:3306/db').connect())

print(pdf.size) # size of pandas dataFrame

# convert pandas dataFrame into spark dataFrame
sdf = SQLContext(SparkContext.getOrCreate()).createDataFrame(pdf)

sdf.printSchema()# what does the spark dataFrame look like?
Great, that works, and I get the size of the pandas DataFrame and the three columns as output:

46977
root
 |-- col1: double (nullable = true)
 |-- col2: double (nullable = true)
 |-- col3: date (nullable = true)
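As a side note, pandas' DataFrame.size counts cells (rows × columns), so a quick sanity check comparing the row counts of both frames might look like this (a minimal sketch reusing the pdf and sdf from above):

%spark.pyspark
# compare row counts of the pandas DataFrame and the Spark DataFrame
print(len(pdf.index), sdf.count())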
OK, now I want to do the regression:

%spark.pyspark
# do a linear regression with sparks ml libs
# https://community.intersystems.com/post/machine-learning-spark-and-cach%C3%A9
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler

# choose several inputCols and transform the "Features" column(s) into the correct vector format
vectorAssembler = VectorAssembler(inputCols=["col1"], outputCol="features")
data=vectorAssembler.transform(sdf)
print(data)

# Split the data into 70% training and 30% test sets.
trainingData,testData = data.randomSplit([0.7, 0.3], 0.0)
print(trainingData)

# Configure the model.
lr = LinearRegression().setFeaturesCol("features").setLabelCol("col2").setMaxIter(10)

## Train the model using the training data.
lrm = lr.fit(trainingData)

## Run the test data through the model and display its predictions for col2.
#predictions = lrm.transform(testData)
#predictions.show()
But when executing lr.fit(trainingData), I get the error below in the console (and in Zeppelin's log file). It appears while Spark is starting up: Cannot run program "/usr/bin/": error=13, Keine Berechtigung (permission denied). I wonder what is supposed to be started from /usr/bin/, since everything here lives under /usr/local/ (a quick way to check this is sketched after the stack trace):

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-4001144784380663394.py", line 367, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-4001144784380663394.py", line 360, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 9, in <module>
  File "/usr/local/spark/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "/usr/local/spark/python/pyspark/ml/wrapper.py", line 236, in _fit
    java_model = self._fit_java(dataset)
  File "/usr/local/spark/python/pyspark/ml/wrapper.py", line 233, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o70.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): **java.io.IOException: Cannot run program "/usr/bin/": error=13, Keine Berechtigung**
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
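The stack trace shows Spark's PythonWorkerFactory failing to spawn a Python worker process, so the path it tries to execute comes from whatever Python executable is configured for the workers. A minimal sketch to inspect that setting from the same %spark.pyspark interpreter:

%spark.pyspark
import os, sys

# Python interpreter running the driver side (Zeppelin's pyspark interpreter)
print("driver python:", sys.executable)

# Executable Spark tries to launch for its Python workers; if this points to a
# directory instead of a binary, ProcessBuilder fails with error=13 (EACCES)
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON:", os.environ.get("PYSPARK_DRIVER_PYTHON"))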

This turned out to be a configuration mistake in Zeppelin's conf/zeppelin-env.sh. The line below was uncommented there and caused the error; after commenting it out again, everything works:

#export PYSPARK_PYTHON=/usr/bin/   # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
So the problem was that PYSPARK_PYTHON was set to an incorrect path; with the line commented out, the default python binary is used again. I found the cause by running grep -R "/usr/bin/" in the Zeppelin base directory to locate the string /usr/bin/ and then checking the files it turned up.
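If PYSPARK_PYTHON is set at all, it has to point to an actual interpreter binary (for example /usr/bin/python3), not to a directory. After restarting the interpreter, a small sketch like the following can confirm that the executors are able to start Python workers again (worker_python is just an illustrative helper, and the range size and partition count are arbitrary):

%spark.pyspark
from pyspark import SparkContext

# Run a trivial job so that each executor has to start a Python worker,
# then report which interpreter those workers actually use.
def worker_python(_):
    import sys
    return sys.executable

sc = SparkContext.getOrCreate()
print(sc.parallelize(range(4), 2).map(worker_python).distinct().collect())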