Apache Spark: Unable to run PySpark on the Cloudera VM (without the interactive shell)

Tags: apache-spark, pyspark, cloudera-quickstart-vm

When I follow along in the Cloudera VM environment and try to run the spark-submit command, I keep getting the following error:

ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/user/spark/applicationHistory":spark:supergroup:drwxr-xr-x
....
Traceback (most recent call last):
  File "/home/cloudera/wordcount.py", line 9, in <module>
    sc = SparkContext(conf=conf)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
  File "/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
  File "/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/user/spark/applicationHistory":spark:supergroup:drwxr-xr-x

I tried both of these commands:

1.
$ spark-submit --master yarn --deploy-mode client --executor-memory 1g \
    --name wordcount --conf "spark.app.id=wordcount" wordcount.py hdfs://namenode_host:8020/path/to/inputfile.txt

2.
$ spark-submit --master yarn --deploy-mode client --executor-memory 1g \
    --name wordcount --conf "spark.app.id=wordcount" wordcount.py inputfile.txt

Can anyone help?

Try running with the following environment variable:

HADOOP_USER_NAME=hdfs spark-submit <your command>
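
It looks like you need to run chmod or chown to give your user some permissions.

As @cricket_007 mentioned, this is a permissions issue. Spark's applicationHistory directory does not appear to grant your user write permission. You can try granting it like this:

sudo -u spark hadoop fs -chmod 777 /user/spark/applicationHistory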
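
To see why the write is refused, it may help to decode the inode string from the error message. The sketch below is only an illustration: explain_denial is a hypothetical helper, not part of Spark or Hadoop, and it assumes the cloudera user is not a member of supergroup, which the error text does not actually state. It applies the usual owner/group/other check to the mode string quoted in the question.

```python
import re

# Matches the fragment quoted in the question, for example:
# user=cloudera, access=WRITE, inode="/path":owner:group:drwxr-xr-x
DENIED = re.compile(
    r'user=(?P<user>[^,\s]+),\s*access=(?P<access>\w+),\s*'
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[-dlrwxtTsS]{10})'
)


def explain_denial(message, user_groups=()):
    """Hypothetical helper: report which permission class applied to the user."""
    m = DENIED.search(message)
    if m is None:
        raise ValueError("not a recognised 'Permission denied' message")
    info = m.groupdict()
    mode = info["mode"]
    # mode[0] is the file type; then owner bits, group bits, other bits.
    if info["user"] == info["owner"]:
        info["applies"], bits = "owner", mode[1:4]
    elif info["group"] in user_groups:
        info["applies"], bits = "group", mode[4:7]
    else:
        info["applies"], bits = "other", mode[7:10]
    info["bits"] = bits
    # Only the three basic access types are handled in this sketch.
    flag = {"READ": (0, "r"), "WRITE": (1, "w"), "EXECUTE": (2, "x")}
    idx, ch = flag[info["access"]]
    info["allowed"] = bits[idx] == ch
    return info


message = ('Permission denied: user=cloudera, access=WRITE, '
           'inode="/user/spark/applicationHistory":spark:supergroup:drwxr-xr-x')
result = explain_denial(message)
print(result["applies"], result["bits"], result["allowed"])
```

Under that assumption the cloudera user falls into the "other" class, whose bits are r-x, so WRITE is not permitted. This is consistent with both suggestions above: running as a different Hadoop user, or loosening the mode on the directory.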