Apache Spark: how do I submit a PySpark job in Apache Livy?
Tags: apache-spark, hadoop, pyspark, amazon-emr

How do I specify the PySpark spark-submit command below in the Apache Livy format? I tried:
spark-submit --packages com.databricks:spark-redshift_2.11:2.0.1 --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC4.jar /home/hadoop/test.py
I also tried the following request, which returns an error:
curl -X POST --data '{"file": "/home/hadoop/test.py",
  "conf": {"com.databricks": "spark-redshift_2.11:2.0.1"},
  "queue": "my_queue", "name": "Livy Example",
  "jars": "/usr/share/aws/redshift/jdbc/RedshiftJDBC4.jar"}' \
  -H "Content-Type: application/json" localhost:8998/batches
Your command is wrong. Use the following two examples to construct it: the plain spark-submit command and the equivalent Livy REST JSON protocol.

spark-submit command:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--jars a.jar,b.jar \
--py-files a.py,b.py \
--files foo.txt,bar.txt \
--archives foo.zip,bar.tar \
--master yarn \
--deploy-mode cluster \
--driver-memory 10G \
--driver-cores 1 \
--executor-memory 20G \
--executor-cores 3 \
--num-executors 50 \
--queue default \
--name test \
--proxy-user foo \
--conf spark.jars.packages=xxx \
/path/to/examples.jar \
1000
Livy REST JSON protocol:

{
  "className": "org.apache.spark.examples.SparkPi",
  "jars": ["a.jar", "b.jar"],
  "pyFiles": ["a.py", "b.py"],
  "files": ["foo.txt", "bar.txt"],
  "archives": ["foo.zip", "bar.tar"],
  "driverMemory": "10G",
  "driverCores": 1,
  "executorCores": 3,
  "executorMemory": "20G",
  "numExecutors": 50,
  "queue": "default",
  "name": "test",
  "proxyUser": "foo",
  "conf": {"spark.jars.packages": "xxx"},
  "file": "hdfs:///path/to/examples.jar",
  "args": [1000]
}
--packages: all transitive dependencies are handled when this option is used.

In Livy you need to go to the interpreter settings page and add a new property under the livy settings,

livy.spark.jars.packages

and the value

com.databricks:spark-redshift_2.11:2.0.1

Restart the interpreter and retry the query.

Comments:
- Is it a cut-and-paste error, or do you have smart quotes in the data? Look at the conf and com.databricks entries.
- Let me check... still the same error. How do I pass --packages com.databricks:spark-redshift_2.11:2.0.1 the way spark-submit does?
- Try "conf": {"spark.jars.packages": "com.databricks:spark-redshift_2.11:2.0.1"}.
- With that I get an invalid JSON error and a Scala error; I can't paste the exact message because I am not at my workplace.
- Go to the interpreter settings page, add the new property livy.spark.jars.packages under the livy settings with the value com.databricks:spark-redshift_2.11:2.0.1, restart the interpreter, and retry the query.
- @vaquarkhan How do I submit a PySpark job to Livy? Will the file field be the PySpark file? The snippet above includes "file": "hdfs:///path/to/examples.jar"; what should this be for PySpark?
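For that last question: in a Livy batch, the file field points at the application to run, so for PySpark it is the .py script itself. Below is a minimal sketch of the corrected request from the question, assuming the Livy server runs on localhost:8998 and reusing the file and jar paths from the question; note that jars must be a JSON array rather than a plain string, and file must be a path the cluster can read (depending on your Livy configuration, that may mean HDFS or a whitelisted local directory).

# Submit test.py as a Livy batch; spark.jars.packages pulls in the
# Redshift package and its transitive dependencies.
curl -X POST \
  -H "Content-Type: application/json" \
  --data '{
    "file": "/home/hadoop/test.py",
    "jars": ["/usr/share/aws/redshift/jdbc/RedshiftJDBC4.jar"],
    "conf": {"spark.jars.packages": "com.databricks:spark-redshift_2.11:2.0.1"},
    "queue": "my_queue",
    "name": "Livy Example"
  }' \
  localhost:8998/batches

# Poll the batch state (the id, 0 here as a placeholder, comes back
# in the JSON response to the POST above):
curl localhost:8998/batches/0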