java, performance, apache-spark, apache-spark-sql

Here is my SparkConf object:

    SparkConf conf = new SparkConf().setAppName("org.spark.SchemaTransformer").setMaster("yarn-cluster");

Execution environment: spark.master=yarn, executor.cores=4, executor.memory=5120

Answer: I think you missed some parameters. Try adding the following settings. For example:

    new SparkConf()
        .set("master", "yarn")
        .set("spark.submit.deployMode", "cluster")
        .set("spark.executor.instances", "8")
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "5120M")
        .set("spark.driver.memory", "5120M")
        .set("spark.yarn.executor.memoryOverhead", "10000M")
        .set("spark.yarn.driver.memoryOverhead", "10000M")
 DataFrame a = sqlContext.read().format("com.databricks.spark.csv").options(options)
                .load("s3://s3bucket/a/part*");
 DataFrame b = sqlContext.read().format("com.databricks.spark.csv").options(options)
                .load("s3://s3bucket/b/part*");

a.registerTempTable("a");
b.registerTempTable("b");

DataFrame c = sqlContext.sql("SELECT a.name, b.name FROM a JOIN b ON a.id = b.a_id");

c.write().mode(SaveMode.Append).jdbc(MYSQL_CONNECTION_URL, "c", prop);
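The writes above pass a `prop` object that the snippet never constructs; a minimal sketch, assuming the MySQL Connector/J driver class (the user and password values are placeholders, not from the question):

```java
import java.util.Properties;

// Hypothetical JDBC properties for the writes above -- the driver class is
// Connector/J's; the credentials are placeholders to substitute.
Properties prop = new Properties();
prop.setProperty("driver", "com.mysql.jdbc.Driver");
prop.setProperty("user", "dbuser");
prop.setProperty("password", "secret");
```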

// other jobs are similar 

Map<String, String> dOptions = new HashMap<String, String>();
dOptions.put("driver", MYSQL_DRIVER);
dOptions.put("url", MYSQL_CONNECTION_URL);

dOptions.put("dbtable", "(select * from c) AS c");
DataFrame rC = sqlContext.read().format("jdbc").options(dOptions).load();
rC.cache();

dOptions.put("dbtable", "(select * from d) AS d");
DataFrame rD = sqlContext.read().format("jdbc").options(dOptions).load();
rD.cache();

dOptions.put("dbtable", "(select * from f) AS f");
DataFrame rF = sqlContext.read().format("jdbc").options(dOptions).load();
rF.cache();

rC.registerTempTable("rC");
rD.registerTempTable("rD");
rF.registerTempTable("rF");

DataFrame result = sqlContext.sql("SELECT rC.name, rD.name, rF.date FROM rC JOIN rD ON rC.name = rD.name JOIN rF ON rC.date = rF.date");

result.write().mode(SaveMode.Append).jdbc(MYSQL_CONNECTION_URL, "result_table", prop);
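On the performance side, each `dbtable` subquery above is pulled back through a single JDBC connection on one task. Spark's JDBC source also accepts the standard `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options to range-partition the read across executors; a sketch for `c` under assumptions about the table (the numeric column `id` and the bounds are not from the question):

```java
Map<String, String> pOptions = new HashMap<String, String>();
pOptions.put("driver", MYSQL_DRIVER);
pOptions.put("url", MYSQL_CONNECTION_URL);
pOptions.put("dbtable", "(select * from c) AS c");
// Split the read into 8 range-partitioned queries on a numeric column.
// "id" and the bounds are assumed here -- adjust them to the real table.
pOptions.put("partitionColumn", "id");
pOptions.put("lowerBound", "0");
pOptions.put("upperBound", "1000000");
pOptions.put("numPartitions", "8");

DataFrame rC = sqlContext.read().format("jdbc").options(pOptions).load();
```

Each partition then issues its own bounded query, so the JDBC read no longer serializes through one connection; pick bounds that roughly cover the actual range of the partition column.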