Java Apache Spark and Spark JobServer crash after a few hours
I am using Apache Spark 2.0.2 with Spark JobServer 0.7.0. I know this is not a best practice, but it is a first step. My server has 52 GB of RAM and 6 CPU cores, runs CentOS 7 x64 with Java(TM) SE Runtime Environment (build 1.7.0_79-b15), and hosts the following applications with the memory configurations shown:
- JBoss AS 7 (6 GB)
- PDI Pentaho 6.0 (12 GB)
- MySQL (20 GB)
- Apache Spark 2.0.2 (8 GB)

I start everything and it all goes as expected; it worked well for a while. This is my job class:
public class VIQ_SparkJob extends JavaSparkJob {

    protected SparkSession sparkSession;
    protected String TENANT_ID;

    @Override
    public Object runJob(SparkContext jsc, Config jobConfig) {
        sparkSession = SparkSession.builder()
                .sparkContext(jsc)
                .enableHiveSupport()
                .config("spark.sql.warehouse.dir", "file:///value_iq/spark-warehouse/")
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .config("spark.kryoserializer.buffer", "8m")
                .getOrCreate();
        Class<?>[] classes = new Class<?>[2];
        classes[0] = UsersCube.class;
        classes[1] = ImportCSVFiles.class;
        sparkSession.sparkContext().conf().registerKryoClasses(classes);
        TENANT_ID = jobConfig.getString("tenant_id");
        return true;
    }

    @Override
    public SparkJobValidation validate(SparkContext sc, Config config) {
        return SparkJobValid$.MODULE$;
    }
}
I have the master and one worker. This is my spark-defaults.conf:
spark.debug.maxToStringFields 256
spark.shuffle.service.enabled true
spark.shuffle.file.buffer 64k
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.rdd.compress true
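Note that nothing in this file pins the driver or executor heap explicitly; those sizes come from the jobserver settings and the context-creation call instead. A sketch of what explicit entries would look like in spark-defaults.conf (the values below are illustrative, not taken from this setup):

```
# Illustrative only - these sizes are assumptions, not from the original config.
spark.driver.memory       2g
spark.executor.memory     8g
# spark.memory.fraction defaults to 0.6 in Spark 2.x; the remainder of the
# heap is user memory plus roughly 300 MB of reserved memory.
spark.memory.fraction     0.6
```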
This is my Spark JobServer settings.sh:
DEPLOY_HOSTS="myserver.com"
APP_USER=root
APP_GROUP=root
JMX_PORT=5051
INSTALL_DIR=/bin/spark/job-server-master
LOG_DIR=/var/log/job-server-master
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=2G
SPARK_VERSION=2.0.2
MAX_DIRECT_MEMORY=2G
SPARK_CONF_DIR=$SPARK_HOME/conf
SCALA_VERSION=2.11.8
I create the context with:

curl -k --basic --user 'user:password' -d ''

The Spark driver uses 2 GB.
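The URL of that curl call was lost in formatting. The Spark JobServer REST API creates contexts with a POST to /contexts, so the full command presumably looked something like the sketch below; the host, port, context name, and sizes here are hypothetical, not recovered from the original:

```
# Hypothetical reconstruction - the real URL and parameters are not in the post.
curl -k --basic --user 'user:password' -d '' \
  'https://myserver.com:8090/contexts/my-context?num-cpu-cores=5&memory-per-node=8g'
```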
The created application looks like this:
ExecutorID Worker Cores Memory State Logs
0 worker-20170203084218-157.97.107.42-50199 5 8192 RUNNING stdout stderr
And these are my executors:
Executor ID Address ▴ Status RDD Blocks Storage Memory Disk Used Cores
driver 157.97.107.42:55222 Active 0 0.0 B / 1018.9 MB 0.0 B 0
0 157.97.107.42:55223 Active 0 0.0 B / 4.1 GB 0.0 B 5
I have a process that logs the memory used by each process; the largest amount registered was 8468 MB. There are four Spark-related processes:
- The master process: it starts with 1 GB allocated. I do not know where that configuration comes from, but it seems to be enough; top shows it using only 0.4 GB.
- The worker process: same memory usage as the master.
- The driver process: configured with 2 GB.
- The context: configured with 8 GB.
system_user   | RAM (MB) | entry_date
--------------+----------+---------------------
spark.driver    2472.11    2017-02-07 10:10:18 // Up to here everything was fine;
spark.context   5470.19    2017-02-07 10:10:18 // it had been running for more than 48 hours.
spark.driver    2472.11    2017-02-07 10:11:18 // Then I executed three big concurrent queries
spark.context      0.00    2017-02-07 10:11:18 // and got java.lang.OutOfMemoryError: Java heap space
                                               // in $LOG_FOLDER/job-server-master/server_startup.log.
# I checked, and the context was still present in the jobserver but unresponsive;
# in Spark the application was killed.
spark.driver    2472.11    2017-02-07 10:16:18 // Here I deleted and re-created the context.
spark.context    105.20    2017-02-07 10:16:18
spark.driver    2577.30    2017-02-07 10:19:18 // Here I executed the three big
spark.context   3734.46    2017-02-07 10:19:18 // concurrent queries again.
spark.driver    2577.30    2017-02-07 10:20:18 // Here, after the queries were
spark.context   5154.60    2017-02-07 10:20:18 // executed: no memory issue.
I have two questions:
1 - Why, when I check the Spark GUI, does the driver configured with 2 GB use only 1 GB, and likewise executor 0 use only 4.4 GB? Where is the rest of the configured memory? Yet when I look at the driver process on the system, it uses 2 GB.
2 - If there is enough memory on the server, why does it run out of memory?

From the comments: When no job is running, can you check the Spark UI and work out how much memory is actually available? You said 8 GB is allocated to Spark, but you should reserve some memory for Spark itself. Also, you have many other processes, yet you are targeting 5 cores for the executor.

I have monitored the system: the peak memory used by all applications is about 42 GB, so roughly 10 GB is free. I have checked the Spark UI; the worker is there, but the application's state is "Killed", so I cannot see its memory state. Do you see any missing configuration, or a better way to work out the current memory?

One doubt about memory: if I give the jobserver 2 GB, that would be the driver memory, right? And when I start a context with, say, 8 GB, is that 8 GB split as 6 GB for the executors and 2 GB for the driver? I am monitoring every Spark process, but I still do not know how the memory allocation works. I have added to the question how the driver and context memory behaved before and after the crash.
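On question 1, a plausible explanation, based on Spark 2.x's documented unified memory model rather than on anything in this post: the "Storage Memory" column in the UI is not the whole heap but roughly (heap - 300 MB reserved) x spark.memory.fraction (0.6 by default), and the JVM's usable heap is itself a bit below -Xmx. A small sketch of that arithmetic (the class and method names are mine):

```java
// Rough estimate of Spark 2.x's unified (execution + storage) memory pool for
// a given heap, using the documented defaults: 300 MB reserved, fraction 0.6.
public class SparkMemoryEstimate {
    static final long RESERVED_MB = 300;    // reserved system memory
    static final double FRACTION = 0.6;     // spark.memory.fraction default

    // Unified memory pool in MB for a heap of heapMb megabytes.
    static long unifiedMemoryMb(long heapMb) {
        return (long) ((heapMb - RESERVED_MB) * FRACTION);
    }

    public static void main(String[] args) {
        // 2 GB driver: ~1048 MB, in the ballpark of the 1018.9 MB the UI
        // shows (the real heap sits slightly below -Xmx, hence the gap).
        System.out.println(unifiedMemoryMb(2048));
        // 8 GB executor: ~4735 MB; the UI's 4.1 GB is lower for the same reason.
        System.out.println(unifiedMemoryMb(8192));
    }
}
```

The remaining 40% of the heap is user memory for your own objects (Kryo buffers, deserialized rows, and so on), which is why a process can occupy its full 2 GB at the OS level while the UI reports a much smaller storage pool.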