Python pyspark job fails with OutOfMemoryError on AWS EMR

I have submitted a pyspark job, but after running for a while it fails with the following error:

20/10/08 06:49:30 ERROR Client: Application diagnostics message: Application application_1602138886042_0001 failed 2 times due to AM Container for appattempt_1602138886042_0001_000002 exited with  exitCode: -104
Failing this attempt.Diagnostics: Container [pid=16756,containerID=container_1602138886042_0001_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.5 GB physical memory used; 4.4 GB of 7.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1602138886042_0001_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 16756 16754 16756 16756 (bash) 0 0 115871744 704 /bin/bash -c LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1602138886042_0001/container_1602138886042_0001_02_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -
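One way to read those numbers (background I am adding, not part of the original post): in cluster deploy mode the YARN application-master container holds the driver, and its limit is roughly spark.driver.memory plus spark.driver.memoryOverhead, where the overhead defaults to 10% of the driver memory with a 384 MiB floor. A minimal Python sketch of that sizing, with an illustrative helper name:

# Illustrative helper (not from the post): how YARN sizes the driver/AM container.
def yarn_container_mb(driver_memory_mb, overhead_mb=None):
    if overhead_mb is None:
        # Spark's documented default overhead: 10% of driver memory, >= 384 MiB
        overhead_mb = max(384, int(driver_memory_mb * 0.10))
    return driver_memory_mb + overhead_mb

# The log shows -Xmx1024m (a 1 GiB driver heap) and a 512 MiB configured
# overhead: 1024 + 512 = 1536 MiB, i.e. the "1.5 GB physical memory" limit
# that the container exceeded.
print(yarn_container_mb(1024, 512))  # 1536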
To get past this memory problem I tried changing the driver and executor memory settings, but the job still fails. Below is the spark-submit command:

'Args': ['spark-submit',
         '--deploy-mode', 'cluster',
         '--master', 'yarn',
         '--executor-memory', conf['emr_step_executor_memory'],
         '--executor-cores', conf['emr_step_executor_cores'],
         '--conf', 'spark.yarn.submit.waitAppCompletion=true',
         '--conf', 'spark.rpc.message.maxSize=1024',
         '--conf', 'spark.driver.memoryOverhead=512',
         '--conf', 'spark.executor.memoryOverhead=512',
         '--conf', 'spark.driver.memory =2g',
         '--conf', 'spark.driver.cores=2']
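For reference, an Args list like this is normally wrapped in an EMR step definition and submitted with boto3's add_job_flow_steps. A minimal sketch, with hypothetical stand-ins for the conf dict values, script location, region, and cluster id:

import boto3

# Hypothetical values standing in for the poster's conf dict.
conf = {'emr_step_executor_memory': '4g', 'emr_step_executor_cores': '4'}

step = {
    'Name': 'pyspark-job',                    # illustrative step name
    'ActionOnFailure': 'CONTINUE',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',          # EMR's generic command runner
        'Args': ['spark-submit',
                 '--deploy-mode', 'cluster',
                 '--master', 'yarn',
                 '--executor-memory', conf['emr_step_executor_memory'],
                 '--executor-cores', conf['emr_step_executor_cores'],
                 '--conf', 'spark.driver.memory=2g',  # no space around '='
                 's3://my-bucket/job.py'],    # hypothetical script path
    },
}

emr = boto3.client('emr', region_name='us-east-1')  # assumed region
emr.add_job_flow_steps(JobFlowId='j-XXXXXXXXXXXXX', Steps=[step])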
Master instance on AWS: c4.2xlarge. Core instances on AWS: c4.4xlarge.

An important point: the data is less than 50 MB, so it is not much at all.
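One debugging step worth adding here (my suggestion, not from the post): print the configuration the driver actually received, since a mistyped --conf key can be accepted silently instead of being rejected. A minimal pyspark sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# List every memory-related property the driver actually received;
# a setting that never took effect simply won't show up here.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    if 'memory' in key.lower():
        print(key, '=', value)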