Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes

Tags: java, apache-spark, apache-spark-sql, apache-spark-2.0

I am running a Spark application on the following configuration:

1 master node, 2 worker nodes.

  • Each worker has 88 cores, so the total core count is 176.

  • Each worker has 502 GB of memory, so the total available memory is 1004 GB.

I get the following exception while running my application:

Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:115)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
        at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:97)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Two solutions are mentioned in the error itself:

  • As a workaround, you can disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1

  • Increase the Spark driver memory by setting spark.driver.memory to a higher value

  • I tried setting more driver memory at runtime (a configuration sketch follows below), but I would like to understand the root cause of this problem. Can anyone explain it?

    I am using Java in my code.
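    For reference, a minimal sketch (not from the original post; the application name is hypothetical) of the first workaround, since spark.sql.autoBroadcastJoinThreshold can be changed from application code at runtime:

        import org.apache.spark.sql.SparkSession;

        // Disable automatic broadcast joins: Spark falls back to a shuffle-based
        // join instead of building the table on the driver and broadcasting it.
        SparkSession sparkSession = SparkSession.builder()
                .appName("BroadcastOomExample")                        // hypothetical name
                .config("spark.sql.autoBroadcastJoinThreshold", "-1")
                .getOrCreate();

        // spark.driver.memory, by contrast, cannot be raised from inside an
        // already running driver JVM in client mode; it has to be set at submit
        // time (e.g. via spark-submit --driver-memory).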

    Edit 1

    I am using broadcast variables in my code.

    Edit 2

    Adding the code that contains the broadcast variables:

        // 1. Currency codes: read from JDBC, cache, then collect the values to the driver
        Dataset<Row> currencySet1 = sparkSession.read().format("jdbc").option("url", connection).option("dbtable", CI_CURRENCY_CD).load();
        currencySetCache = currencySet1.select(CURRENCY_CD, DECIMAL_POSITIONS).persist(StorageLevel.MEMORY_ONLY());
        Dataset<Row> currencyCodes = currencySetCache.select(CURRENCY_CD);
        currencySet = currencyCodes.as(Encoders.STRING()).collectAsList();

        // 2. Divisions
        Dataset<Row> divisionSet = sparkSession.read().format("jdbc").option("url", connection).option("dbtable", CI_CIS_DIVISION).load();
        divisionSetCache = divisionSet.select(CIS_DIVISION).persist(StorageLevel.MEMORY_ONLY());
        divisionList = divisionSetCache.as(Encoders.STRING()).collectAsList();

        // 3. User IDs
        Dataset<Row> userIdSet = sparkSession.read().format("jdbc").option("url", connection).option("dbtable", SC_USER).load();
        userIdSetCache = userIdSet.select(USER_ID).persist(StorageLevel.MEMORY_ONLY());
        userIdList = userIdSetCache.as(Encoders.STRING()).collectAsList();

        // Broadcast the collected lists to the executors
        ClassTag<List<String>> evidenceForDivision = scala.reflect.ClassTag$.MODULE$.apply(List.class);
        Broadcast<List<String>> broadcastVarForDiv = context.broadcast(divisionList, evidenceForDivision);

        ClassTag<List<String>> evidenceForCurrency = scala.reflect.ClassTag$.MODULE$.apply(List.class);
        Broadcast<List<String>> broadcastVarForCurrency = context.broadcast(currencySet, evidenceForCurrency);

        ClassTag<List<String>> evidenceForUserID = scala.reflect.ClassTag$.MODULE$.apply(List.class);
        Broadcast<List<String>> broadcastVarForUserID = context.broadcast(userIdList, evidenceForUserID);

        // Validation -- start
        Encoder<RuleParamsBean> encoder = Encoders.bean(RuleParamsBean.class);
        Dataset<RuleParamsBean> ds = new Dataset<RuleParamsBean>(sparkSession, finalJoined.logicalPlan(), encoder);

        Dataset<RuleParamsBean> validateDataset = ds.map(ruleParamsBean -> validateTransaction(ruleParamsBean, broadcastVarForDiv.value(), broadcastVarForCurrency.value(),
                broadcastVarForUserID.value()), encoder);
        validateDataset.persist(StorageLevel.MEMORY_ONLY());
    
    Possible root cause: the default value of spark.driver.memory is only 1 GB (depending on the distribution), which is very small. If you read a lot of data on the driver, an OutOfMemoryError can easily occur, and the suggestion in the exception message is correct.


    Solution: increase spark.driver.memory and spark.executor.memory to at least 16 GB.
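    As one hedged example of how such values might be passed at submit time (the 16g figures and the class/jar names are illustrative, not tuned for this cluster; in client mode spark.driver.memory only takes effect when set before the driver JVM starts):

        spark-submit \
          --driver-memory 16g \
          --executor-memory 16g \
          --num-executors 22 \
          --class com.example.RuleValidationJob \
          rule-validation.jar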

    What do your broadcast variables contain, and how much memory do they consume?

    @tauitdnmd I have added some code for reference that describes the broadcast variables; basically each variable contains the values of one column from a table, and there are 3 broadcast variables.

    What are the configured values of spark.driver.memory and spark.executor.memory?

    @pasha701 spark.driver.memory is at its default value, I did not set it explicitly, and --executor-memory=6G with --num-executors=22.

    Yes, I increased both parameters to 26 GB and the error was resolved. Thanks. I am trying to understand this better. You said: "If you are reading a lot of data on the driver, OutOfMemory can occur", but the documentation says: "Does my data need to fit in memory to use Spark? No. Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on data of any size. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level." So why did I have to increase my driver and executor memory? I am missing something here...

    My guess is that the passage you quote is not related to broadcast variables.