Hive: GC overhead or heap space error with a dynamically partitioned table

Could you guide me in resolving this GC overhead and heap space error?

I am trying to insert into a partitioned table from another table (dynamic partitioning) using the following query:

INSERT OVERWRITE table tbl_part PARTITION(county)
SELECT  col1, col2.... col47, county FROM tbl;

I have set the following parameters:

export  HADOOP_CLIENT_OPTS=" -Xmx2048m"
set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.max.dynamic.partitions.pernode=256;
set mapreduce.map.memory.mb=2048;
set yarn.scheduler.minimum-allocation-mb=2048;
set hive.exec.max.created.files=250000;
set hive.vectorized.execution.enabled=true;
set hive.merge.smallfiles.avgsize=283115520;
set hive.merge.size.per.task=209715200;

I also added the following to yarn-site.xml:

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
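
These two properties control when the NodeManager kills a container for exceeding its virtual-memory allowance, which is the physical limit multiplied by vmem-pmem-ratio. With vmem-check-enabled set to false, the ratio of 4 has no effect. Neither property changes the JVM heap size (-Xmx), which is what "Java heap space" and "GC overhead limit exceeded" errors refer to. A minimal model of the check follows; vmem_limit_mb is a hypothetical helper, not a YARN API.

```python
def vmem_limit_mb(pmem_limit_mb, vmem_pmem_ratio, vmem_check_enabled):
    """Virtual-memory ceiling the NodeManager would enforce for one container.

    Hypothetical model of yarn.nodemanager.vmem-check-enabled and
    yarn.nodemanager.vmem-pmem-ratio: the ceiling is the physical limit
    times the ratio, and it only applies when the check is enabled.
    Returns None when no virtual-memory limit is enforced.
    """
    if not vmem_check_enabled:
        return None
    return pmem_limit_mb * vmem_pmem_ratio


# Settings from the yarn-site.xml above, for a 2048 MB map container
# (mapreduce.map.memory.mb=2048).
print(vmem_limit_mb(2048, 4, vmem_check_enabled=False))  # None: check disabled
print(vmem_limit_mb(2048, 4, vmem_check_enabled=True))   # 8192
```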

It is a standalone cluster with 1 core. I am preparing test data to run my unit test cases in Spark.

Can you tell me what else I can do?

The source table has the following properties:

Table Parameters:       
    COLUMN_STATS_ACCURATE   true                
    numFiles                13                  
    numRows                 10509065            
    rawDataSize             3718599422          
    totalSize               3729108487          
    transient_lastDdlTime   1470909228

Thanks.

Add DISTRIBUTE BY county to your query:
INSERT OVERWRITE table tbl_part PARTITION(county) SELECT  col1, col2.... col47, county FROM tbl DISTRIBUTE BY county;
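
Why this can help, as a toy model: during a dynamic-partition insert, each task keeps an open file writer for every partition value it receives, and each writer buffers data in memory. If the input splits have counties interleaved, every task can end up holding a writer for every county. DISTRIBUTE BY county sends all rows for a given county to a single reducer, so each reducer only holds writers for its own counties. The sketch below counts writers per task under both schemes. Its assumptions: writers_per_task is a hypothetical helper, contiguous equal-sized splits stand in for mapper input, the 101-county data set is made up, and zlib.crc32 stands in for Hive's actual partitioning hash. It also checks the worst case against hive.exec.max.dynamic.partitions.pernode (256 in the question), which caps partitions per mapper/reducer.

```python
import zlib
from collections import defaultdict


def writers_per_task(rows, num_tasks, distribute_by_county):
    """Count the distinct county partitions each task would write to.

    Toy model (assumption): one open file writer per (task, county) pair.
    - Without DISTRIBUTE BY: rows are cut into contiguous, equal-sized splits,
      standing in for mappers that each read part of the source table.
    - With DISTRIBUTE BY county: each row goes to crc32(county) % num_tasks,
      a stand-in for Hive's own hash partitioning.
    """
    counties_seen = defaultdict(set)
    for i, county in enumerate(rows):
        if distribute_by_county:
            task = zlib.crc32(county.encode("utf-8")) % num_tasks
        else:
            task = i * num_tasks // len(rows)
        counties_seen[task].add(county)
    return {task: len(seen) for task, seen in sorted(counties_seen.items())}


# Hypothetical data: 101 counties, interleaved across 5,050 rows.
counties = [f"county_{n}" for n in range(101)]
rows = counties * 50
PERNODE_LIMIT = 256  # hive.exec.max.dynamic.partitions.pernode in the question

without = writers_per_task(rows, 4, distribute_by_county=False)
with_dist = writers_per_task(rows, 4, distribute_by_county=True)
# Without DISTRIBUTE BY, every task sees all 101 counties (404 writers in total).
print("without DISTRIBUTE BY:", without, "total:", sum(without.values()))
# With DISTRIBUTE BY, each county is written by exactly one task (101 in total).
print("with DISTRIBUTE BY:", with_dist, "total:", sum(with_dist.values()))
print("within pernode limit:", max(with_dist.values()) <= PERNODE_LIMIT)
```

In real Hive, a skewed county still lands entirely on one reducer, so this spreads writers across tasks but does not shrink the largest partition.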

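I ran it with DISTRIBUTE BY and got a heap space error.

You haven't provided any logs, so this is an educated guess; usually this helps. Have you tried increasing the allocated memory? Maybe it really is running out of memory.

With these settings:

set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.execution.reduce.groupby.enabled=true;
set yarn.nodemanager.resource.memory-mb=8192;
set yarn.scheduler.minimum-allocation-mb=2048;
set yarn.scheduler.maximum-allocation-mb=8192;
set hive.tez.container.size=7168;
set hive.tez.java.opts=-Xmx4096m;

the job still fails:

Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 13  Reduce: 15  Cumulative CPU: 1212.26 sec  HDFS Read: 3729681833  HDFS Write: 408552289  FAIL
Total MapReduce CPU Time Spent: 20 minutes 12 seconds 260 msec

Should I increase hive.tez.java.opts? The available memory is only 12 GB:

             total       used       free     shared    buffers     cached
Mem:         15347       8041       7306          0        179       5213
-/+ buffers/cache:       2648      12698
Swap:        15670         15      15655

Please advise.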
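
On the sizing question, a common rule of thumb (an assumption here, not something stated in the thread) is to set the task JVM heap to roughly 80% of the container size, leaving headroom for off-heap memory. With hive.tez.container.size=7168 that would be about -Xmx5734m rather than -Xmx4096m. Note also that the failure is reported by org.apache.hadoop.hive.ql.exec.mr.MapRedTask, which suggests the query ran on the MapReduce engine. In that case the hive.tez.* settings are not used, and the corresponding properties would be mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts (and their map equivalents). The sketch below also assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, as the CapacityScheduler typically does. size_container is a hypothetical helper.

```python
import math


def size_container(request_mb, min_alloc_mb, max_alloc_mb, node_mb, heap_fraction=0.8):
    """Return (granted_mb, xmx_mb) for a container request.

    Assumptions (rules of thumb, not a Hive or YARN API):
    - the scheduler rounds each request up to a multiple of min_alloc_mb;
    - the JVM heap (-Xmx) is heap_fraction of the requested size, leaving the
      remainder for off-heap memory (thread stacks, native buffers, etc.).
    """
    granted_mb = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb
    if granted_mb > min(max_alloc_mb, node_mb):
        raise ValueError(f"request of {request_mb} MB rounds to {granted_mb} MB, above the limit")
    xmx_mb = int(request_mb * heap_fraction)
    return granted_mb, xmx_mb


# Values from the comment above: minimum-allocation-mb=2048,
# maximum-allocation-mb=8192, resource.memory-mb=8192, container size 7168.
granted, xmx = size_container(7168, 2048, 8192, 8192)
print(f"granted {granted} MB, suggested -Xmx{xmx}m")  # granted 8192 MB, suggested -Xmx5734m
```

Under the rounding assumption, a 7168 MB request is granted 8192 MB, the whole of yarn.nodemanager.resource.memory-mb, so only one such container could run on the node at a time.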