
Hive - external (dynamically) partitioned table


I have a table in MySQL, viz. nas_comps:

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;
comp_code     count(leg_id)
'J'           20640
'Y'           39680
First, I imported the data into HDFS using Sqoop (Hadoop version 1.0.2):

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps
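A quick way to double-check what Sqoop actually wrote before building a table on top of it (the part-m-00000 file name is the usual single-mapper output and is an assumption here):

hadoop fs -ls /pros/olap2/dataimports/nas_comps
hadoop fs -cat /pros/olap2/dataimports/nas_comps/part-m-00000 | head -n 5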
Then I created an external partitioned Hive table:

/*shows the partitions on 'describe' but not 'show partitions'*/
create external table  nas_comps(DS_NAME string,DEP_DATE string,
                                 CRR_CODE string,FLIGHT_NO string,ORGN string,
                                 DSTN string,PHYSICAL_CAP int,ADJUSTED_CAP int,
                                 CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
location '/pros/olap2/dataimports/nas_comps'
The partition columns do show up in describe:

hive> describe extended nas_comps;
OK
ds_name string
dep_date        string
crr_code        string
flight_no       string
orgn    string
dstn    string
physical_cap    int
adjusted_cap    int
closed_cap      int
leg_id  int
month   int
comp_code       string

Detailed Table Information      Table(tableName:nas_comps, dbName:pros_olap2_optim, 
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), 
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, 
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), 
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, 
comment:null), FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, 
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), 
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, 
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, 
numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int,
comment:null), FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE)
But I'm not sure whether the partitions were actually created, because:

hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds


select count(1) from nas_comps;
returns 0 records.


How do I create an external Hive table with dynamic partitions?

Dynamic partitioning

Partitions are added dynamically while records are inserted into the Hive table.

  • Supported only with INSERT statements
  • Not supported with LOAD DATA statements
  • Dynamic partitioning must be enabled before inserting data into the Hive table:
    hive.exec.dynamic.partition.mode=nonstrict (the default is strict)
    hive.exec.dynamic.partition=true (the default is false)
  • Dynamic partition query:

    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.dynamic.partition=true;
    INSERT INTO table_name PARTITION (loaded_date)
    select * from table_name1 where loaded_date = 20151217;
    
    Here loaded_date=20151217 is the partition and its value.

    Limitations:

    
  • Dynamic partitioning works only with the statement above
  • It creates the partitions dynamically, based on the data selected from the loaded_date column of table_name1

    If your case does not meet the above criteria:

    First create the partitioned table and then run:

    ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2'); 
    

    Or use the dynamic-partitioning approach described above for dynamic partition creation.
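    Applied to the question's table, a minimal sketch of that static route (the partition values and directory below are only illustrative; each LOCATION must point at a directory that really holds that partition's files):

    ALTER TABLE nas_comps ADD PARTITION (leg_id=1, month=1, comp_code='J')
    LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=J';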

    Hive will not create the partitions for you this way. Just create a table partitioned by the desired partition keys, then run an insert overwrite table from the external table into the new partitioned table (with hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict set).
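    A minimal sketch of that flow for this case, with two assumptions called out: the Sqoop files are plain comma-delimited text (Sqoop's text default), and the names nas_comps_stg / nas_comps_part are placeholders chosen here for illustration:

    -- Non-partitioned staging table laid over the Sqoop import directory
    -- (assumes comma-delimited text files and this column order)
    CREATE EXTERNAL TABLE nas_comps_stg (
      ds_name string, dep_date string, crr_code string, flight_no string,
      orgn string, dstn string, physical_cap int, adjusted_cap int,
      closed_cap int, leg_id int, month int, comp_code string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/pros/olap2/dataimports/nas_comps';

    -- New partitioned table; its partitions are created dynamically on insert
    CREATE TABLE nas_comps_part (
      ds_name string, dep_date string, crr_code string, flight_no string,
      orgn string, dstn string, physical_cap int, adjusted_cap int,
      closed_cap int)
    PARTITIONED BY (leg_id int, month int, comp_code string);

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    -- Partition columns must come last in the SELECT, in PARTITIONED BY order
    INSERT OVERWRITE TABLE nas_comps_part PARTITION (leg_id, month, comp_code)
    SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
           physical_cap, adjusted_cap, closed_cap, leg_id, month, comp_code
    FROM nas_comps_stg;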

    If the table has to be partitioned externally, you have to create the directories by hand (one directory per partition, named PARTITION_KEY=VALUE) and then use MSCK REPAIR TABLE table_name.
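    A rough sketch of that manual layout, using illustrative partition values and the usual single-mapper file name (both assumptions):

    hadoop fs -mkdir /pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=J
    # only valid if every row in the file really belongs to that one partition
    hadoop fs -mv /pros/olap2/dataimports/nas_comps/part-m-00000 \
                  /pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=J/
    hive -e "MSCK REPAIR TABLE nas_comps;"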

    Yes, I have checked that, but those are not dynamic partitions - you still have to supply values for the partitions.
    Right, run it through a shell script: you can create a variable for the partition in the shell script and pass it in the alter table command; otherwise there is no option available at the moment :(
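    A small sketch of that shell-script idea (the partition keys and values here are only placeholders):

    #!/bin/bash
    # Pass the partition value in from the command line and hand it to Hive
    COMP_CODE="$1"
    hive -e "ALTER TABLE nas_comps ADD IF NOT EXISTS PARTITION (leg_id=1, month=1, comp_code='${COMP_CODE}');"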