SQL Hive: Need to specify partition columns because the destination table is partitioned
I'd like to know whether it's possible in Hive to insert an unpartitioned table into a partitioned table. The first table is as follows:
hive> describe extended user_ratings;
OK
userid int
movieid int
rating int
unixtime int
Detailed Table Information Table(tableName:user_ratings, dbName:ml, owner:cloudera, createTime:1500142667, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/ml.db/user_ratings, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=
Time taken: 0.418 seconds, Fetched: 6 row(s)
So, the new table is:
hive> describe extended rating_buckets;
OK
userid int
movieid int
rating int
unixtime int
genre string
# Partition Information
# col_name data_type comment
genre string
Detailed Table Information Table(tableName:rating_buckets, dbName:default, owner:cloudera, createTime:1500506879, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null), FieldSchema(name:genre, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/rating_buckets, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:8, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=
Time taken: 0.46 seconds, Fetched: 12 row(s)
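It seems to count the partition column ("genre") the same as the other columns... Could I have created the partition incorrectly?
Anyway, when I try to do an INSERT OVERWRITE into the new table, this happens: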
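Hive's DESCRIBE normally lists a partition column together with the regular columns and then again under "# Partition Information", so the output above does not by itself suggest the table was created wrongly. A minimal sketch of DDL that would produce that layout, assuming the 8 buckets are clustered on userid (the output shows numBuckets:8 but not the bucketing column) and leaving out the storage clause because the field delimiter is cut off above:

```sql
-- Sketch only: column types come from the DESCRIBE output above.
-- genre appears ONLY in PARTITIONED BY; Hive rejects a column that is
-- declared both in the column list and as a partition column.
CREATE TABLE rating_buckets (
  userid   INT,
  movieid  INT,
  rating   INT,
  unixtime INT
)
PARTITIONED BY (genre STRING)
-- Assumption: bucketing column is userid (not shown in the question).
CLUSTERED BY (userid) INTO 8 BUCKETS;
```

Because genre is not stored as a data column, every INSERT into this table has to supply it, either as a fixed value in the PARTITION clause or as a trailing SELECT expression.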
hive> FROM ml.user_ratings
> INSERT OVERWRITE TABLE rating_buckets
> select userid, movieid, rating, unixtime;
FAILED: SemanticException 2:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'rating_buckets'
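Should I recreate the first table with partitions? Is there a way to copy the first table and keep the partitions intact?

You're not even including genre in your select list. I think it should be the last column you select; you can't partition on nothing.

You also need to specify the table's partition, like this: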
insert overwrite table rating_buckets partition(genre)
select
userid,
movieid,
rating,
unixtime,
<SOMETHING> as genre
from
...
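If every row copied from ml.user_ratings belongs to one known genre, a static partition is an alternative that needs neither a genre column in the select list nor any change to the first table. A minimal sketch, assuming the literal 'action' as the genre value (the question does not say where genre values come from):

```sql
-- Static partition: the genre value is fixed in the PARTITION clause,
-- so the SELECT lists only the four data columns of rating_buckets.
-- 'action' is an assumed example value.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre = 'action')
SELECT userid, movieid, rating, unixtime
FROM ml.user_ratings;
```

This overwrites only the genre='action' partition; loading several genres this way takes one statement per value, which is what a dynamic-partition insert avoids.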
I appreciate the input, but unfortunately it returns the following:

hive> insert overwrite table rating_buckets partition(genre)
> select
> userid,
> movieid,
> rating,
> unixtime,
> (action) as genre
> from ml.user_ratings;
FAILED: SemanticException [Error 10004]: Line 7:1 Invalid table alias or column reference 'action': (possible column names are: userid, movieid, rating, unixtime)

Are you trying to insert the word action as your genre? If so, you need to wrap it in single quotes, not parentheses: 'action' as genre.
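The full dynamic-partition statement with the quoting fix applied is sketched below. It assumes the session is in Hive's default strict dynamic-partition mode, which rejects a PARTITION clause that names no static value, and that the cluster runs Hive 1.x (as the Cloudera QuickStart VM does), where bucketed inserts are only enforced when hive.enforce.bucketing is set. Both SET lines are assumptions about the user's configuration, not something shown in the thread.

```sql
-- Allow a PARTITION clause whose only partition column is dynamic.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Assumption: Hive 1.x. Hive 2.x removed this property (bucketing is
-- always enforced there), so drop this line on newer releases.
SET hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE rating_buckets PARTITION (genre)
SELECT
  userid,
  movieid,
  rating,
  unixtime,
  'action' AS genre   -- string literal in single quotes, not (action)
FROM ml.user_ratings;
```

Hive fills dynamic partition columns by position from the trailing SELECT expressions, which is why genre must come last. With a constant every row still lands in genre=action; the dynamic form only pays off when that last expression varies per row, for example a genre column joined in from a movies table (a hypothetical source not shown in the question).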