Hive: how to load a CSV file containing time string values into a timestamp column
I have a dataset in the following format:
2019-10-01 00:00:00 UTC,cart,5773203,1487580005134238553,,runail,2.62,463240011,26dd6e6e-4dac-4778-8d2c-92e149dab885
2019-10-01 00:00:03 UTC,cart,5773353,1487580005134238553,,runail,2.62,463240011,26dd6e6e-4dac-4778-8d2c-92e149dab885
2019-10-01 00:00:07 UTC,cart,5881589,2151191071051219817,,lovely,13.48,429681830,49e8d843-adf3-428b-a2c3-fe8bc6a307c9
2019-10-01 00:00:07 UTC,cart,5723490,1487580005134238553,,runail,2.62,463240011,26dd6e6e-4dac-4778-8d2c-92e149dab885
I have created a table to load the data into:
create table if not exists product_data
(
event_time string,
event_type string,
product_id string,
category_id string,
category_code string,
brand string,
price float,
user_id bigint,
user_session string
) row format delimited
fields terminated by ','
lines terminated by '\n'
tblproperties("skip.header.line.count"="1");
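Is it possible to load the event_time field directly as a timestamp value?
I am new to Hive, and any help would be appreciated.

Since you are loading data into Hive from a raw file in HDFS, the recommended approach is to first create an external table with every field declared as a string. Once the external table exists, load the data from it into a managed table that has the typed schema. This two-step approach helps ensure that no information is lost while loading from the file.

Step 1: Create the external table: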
create external table if not exists product_data_external_table
(
event_time string,
event_type string,
product_id string,
category_id string,
category_code string,
brand string,
price string,
user_id string,
user_session string
) row format delimited
fields terminated by ','
lines terminated by '\n'
location '<your hdfs file location>'
tblproperties("skip.header.line.count"="1");
Step 2: Insert the records from product_data_external_table into product_data:
insert into product_data
select
cast(from_unixtime(unix_timestamp(event_time,'yyyy-MM-dd HH:mm:ss Z'),'yyyy-MM-dd HH:mm:ss') as timestamp) as event_time,
event_type,
product_id,
category_id,
category_code,
brand,
cast(price as float) as price,
cast(user_id as bigint) as user_id,
user_session
from
product_data_external_table;
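The insert above only produces real timestamps if the target column is typed accordingly, whereas the product_data table shown in the question declares event_time as string. A minimal sketch of a matching target table follows, assuming the column names from the question and Hive's default storage format (both are assumptions, not part of the original answer):

```sql
-- Sketch of the target table for step 2; event_time is declared as timestamp.
-- If product_data already exists with event_time as string, "if not exists"
-- leaves it untouched, so it would have to be dropped or renamed first.
create table if not exists product_data
(
event_time timestamp,
event_type string,
product_id string,
category_id string,
category_code string,
brand string,
price float,
user_id bigint,
user_session string
);
```

Because from_unixtime renders the value in the session time zone, it may be worth comparing a few converted rows against the raw strings before relying on the result.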
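Alternatively, starting with Hive 1.2.0, you can supply the additional SerDe property timestamp.formats, which lets a column declared as timestamp be parsed directly from a text file that uses a non-default format.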
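As an untested sketch of how timestamp.formats might be applied to this data: the table name product_data_ts is hypothetical, and the pattern treats the trailing UTC as a quoted literal rather than a time zone field, which is an assumption chosen to avoid relying on time zone name parsing.

```sql
-- Hypothetical external table that declares event_time as timestamp directly.
create external table if not exists product_data_ts
(
event_time timestamp,
event_type string,
product_id string,
category_id string,
category_code string,
brand string,
price float,
user_id bigint,
user_session string
) row format delimited
fields terminated by ','
lines terminated by '\n'
location '<your hdfs file location>'
tblproperties("skip.header.line.count"="1");

-- Tell the SerDe which pattern(s) to try when parsing timestamp columns.
-- The value is a comma-separated list of patterns; 'UTC' is matched literally.
alter table product_data_ts
set serdeproperties ("timestamp.formats"="yyyy-MM-dd HH:mm:ss 'UTC'");
```

Values that do not match any listed pattern should come back as NULL, so a quick check such as counting rows where event_time is null is a reasonable follow-up.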