Hive: split a column by a delimiter into separate rows
I have a dataset. Here is a sample row:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507;1460777656:440515;1460778054:440488;1460778157:440481,440600;
The columns are separated by single spaces (3 columns in total). The column names are id (int), unid (string), and time_stamp (string). I want to split the dataset so that each unique element of time_stamp ends up on its own row, like this:
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440600
Thanks for your help, and thanks in advance :)
First, I had to replace the delimiter (;) with a pipe (|). So:
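-- Sample data: the original semicolon delimiters are replaced with pipes
CREATE TEMPORARY TABLE tbl
(id int,
 unid string,
 time_stamp string);
INSERT INTO tbl
VALUES (
  94654, '6802D326-9F9B-4FC8-B2DD-F878EADE31F2', '1460695483:440507|1460777656:440515|1460778054:440488|1460778157:440481,440600');
SELECT
  id,
  unid,
  time_stamp
FROM
  (
    -- Derived table: turn the delimited string into an array.
    -- split() takes a regex, so the pipe has to be escaped as '\\|'
    SELECT
      id,
      unid,
      split(time_stamp, '\\|') ts
    FROM
      tbl
  ) t
-- Outer query: one output row per array element
LATERAL VIEW explode(t.ts) bar AS time_stamp;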
This gives us:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481,440600
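Here the split and the explode are done in separate steps: the split happens in the derived table, and the explode (via LATERAL VIEW) happens in the outer query.
Thanks a lot, Andrew! :)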
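The query above still leaves 1460778157:440481,440600 on a single row, while the question asks for one row per value. One possible way to get there is sketched below. It assumes the same pipe-delimited tbl from the answer and chains a second LATERAL VIEW that splits each epoch:values pair on ':' and explodes the comma-separated values. The aliases p, pair, vs, and v are illustrative and do not come from the original answer, and the query has not been run against a specific Hive version.

```sql
-- Sketch: two chained LATERAL VIEWs over the pipe-delimited tbl from the answer.
-- First view (p): one row per "epoch:values" pair.
-- Second view (vs): one row per comma-separated value inside that pair.
SELECT
  t.id,
  t.unid,
  -- Re-attach the epoch prefix (the part before the colon) to each value
  concat(split(p.pair, ':')[0], ':', vs.v) AS time_stamp
FROM tbl t
LATERAL VIEW explode(split(t.time_stamp, '\\|')) p AS pair
LATERAL VIEW explode(split(split(p.pair, ':')[1], ',')) vs AS v;
```

For the sample row this should produce five rows, with 1460778157:440481 and 1460778157:440600 on separate rows. Hive does not guarantee row order unless you add an ORDER BY.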