Hive - Count distinct values in a CSV column
One of my Hive tables looks like this:
ID listOfcategories
1 ["a","b","b","a","c","d","d"]
2 ["a","a","a","c","c","c","c","e","e","e"]
3 ["a","b","c"]
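The number of comma-separated values varies from row to row. I want to query the number of distinct categories in each row/ID.
So my output should look like this: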
ID numDistCategories
1 4
2 3
3 3
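You can use explode to turn each array element into its own row, then use count distinct per id to get the result you are looking for.
Something like this: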
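-- Assumes listOfcategories is an array<string> column.
-- Hive does not allow other columns next to a UDTF such as EXPLODE in the
-- SELECT list, so the array is exploded through a LATERAL VIEW instead.
SELECT
  id,
  COUNT(DISTINCT cat) AS numDistCategories
FROM myTable
LATERAL VIEW EXPLODE(listOfcategories) t AS cat
GROUP BY id;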
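The question does not say how the column is typed. If listOfcategories is actually stored as a plain STRING such as ["a","b","b"] rather than an array, a rough sketch would be to strip the brackets and quotes and split on commas before exploding. The table and column names come from the question; the string storage format, the alias t, and the absence of spaces after the commas are assumptions.

```sql
-- Sketch only: assumes listOfcategories is a STRING like ["a","b","b"]
-- with no whitespace around the commas.
SELECT
  id,
  COUNT(DISTINCT cat) AS numDistCategories
FROM myTable
LATERAL VIEW EXPLODE(
  -- Remove [, ] and " characters, then split the remainder on commas.
  SPLIT(REGEXP_REPLACE(listOfcategories, '[\\[\\]"]', ''), ',')
) t AS cat
GROUP BY id;
```

If the values can carry surrounding spaces, wrapping cat in TRIM() inside COUNT(DISTINCT ...) should keep "a" and " a" from being counted separately.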
Hope that helps.