Hive - Counting distinct CSV values in a column

One of my Hive tables looks like this:

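ID     listOfcategories
1      ["a","b","b","a","c","d","d"]
2      ["a","a","a","c","c","c","c","e","e","e"]
3      ["a","b","c"]

The number of comma-separated values varies from row to row. I want to query the number of distinct categories for each row/ID, so my output should look like this: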

ID     numDistCategories
1      4
2      3
3      3

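You can use explode to turn each array element into its own row, and then use count distinct to get the result you are looking for.

Something like this: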

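-- Hive does not allow a UDTF such as EXPLODE next to other columns
-- in the SELECT list, so LATERAL VIEW pairs each element with its row.
SELECT
    t.id,
    COUNT(DISTINCT c.cat) AS numDistCategories
FROM myTable t
LATERAL VIEW EXPLODE(t.listOfcategories) c AS cat
GROUP BY t.id;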
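
The question does not say whether listOfcategories is an ARRAY<STRING> or a plain STRING that merely looks like an array. If it is a STRING, EXPLODE cannot be applied to it directly. A minimal sketch for that case, assuming the values contain no embedded commas, brackets, quotes, or surrounding spaces, is to strip the brackets and quotes and split on the comma first:

```sql
-- Assumption: listOfcategories is a STRING such as '["a","b","b"]'.
-- REGEXP_REPLACE removes the characters [ ] and ", leaving 'a,b,b';
-- SPLIT then turns that into an array that EXPLODE can expand.
SELECT
    t.id,
    COUNT(DISTINCT c.cat) AS numDistCategories
FROM myTable t
LATERAL VIEW EXPLODE(
    SPLIT(REGEXP_REPLACE(t.listOfcategories, '[\\[\\]"]', ''), ',')
) c AS cat
GROUP BY t.id;
```

If the values can carry surrounding whitespace, wrapping c.cat in TRIM inside the COUNT would keep 'a' and ' a' from being counted as different categories.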

Hope that helps.