Count occurrences of strings in a list column - Snowflake/SQL

I have a table that contains a column of strings, for example:

STRING                                                                 User_ID    [...]
"[""null"",""personal"",""Other""]"                                    2122213    .... 
"[""Other"",""to_dos_and_thing""]"                                     2132214    ....  
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"         4342323    ....
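Problem:

I want to count how many times each unique string occurs (the strings inside the STRING column are separated by commas), but I only know how to do the following: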

SELECT u.STRING, count(u.USER_ID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
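However, this doesn't work, because it only counts the number of user IDs for each complete STRING value, not for the individual strings inside it.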

Using the example above, the desired output would look like this:

STRING                     COUNT_Instances                                                             
"null"                     1223
"personal"                 543
"Other"                    324                  
"to_dos_and_thing"         221                                
"getting_things_done"      146
"Work!!!!!"                22 

Based on your description, here is my sample table:

create table u (user_id number, string varchar);

insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string into one row per element, and then REGEXP_SUBSTR to clean up the data. Here are the query and its output:

select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string  , ',' ) s
GROUP BY extracted
order by count(*) DESC;


+---------------------+----------+
|      EXTRACTED      | COUNT(*) |
+---------------------+----------+
| Other               |        2 |
| null                |        1 |
| personal            |        1 |
| to_dos_and_thing    |        1 |
| getting_things_done |        1 |
| TO_dos_and_thing    |        1 |
| Work!!!!!           |        1 |
+---------------------+----------+
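The 'i' argument to REGEXP_SUBSTR only makes the pattern match case-insensitively. It does not fold the extracted text, which is why TO_dos_and_thing and to_dos_and_thing still appear as separate groups above. If they should be counted as the same string (an assumption about the desired result), a minimal variant applies LOWER() to the extracted value before grouping:

```sql
-- Same split-and-extract approach as above, but LOWER() folds case before
-- grouping, so "TO_dos_and_thing" and "to_dos_and_thing" land in one group.
select lower(regexp_substr(s.value, '""(.*)""', 1, 1, 'i', 1)) as extracted,
       count(*)                                                 as count_instances
from u,
     lateral split_to_table(u.string, ',') s
group by extracted
order by count_instances desc;
```

On the sample table, this should report 2 for both other and to_dos_and_thing. The reported values are also lowercased, so Other becomes other.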
See the Snowflake documentation for SPLIT_TO_TABLE and REGEXP_SUBSTR.
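The doubled quotes suggest that each STRING value may be a JSON array that was CSV-escaped (this is an assumption, not something stated in the question). If so, a sketch that parses the array with PARSE_JSON and expands it with LATERAL FLATTEN avoids the regular expression. It would also cope with commas inside individual elements, which a plain split on ',' cannot.

```sql
-- Alternative sketch. Assumption: '"[""a"",""b""]"' is a CSV-escaped form of
-- the JSON text '["a","b"]'.
select f.value::string as extracted,
       count(*)        as count_instances
from u,
     lateral flatten(
       input => parse_json(
         -- strip the outer double quotes, then collapse "" back to "
         replace(trim(u.string, '"'), '""', '"')
       )
     ) f
group by extracted
order by count_instances desc;
```

If the column actually stores plain JSON such as ["null","personal"], the TRIM/REPLACE cleanup should not be needed. In that case, remove it, because it would corrupt empty-string elements.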