Count occurrences of strings in a list column - Snowflake/SQL
I have a table with a column of strings, like this (for example):
STRING User_ID [...]
"[""null"",""personal"",""Other""]" 2122213 ....
"[""Other"",""to_dos_and_thing""]" 2132214 ....
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
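Question:
I want to count how many times each unique string occurs (the strings inside the STRING column are separated by commas), but I only know how to do the following: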
SELECT u.STRING, count(u.USER_ID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
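However, this doesn't do what I want: it only counts the user IDs for each distinct whole STRING value, rather than counting the individual strings inside it.
Using the example above, the ideal output would look like this:
Desired output: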
STRING COUNT_Instances
"null" 1223
"personal" 543
"Other" 324
"to_dos_and_thing" 221
"getting_things_done" 146
"Work!!!!!" 22
Answer:
Based on your description, here is my sample table:
create table u (user_id number, string varchar);
insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string into one row per element, and then used REGEXP_SUBSTR to clean up the data. Here are the query and its output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;
+---------------------+----------+
| EXTRACTED | COUNT(*) |
+---------------------+----------+
| Other | 2 |
| null | 1 |
| personal | 1 |
| to_dos_and_thing | 1 |
| getting_things_done | 1 |
| TO_dos_and_thing | 1 |
| Work!!!!! | 1 |
+---------------------+----------+
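In the output above, `to_dos_and_thing` and `TO_dos_and_thing` are counted separately. The `'i'` parameter only makes the pattern match case-insensitively; it does not change the extracted text. If those two should be merged, as the desired output suggests, one option is to fold case with LOWER before grouping. This is a sketch against the sample table `u` above and has not been tested against the real data:

```sql
-- Same extraction as the query above, but LOWER() folds case before grouping,
-- so 'to_dos_and_thing' and 'TO_dos_and_thing' fall into the same group.
SELECT LOWER(REGEXP_SUBSTR(s.value, '""(.*)""', 1, 1, 'i', 1)) AS extracted,
       COUNT(*) AS count_instances
FROM u,
     LATERAL SPLIT_TO_TABLE(u.string, ',') s
GROUP BY extracted
ORDER BY count_instances DESC;
```

On the sample rows, this should give a count of 2 for `other` and `to_dos_and_thing` and 1 for each of the rest. LOWER also changes the displayed values, so `Other` becomes `other` and `Work!!!!!` becomes `work!!!!!`.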
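Splitting on `','` assumes that no element contains a comma itself. Each value looks like a JSON array, so another option is to parse it with PARSE_JSON and expand it with LATERAL FLATTEN. This is a different technique from the SPLIT_TO_TABLE approach above. The sketch assumes the column holds the CSV-style quoting used in the sample table (outer double quotes, inner quotes doubled) and strips it first with TRIM and REPLACE. If the column already contains plain JSON such as `["null","personal","Other"]`, pass `u.string` to PARSE_JSON directly:

```sql
-- Assumption: values are stored CSV-quoted, e.g. "[""null"",""personal""]".
-- TRIM removes the outer double quotes and REPLACE un-doubles the inner ones,
-- which leaves a valid JSON array for PARSE_JSON.
SELECT LOWER(f.value::STRING) AS extracted,
       COUNT(*)               AS count_instances
FROM u,
     LATERAL FLATTEN(input => PARSE_JSON(REPLACE(TRIM(u.string, '"'), '""', '"'))) f
GROUP BY 1
ORDER BY count_instances DESC;
```

FLATTEN emits one row per array element, so commas inside an element do not split it, and no regular expression is needed to remove the quotes.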