Vertica SQL function to split a string into separate columns
Is there a way in SQL to split a string into n columns based on a delimiter in the string? I know the SPLIT_PART function takes three arguments: the string, the delimiter, and the index of the part to return. For example:
select
    split_part('2016-01-01 00:11:00|Sprout|0', '|', 1),
    split_part('2016-01-01 00:11:00|Sprout|0', '|', 2),
    split_part('2016-01-01 00:11:00|Sprout|0', '|', 3);
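Is there a way to do this without the third argument, where you supply only the string and the delimiter and get back as many columns as there are delimited parts in the string?
I know this would be easy to solve with the .split method once Vertica allows Python-based UDFs, but is there a solution at present? I know this may be a long shot, and I'm asking mostly out of curiosity, since split_part serves my purpose perfectly well.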
This probably won't be an accepted answer. If you are happy with getting just the n-th token of a string, try:
SQL>SELECT
...> regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 1 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_first
...>, regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 2 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_second
...>, regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 3 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_third
...>;
the_first |the_second |the_third
2016-01-01 00:11:00 |Sprout |0
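To see what that pattern does with the occurrence argument, here is a minimal Python sketch that emulates it with the `re` module. It illustrates the regular expression only; it is not Vertica code, and `nth_token` is a hypothetical helper name. One thing it makes visible: because the pattern requires at least one non-bar character, empty fields are skipped, so positions can shift compared with a purely positional split.

```python
import re

# Same pattern as in the SQL above: an optional bar, then a captured run of non-bars.
TOKEN = re.compile(r"[|]?([^|]+)")


def nth_token(source, n):
    """Return the n-th (1-based) captured token, or '' if there is no such match.

    Hypothetical helper: it mimics asking for the n-th occurrence of the
    pattern and returning capture group 1.
    """
    for i, match in enumerate(TOKEN.finditer(source), start=1):
        if i == n:
            return match.group(1)
    return ""


row = "2016-01-01 00:11:00|Sprout|0"
tokens = [nth_token(row, n) for n in (1, 2, 3)]
print(tokens)

# Empty fields never match, so the second occurrence in 'a||c' is 'c',
# while a positional split reports an empty second field.
print(nth_token("a||c", 2), repr("a||c".split("|")[1]))
```

If empty fields can occur in your data and their position matters, a positional function such as split_part is probably the safer choice.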
However, if you want to pivot the delimited string so that each token becomes a new row, there are two possibilities:
SQL>-- manual, using regexp_substr ...
...>with
...>the_array as (
...> select 1 as idx
...>union all select 2
...>union all select 3
...>union all select 4
...>union all select 5
...>union all select 6
...>union all select 7
...>union all select 8
...>union all select 9
...>union all select 10 -- increase if you might get a bigger array than one of 10 elements
...>)
...> ,concepts as (
...>select '2016-01-01 00:11:00|Sprout|0' as concepts_list
...>)
...>select * from (
...> select
...> idx
...> ,trim(
...> regexp_substr(
...> concepts_list -- source string
...> ,'[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> ,1 -- starting from begin of string: position 1
...> ,idx -- the idx-th occurrence
...> ,'' -- no regexp modifier
...> ,1 -- we want the only remembered group - the 1st
...> )
...> ) as concept
...> from concepts
...> cross join the_array
...>) foo
...>where concept <> ''
...>;
idx |concept
1|2016-01-01 00:11:00
3|0
2|Sprout
select succeeded; 3 rows fetched
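The manual variant above boils down to: pair the string with every index from 1 to 10, extract the idx-th token, and keep only the non-empty results. A rough Python sketch of that logic, for illustration only (`pivot` and `MAX_TOKENS` are hypothetical names, and Python's `re` stands in for regexp_substr):

```python
import re

TOKEN = re.compile(r"[|]?([^|]+)")
MAX_TOKENS = 10  # plays the role of the_array: the largest index that is tried


def pivot(concepts_list):
    """Return (idx, concept) rows, one per non-empty token (hypothetical helper)."""
    tokens = [m.group(1).strip() for m in TOKEN.finditer(concepts_list)]
    rows = []
    for idx in range(1, MAX_TOKENS + 1):  # the "cross join" with the index list
        # Indexes beyond the last token yield nothing, like a failed match.
        concept = tokens[idx - 1] if idx <= len(tokens) else ""
        if concept != "":  # mirrors: where concept <> ''
            rows.append((idx, concept))
    return rows


result = pivot("2016-01-01 00:11:00|Sprout|0")
for idx, concept in result:
    print(idx, concept)
```

The sketch returns rows in index order. The SQL query has no ORDER BY, which is why its rows came back as 1, 3, 2; add `order by idx` if the order matters. Tokens beyond `MAX_TOKENS` are silently dropped in both versions.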
SQL>-- using the strings_package on:
...>-- https://github.com/vertica/Vertica-Extension-Packages/blob/master/strings_package/src/StringTokenizerDelim.cpp
...>WITH csvtab(id,delimstring) AS (
...> SELECT 1,'2016-01-01 00:11:00|Sprout|0'
...>UNION ALL SELECT 2,'2016-01-02 00:11:00|Trout|1'
...>UNION ALL SELECT 3,'2016-01-03 00:11:00|Salmon|2'
...>UNION ALL SELECT 4,'2016-01-04 00:11:00|Bass|3'
...>)
...>SELECT id, words
...>FROM (
...> SELECT id, v_txtindex.StringTokenizerDelim(delimstring,'|') OVER (PARTITION by id) FROM csvtab
...>) a
...>ORDER BY 1;
id |words
1|2016-01-01 00:11:00
1|Sprout
1|0
2|2016-01-02 00:11:00
2|Trout
2|1
3|2016-01-03 00:11:00
3|Salmon
3|2
4|2016-01-04 00:11:00
4|Bass
4|3
select succeeded; 12 rows fetched
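What I'm more interested in is whether it's possible to get separate columns without having three items in the select statement. If you're familiar with Python, I'm after the effect of string.split('|'). If that isn't possible in SQL, that's perfectly fine. Your first example is a route I would probably take with the Vertica function SPLIT_PART(string, delimiter, occurrence).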
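For comparison, here is a small Python sketch of the `split('|')` effect being asked about. It shows why the number of resulting fields depends on the data, and what mapping them onto a fixed number of columns looks like. `split_to_columns` is a hypothetical helper, the second sample row is made up for illustration, and padding missing fields with empty strings is an assumption, not a statement about how any Vertica function behaves.

```python
def split_to_columns(value, delimiter, n_cols):
    """Split value and return exactly n_cols fields (hypothetical helper)."""
    parts = value.split(delimiter)
    # Pad short rows with '' and drop extra fields so every row has the same width.
    padded = parts + [""] * (n_cols - len(parts))
    return tuple(padded[:n_cols])


rows = [
    "2016-01-01 00:11:00|Sprout|0",
    "2016-01-02 00:11:00|Trout",  # illustrative row with one field missing
]

# str.split returns as many items as the data happens to contain ...
field_counts = [len(r.split("|")) for r in rows]
print(field_counts)

# ... while a table needs the column count chosen up front.
table = [split_to_columns(r, "|", 3) for r in rows]
for record in table:
    print(record)
```

The column count still has to be chosen somewhere (`n_cols` here), which corresponds to listing one split_part call per column in the select.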