Vertica SQL function to split a string into separate columns

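Is there a way in SQL to split a string into n columns based on a delimiter within the string? I know that the SPLIT_PART function takes three arguments: the string, the delimiter, and the position of the field to return. For example: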

    select
      split_part('2016-01-01 00:11:00|Sprout|0', '|', 1),
      split_part('2016-01-01 00:11:00|Sprout|0', '|', 2),
      split_part('2016-01-01 00:11:00|Sprout|0', '|', 3);

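Is there a way to do this without the third argument, so that you supply only the string and the delimiter and end up with as many columns as there are delimited fields in the string?

Once Vertica allows Python-based UDFs, I know this will be easy to solve with the .split method, but is there a solution in the meantime? I know this may be a long shot, and I'm asking mostly out of curiosity, since split_part serves my purposes perfectly well.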

This probably isn't an acceptable answer, but if you are happy to just get the n-th token of the string, try:

    SELECT
      regexp_substr(
        '2016-01-01 00:11:00|Sprout|0' -- source string
      , '[|]?([^|]+)' -- pattern: an optional bar, then one or more non-bars, remembered as the 1st group
      , 1             -- starting from begin of string: position 1
      , 1             -- the N-th occurrence
      , ''            -- no regexp modifier
      , 1             -- we want the only remembered group - the 1st
      ) the_first
    , regexp_substr(
        '2016-01-01 00:11:00|Sprout|0' -- source string
      , '[|]?([^|]+)' -- pattern: an optional bar, then one or more non-bars, remembered as the 1st group
      , 1             -- starting from begin of string: position 1
      , 2             -- the N-th occurrence
      , ''            -- no regexp modifier
      , 1             -- we want the only remembered group - the 1st
      ) the_second
    , regexp_substr(
        '2016-01-01 00:11:00|Sprout|0' -- source string
      , '[|]?([^|]+)' -- pattern: an optional bar, then one or more non-bars, remembered as the 1st group
      , 1             -- starting from begin of string: position 1
      , 3             -- the N-th occurrence
      , ''            -- no regexp modifier
      , 1             -- we want the only remembered group - the 1st
      ) the_third
    ;
    -- the_first                   |the_second                  |the_third
    -- 2016-01-01 00:11:00         |Sprout                      |0
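
On the "n columns" part of the question: the column list of a SELECT is fixed when the statement is written, so the number of output columns cannot follow the data. What can be done is to count the fields and keep a fixed list of `SPLIT_PART` calls sized for the largest row you expect. The following is a minimal sketch of that idea, assuming Vertica's built-in `REGEXP_COUNT`; the aliases `src`, `delimstring`, `n_tokens` and `col1` to `col3` are made up for illustration.

```sql
-- Count the delimited fields, and extract a fixed number of them.
SELECT
  REGEXP_COUNT(delimstring, '[|]') + 1 AS n_tokens -- bars + 1 = number of fields
, SPLIT_PART(delimstring, '|', 1)      AS col1
, SPLIT_PART(delimstring, '|', 2)      AS col2
, SPLIT_PART(delimstring, '|', 3)      AS col3
FROM (
  SELECT '2016-01-01 00:11:00|Sprout|0' AS delimstring -- sample value from the question
) src;
```

For the sample string, which contains two bars, `n_tokens` is 3. If the count can exceed the number of `SPLIT_PART` columns, the statement has to be regenerated with more columns, for example by the client that submits the query.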

If, however, you want to pivot the delimited string so that each token forms a new row, there are two possibilities:

    -- manual, using regexp_substr ...
    with
    the_array as (
              select  1 as idx
    union all select  2
    union all select  3
    union all select  4
    union all select  5
    union all select  6
    union all select  7
    union all select  8
    union all select  9
    union all select 10 -- increase if you might get an array of more than 10 elements
    )
    , concepts as (
      select '2016-01-01 00:11:00|Sprout|0' as concepts_list
    )
    select * from (
      select
        idx
      , trim(
          regexp_substr(
            concepts_list -- source string
          , '[|]?([^|]+)' -- pattern: an optional bar, then one or more non-bars, remembered as the 1st group
          , 1             -- starting from begin of string: position 1
          , idx           -- the idx-th occurrence
          , ''            -- no regexp modifier
          , 1             -- we want the only remembered group - the 1st
          )
        ) as concept
      from concepts
      cross join the_array
    ) foo
    where concept <> ''
    ;
    -- (no ORDER BY, so the row order is arbitrary)
    -- idx                 |concept
    --                    1|2016-01-01 00:11:00
    --                    3|0
    --                    2|Sprout
    -- select succeeded; 3 rows fetched

    -- using the strings_package on:
    -- https://github.com/vertica/Vertica-Extension-Packages/blob/master/strings_package/src/StringTokenizerDelim.cpp
    WITH csvtab(id,delimstring) AS (
              SELECT 1,'2016-01-01 00:11:00|Sprout|0'
    UNION ALL SELECT 2,'2016-01-02 00:11:00|Trout|1'
    UNION ALL SELECT 3,'2016-01-03 00:11:00|Salmon|2'
    UNION ALL SELECT 4,'2016-01-04 00:11:00|Bass|3'
    )
    SELECT id, words
    FROM (
      SELECT id, v_txtindex.StringTokenizerDelim(delimstring,'|') OVER (PARTITION by id) FROM csvtab
    ) a
    ORDER BY 1;
    -- id                  |words
    --                    1|2016-01-01 00:11:00
    --                    1|Sprout
    --                    1|0
    --                    2|2016-01-02 00:11:00
    --                    2|Trout
    --                    2|1
    --                    3|2016-01-03 00:11:00
    --                    3|Salmon
    --                    3|2
    --                    4|2016-01-04 00:11:00
    --                    4|Bass
    --                    4|3
    -- select succeeded; 12 rows fetched
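
A variation on the manual approach, offered as a rough sketch rather than a tested recipe: instead of filtering out empty results afterwards, bound the index by the number of fields in each string and extract each field with `SPLIT_PART`. It assumes Vertica's built-in `REGEXP_COUNT`; the names `the_array`, `csvtab`, `delimstring` and `token` are illustrative, and the second sample row is invented here to show an empty field.

```sql
-- One output row per field, keeping each field's position within its string.
WITH the_array(idx) AS (
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5 -- extend if a string can hold more than 5 fields
)
, csvtab(id, delimstring) AS (
          SELECT 1, '2016-01-01 00:11:00|Sprout|0'
UNION ALL SELECT 2, '2016-01-02 00:11:00||1' -- hypothetical row with an empty middle field
)
SELECT
  id
, idx
, SPLIT_PART(delimstring, '|', idx) AS token
FROM csvtab
CROSS JOIN the_array
WHERE idx <= REGEXP_COUNT(delimstring, '[|]') + 1 -- only as many rows as there are fields
ORDER BY id, idx;
```

The design difference is in how empty fields are handled. The pattern `[|]?([^|]+)` needs at least one non-bar character per match, so an empty field is skipped and the later tokens move up one occurrence. `SPLIT_PART` counts delimiters instead, so the empty field in the second row should come back as an empty string at `idx` 2 and the `1` stays at `idx` 3.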

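What I'm more interested in is whether I can get separate columns without having three items in the select list. If you're familiar with Python, I'm after the effect of string.split('|'). If that isn't possible in SQL, that's perfectly fine. Your first example is the route I would probably take anyway, using the Vertica function SPLIT_PART(string, delimiter, occurrence).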