Google BigQuery: how to partially filter out subset strings?
I am trying to filter substrings out of a set of strings. This is what I have so far:
WITH `project.dataset.table` AS (
SELECT 'anderstand' str UNION ALL
SELECT 'anderstan' UNION ALL
SELECT 'andersta' UNION ALL
SELECT 'anderst' UNION ALL
SELECT 'understand' str UNION ALL
SELECT 'understan' UNION ALL
SELECT 'understa' UNION ALL
SELECT 'underst' UNION ALL
SELECT 'unders' UNION ALL
SELECT 'under' UNION ALL
SELECT 'understand i' UNION ALL
SELECT 'understand it' UNION ALL
SELECT 'understand it y' UNION ALL
SELECT 'understand it ye' UNION ALL
SELECT 'understand it yes' UNION ALL
SELECT 'understand it yes it'
)
SELECT str FROM (
SELECT str,
STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag
FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)
This returns only:
Row str
1 understand it yes it
2 anderstand
The expected result is:
Row str
1 understand it yes it
2 anderstand
3 understand it yes
4 understand
5 understand it
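The missing rows can be traced by exposing the value that LAG returns. The following is only a diagnostic sketch on four of the sample strings; the CTE name `sample` and the alias `prev_str` are introduced here for illustration and are not part of the original query.

```sql
#standardSQL
-- Diagnostic sketch: show each string next to the row that precedes it
-- in descending order, plus the flag the original query computes.
WITH sample AS (  -- hypothetical name, a subset of the sample data
  SELECT 'understand it' AS str UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan'
)
SELECT
  str,
  LAG(str) OVER (ORDER BY str DESC) AS prev_str,
  -- TRUE when the previous row starts with the current string
  STARTS_WITH(LAG(str) OVER (ORDER BY str DESC), str) AS flag
FROM sample
ORDER BY str DESC
```

In descending order every shorter string here is a prefix of the row before it, so `understand` is flagged because it follows `understand i`, even though it is a complete word that should be kept.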
Below is a solution for BigQuery Standard SQL:
#standardSQL
SELECT str FROM (
SELECT str, STARTS_WITH(prev_str, str) AND
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
FROM (
SELECT str, LAG(str) OVER(ORDER BY str DESC) AS prev_str
FROM `project.dataset.table`
)
)
WHERE NOT IFNULL(flag, FALSE)
If you apply it to the sample data from your question:
WITH `project.dataset.table` AS (
SELECT 'anderstand' str UNION ALL
SELECT 'anderstan' UNION ALL
SELECT 'andersta' UNION ALL
SELECT 'anderst' UNION ALL
SELECT 'understand' str UNION ALL
SELECT 'understan' UNION ALL
SELECT 'understa' UNION ALL
SELECT 'underst' UNION ALL
SELECT 'unders' UNION ALL
SELECT 'under' UNION ALL
SELECT 'understand i' UNION ALL
SELECT 'understand it' UNION ALL
SELECT 'understand it y' UNION ALL
SELECT 'understand it ye' UNION ALL
SELECT 'understand it yes' UNION ALL
SELECT 'understand it yes it'
)
the result is:
Row str
1 understand it yes it
2 understand it yes
3 understand it
4 understand
5 anderstand
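To see why the extra condition keeps rows such as `understand`, the two parts of the flag can be inspected separately. This is a sketch under the same assumptions as the answer's query; the CTE name `sample` and the column aliases `spaces`, `prev_spaces` and `is_prefix` are hypothetical names chosen for illustration.

```sql
#standardSQL
-- Diagnostic sketch: split the flag into its prefix test and its
-- delimiter counts for a subset of the sample data.
WITH sample AS (  -- hypothetical name
  SELECT 'understand it' AS str UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan'
)
SELECT
  str,
  prev_str,
  -- number of spaces in the current and in the previous string
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) AS spaces,
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS prev_spaces,
  STARTS_WITH(prev_str, str) AS is_prefix
FROM (
  SELECT str, LAG(str) OVER (ORDER BY str DESC) AS prev_str
  FROM sample
)
ORDER BY str DESC
```

A row is dropped only when it is a prefix of the previous row and both contain the same number of spaces. `understand i` has one space, like `understand it`, so it is treated as an unfinished version of it and removed. `understand` has no spaces while `understand i` has one, so it is kept.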
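Here I use a space as the delimiter, but you can use any delimiter by adjusting r' ' in REGEXP_EXTRACT_ALL(…, r' '). For example, you can use r'\s' to treat any whitespace character as a delimiter.

The only problem with the query is that I can't put a COUNT(*) anywhere in it to check whether these words are repeated. I could use the previous question for that.

I don't see the point of that in your question, so I can't comment on it. This question has been fully answered, I think :o)

Another question just for this might be weird :) Please don't make a new one.

OK, I will create another one.

It is up to you: if you feel it is a reasonable small extension of the current question, update it. If not, post a new one. It is usually not appreciated when a question is extended or changed after a correct answer has been given.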