Google BigQuery: how to partially filter out strings that are substrings of other strings?


I am trying to filter out strings that are substrings (prefixes) of other strings. This is how I did it:

WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' 

)
SELECT str FROM (
  SELECT str, 
    STARTS_WITH(LAG(str) OVER(ORDER BY str DESC), str) flag 
  FROM `project.dataset.table`
)
WHERE NOT IFNULL(flag, FALSE)
This returns only

Row str 
1   understand it yes it
2   anderstand
The expected result is

Row str 
1   understand it yes it
2   anderstand 
3   understand it yes
4   understand
5   understand it

Below is a solution for BigQuery Standard SQL

#standardSQL
SELECT str FROM (
  SELECT str, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM `project.dataset.table`
  )
)
WHERE NOT IFNULL(flag, FALSE) 
If you apply it to the sample data from your question (place this CTE in front of the query above)

WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' 
)
the result is

Row str  
1   understand it yes it     
2   understand it yes    
3   understand it    
4   understand   
5   anderstand     
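
To see why the extra condition changes the outcome, it can help to look at the intermediate values before filtering. The query below is only a sketch: the `sample` CTE is a hypothetical four-row subset of the question's data, and the column names `is_prefix`, `spaces` and `prev_spaces` are made up for illustration. It exposes the previous string in descending order, whether the current string is a prefix of it, and the number of spaces in each.

```sql
#standardSQL
-- Hypothetical subset of the question's data, used only for illustration.
WITH sample AS (
  SELECT 'understand it' AS str UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan'
)
SELECT
  str,
  prev_str,
  -- TRUE when the current string is a prefix of the previous (longer) one
  STARTS_WITH(prev_str, str) AS is_prefix,
  -- number of space delimiters in the current and the previous string
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) AS spaces,
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS prev_spaces
FROM (
  SELECT str, LAG(str) OVER(ORDER BY str DESC) AS prev_str
  FROM sample
)
ORDER BY str DESC
```

A row is dropped only when `is_prefix` is TRUE and both space counts match. In this subset, 'understand' is a prefix of 'understand i' but has a different number of spaces, so it is kept, whereas the question's original query (prefix check only) would drop it.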

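Here I use a space as the delimiter, but you can use any delimiter by adjusting r' ' in REGEXP_EXTRACT_ALL(…, r' '). For example, you can use r'\s' to treat any whitespace character as a delimiter.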
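
As a sketch of the delimiter change described above, the same query can be written with r'\s' in both REGEXP_EXTRACT_ALL calls. The `sample` CTE and its tab-separated values are hypothetical and not taken from the question; they are there only to show a case where a plain space would not be found.

```sql
#standardSQL
-- Hypothetical data: the words are separated by a tab (\t), not a space.
WITH sample AS (
  SELECT 'understand' AS str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understand\ti' UNION ALL
  SELECT 'understand\tit'
)
SELECT str FROM (
  SELECT str, STARTS_WITH(prev_str, str) AND
    -- r'\s' matches any whitespace character, so tabs count as delimiters too
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r'\s')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r'\s')) AS flag
  FROM (
    SELECT str, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM sample
  )
)
WHERE NOT IFNULL(flag, FALSE)
```

With this data the query should keep the tab-separated 'understand it' row and 'understand'. With r' ' instead, no delimiters would be counted in any row, so 'understand' would be treated as just another prefix and dropped.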

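The only problem with this query is that I can't put a COUNT(*) anywhere in it to check whether these words are repeated.
I could do that with the previous question.
I don't see the point of your question, so I can't comment on that. This question has been fully answered, I think :o)
Another question just for this might be odd :) Please don't create a new one.
OK, I will create another one.
It's up to you. If you feel it is a reasonable small extension of the current question, update it. If not, post a new one. When a question is extended or changed after a correct answer has been given, that is usually not appreciated.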