Google BigQuery: how to filter out partial substrings and count the remaining strings?

I am trying to filter partial substrings out of a set of strings. This is how I did it:

WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' str UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' UNION ALL
  SELECT 'understand it yes it'
)

which returns only

Row str  
1   understand it yes it     
2   understand it yes    
3   understand it    
4   understand   
5   anderstand

The expected result is

Row str                   count
1   understand it yes it   2
2   anderstand             1
3   understand it yes      1
4   understand             1
5   understand it          2

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'anderstand' str UNION ALL
  SELECT 'anderstan' UNION ALL
  SELECT 'andersta' UNION ALL
  SELECT 'anderst' UNION ALL
  SELECT 'understand' UNION ALL
  SELECT 'understan' UNION ALL
  SELECT 'understa' UNION ALL
  SELECT 'underst' UNION ALL
  SELECT 'unders' UNION ALL
  SELECT 'under' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand i' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it' UNION ALL
  SELECT 'understand it y' UNION ALL
  SELECT 'understand it ye' UNION ALL
  SELECT 'understand it yes' UNION ALL
  SELECT 'understand it yes it' UNION ALL
  SELECT 'understand it yes it'
), temp AS (
  SELECT str, COUNT(1) `count`
  FROM `project.dataset.table`
  GROUP BY str
)
SELECT str , `count` FROM (
  SELECT str, `count`, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, `count`, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM temp
  )
)
WHERE NOT IFNULL(flag, FALSE)

with output

Row str                     count    
1   understand it yes it    2    
2   understand it yes       1    
3   understand it           2    
4   understand              1    
5   anderstand              1    

To use the above, just run the query below, replacing project.dataset.table with a reference to your own table, such as yourproject.yourdataset.yourtable

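I don't think you described the logic for the count. I don't think it is obvious here; at least I can't work it out from the example shown.

In project.dataset.table you will see repeated words. So I want to know how many times each str is repeated in the filtered standard SQL result. That way I can tell how many times "understand" was repeated. I am not looking for "understand it".

Mikhail Berlyant, your query works, but when I apply it to my own query it only gives the count. Where can I upload my query for you to look at? Your previous answer was useful.

I also tried using a subquery with the SQL from the OP. It returns 1. Strange.

I just realized you have two SELECT lines with str in them. Maybe that is the problem? Because when I remove them, it doesn't work.

Sorry, this is beyond my knowledge. Here is my SQL code. It would be great if you could edit it. I am still working on it.

I answered your question and it produces the result you expected. I don't see what else I can do for you: it looks like you keep adding other things to the original question. SO is a Q&A site: you ask a question, we answer it. You have a new question, you post it, we answer it, and so on. I suggest you take your time and work out the real, complete problem rather than a partial one, and then ask/post. Meanwhile, consider voting on and accepting the answer.
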
#standardSQL
WITH temp AS (
  SELECT str, COUNT(1) `count`
  FROM `project.dataset.table`
  GROUP BY str
)
SELECT str , `count` FROM (
  SELECT str, `count`, STARTS_WITH(prev_str, str) AND  
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, r' ')) = ARRAY_LENGTH(REGEXP_EXTRACT_ALL(prev_str, r' ')) AS flag
  FROM (
    SELECT str, `count`, LAG(str) OVER(ORDER BY str DESC) AS prev_str
    FROM temp
  )
)
WHERE NOT IFNULL(flag, FALSE)