Filtering data in BigQuery by the importance of a data point (SQL)

Tags: sql, string, google-bigquery, sql-order-by, greatest-n-per-group

I am joining two tables in BigQuery and filtering them on a few conditions. The code looks like this:

SELECT
    d.id,
    d.duration,
    c.action,
    c.url
FROM
    (
        `table_action_url` c
        INNER JOIN `table_duration` d ON (d.id = c.id)
    )
WHERE c.url LIKE "https://www.mywebpage%" 
AND d.duration = '15000' 
AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
The output is:

id      duration      action             url
1       15000         Midpoint           https://www.mywebpage_fashion
1       15000         Complete           https://www.mywebpage_fashion
2       15000         First quartile     https://www.mywebpage_home
2       15000         Midpoint           https://www.mywebpage_home
I need to add logic that keeps only one action value per group. The priority is Complete, then Third quartile, and so on. So the code needs to compare rows by id and url, and if the highest-priority value is Complete (for the same id and url), take that value. The desired output is:

id      duration      action             url
1       15000         Complete           https://www.mywebpage_fashion
2       15000         Midpoint           https://www.mywebpage_home

You can use a window function and a CASE expression:

SELECT * EXCEPT(rn)
FROM (
    SELECT
        d.id,
        d.duration,
        c.action,
        c.url,
        ROW_NUMBER() OVER(PARTITION BY d.id ORDER BY CASE c.action
            WHEN 'Complete' THEN 1
            WHEN 'Third quartile' THEN 2
            WHEN 'Midpoint' THEN 3
            WHEN 'First quartile' THEN 4
        END) rn
    FROM `table_action_url` c
    INNER JOIN `table_duration` d ON d.id = c.id
    WHERE 
        c.url LIKE "https://www.mywebpage%" 
        AND d.duration = '15000' 
        AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
) t
WHERE rn = 1
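
To try the window-function approach without access to the real tables, the sketch below inlines the sample rows from the question as a CTE. The name `sample_rows` is hypothetical and stands in for the joined, filtered result. It also partitions by both `id` and `url`, on the assumption that the question wants one row per id and url; that is a variation on the query above, not part of it.

```sql
-- Hypothetical inline data standing in for the joined and filtered tables.
WITH sample_rows AS (
  SELECT 1 AS id, '15000' AS duration, 'Midpoint' AS action,
         'https://www.mywebpage_fashion' AS url UNION ALL
  SELECT 1, '15000', 'Complete',       'https://www.mywebpage_fashion' UNION ALL
  SELECT 2, '15000', 'First quartile', 'https://www.mywebpage_home' UNION ALL
  SELECT 2, '15000', 'Midpoint',       'https://www.mywebpage_home'
)
SELECT * EXCEPT(rn)
FROM (
  SELECT
    *,
    -- Number the rows of each (id, url) group; rn = 1 is the highest-priority action.
    ROW_NUMBER() OVER (
      PARTITION BY id, url
      ORDER BY CASE action
        WHEN 'Complete' THEN 1
        WHEN 'Third quartile' THEN 2
        WHEN 'Midpoint' THEN 3
        WHEN 'First quartile' THEN 4
      END
    ) AS rn
  FROM sample_rows
) t
WHERE rn = 1
ORDER BY id
```

On this sample data the query should keep Complete for id 1 and Midpoint for id 2, matching the desired output in the question.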

In BigQuery, you can do this with aggregation:

SELECT d.id, d.duration,
       -- ord = 1 is the highest priority, so sort ascending and keep the first element
       ( ARRAY_AGG(c.action ORDER BY ao.ord LIMIT 1) )[ORDINAL(1)] as action,
       ( ARRAY_AGG(c.url ORDER BY ao.ord LIMIT 1) )[ORDINAL(1)] as url
FROM `table_action_url` c JOIN
     `table_duration` d
     ON d.id = c.id JOIN
     (SELECT 'Complete' as action, 1 as ord UNION ALL
      SELECT 'Third quartile' as action, 2 as ord UNION ALL
      SELECT 'Midpoint' as action, 3 as ord UNION ALL
      SELECT 'First quartile' as action, 4 as ord
     ) ao
     ON c.action = ao.action      
WHERE c.url LIKE 'https://www.mywebpage%' AND
      d.duration = '15000' 
GROUP BY d.id, d.duration;

The simplest and most generic way I see here is to wrap your existing query with the code below:

#standardSQL
SELECT AS VALUE 
  ARRAY_AGG(current_query_result 
    ORDER BY CASE action
      WHEN 'Complete' THEN 1
      WHEN 'Third quartile' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First quartile' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)] 
FROM (
  SELECT
    d.id,
    d.duration,
    c.action,
    c.url
  FROM `table_action_url` c
  INNER JOIN `table_duration` d USING(id)
  WHERE c.url LIKE "https://www.mywebpage%" 
  AND d.duration = '15000' 
  AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
) current_query_result
GROUP BY id, url   
This produces the following output:

Row id  duration    action      url  
1   1   15000       Complete    https://www.mywebpage_fashion    
2   2   15000       Midpoint    https://www.mywebpage_home     
As you can see, ranking the candidates and picking the winner is done by the fragment below:

ORDER BY CASE action
  WHEN 'Complete' THEN 1
  WHEN 'Third quartile' THEN 2
  WHEN 'Midpoint' THEN 3
  WHEN 'First quartile' THEN 4
END
LIMIT 1    
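There is another way to achieve the same result with less verbose, more manageable, and possibly more performant code (this is not verified, just my feeling).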
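
The shorter variant itself does not appear in this copy of the thread. As a sketch of what a less verbose ranking could look like, and not necessarily what the answer's author had in mind, the CASE expression can be replaced by the position of the action inside a priority string, using `STRPOS`. The CTE name `sample_rows` and the priority string are assumptions made for illustration.

```sql
-- Hypothetical inline data standing in for the joined and filtered tables.
WITH sample_rows AS (
  SELECT 1 AS id, '15000' AS duration, 'Midpoint' AS action,
         'https://www.mywebpage_fashion' AS url UNION ALL
  SELECT 1, '15000', 'Complete',       'https://www.mywebpage_fashion' UNION ALL
  SELECT 2, '15000', 'First quartile', 'https://www.mywebpage_home' UNION ALL
  SELECT 2, '15000', 'Midpoint',       'https://www.mywebpage_home'
)
SELECT AS VALUE
  ARRAY_AGG(r
    -- STRPOS returns the 1-based position of action in the priority string,
    -- so actions listed earlier sort first. It returns 0 for an action that
    -- is not listed, so unlisted actions must be filtered out beforehand.
    ORDER BY STRPOS('Complete,Third quartile,Midpoint,First quartile', action)
    LIMIT 1
  )[OFFSET(0)]
FROM sample_rows r
GROUP BY id, url
```

With this form, changing the priority means editing one string instead of a list of WHEN branches. The trade-off is that it relies on no action name being a substring of another.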


Thanks for your reply, but this query only returns the 'Complete' results…

Thank you, Mikhail, this works perfectly. I have accepted the answer.