Filter data in BigQuery based on the importance of data points
I'm joining two tables in BigQuery and filtering them on several conditions. The code looks like this:
SELECT
d.id,
d.duration,
c.action,
c.url
FROM
(
`table_action_url` c
INNER JOIN `table_duration` d ON (d.id = c.id)
)
WHERE c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
The output is:
id  duration  action          url
1   15000     Midpoint        https://www.mywebpage_fashion
1   15000     Complete        https://www.mywebpage_fashion
2   15000     First quartile  https://www.mywebpage_home
2   15000     Midpoint        https://www.mywebpage_home
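I need to add logic that keeps only one action value. The priority is Complete, then Third quartile, and so on. So the code needs to compare id and url and, for the same id and url, keep the highest-priority action (Complete if it is present).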
The desired output is:
id  duration  action    url
1   15000     Complete  https://www.mywebpage_fashion
2   15000     Midpoint  https://www.mywebpage_home
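To pin down what "keep one row per id and url" means, here is a minimal Python sketch that is not taken from any of the answers. It applies the stated priority to the sample rows from the output table above; the names ROWS, PRIORITY and best_action_per_group are illustrative only.

```python
# Sample rows copied from the question's output table: (id, duration, action, url)
ROWS = [
    (1, "15000", "Midpoint", "https://www.mywebpage_fashion"),
    (1, "15000", "Complete", "https://www.mywebpage_fashion"),
    (2, "15000", "First quartile", "https://www.mywebpage_home"),
    (2, "15000", "Midpoint", "https://www.mywebpage_home"),
]

# Lower number = higher priority, in the order given in the question.
PRIORITY = {"Complete": 1, "Third quartile": 2, "Midpoint": 3, "First quartile": 4}


def best_action_per_group(rows):
    """Keep one row per (id, url): the row whose action has the best priority."""
    best = {}
    for row in rows:
        row_id, _duration, action, url = row
        key = (row_id, url)
        current = best.get(key)
        # Replace the kept row only if this action ranks strictly higher.
        if current is None or PRIORITY[action] < PRIORITY[current[2]]:
            best[key] = row
    return sorted(best.values())


for row in best_action_per_group(ROWS):
    print(row)
```

The two printed rows match the desired output: Complete for id 1 and Midpoint for id 2.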
You can use a window function and a CASE expression:
SELECT * EXCEPT(rn)
FROM (
SELECT
d.id,
d.duration,
c.action,
c.url,
ROW_NUMBER() OVER(PARTITION BY d.id, c.url ORDER BY CASE c.action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END) rn
FROM `table_action_url` c
INNER JOIN `table_duration` d ON d.id = c.id
WHERE
c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
) t
WHERE rn = 1
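To check the ROW_NUMBER approach locally, the sketch below runs essentially the same query against in-memory SQLite tables that reproduce the question's sample data. It assumes Python's bundled SQLite is 3.25 or newer, which is required for window functions. SQLite has no `SELECT * EXCEPT(...)`, so the outer query lists the columns explicitly. It also partitions by both id and url, as the question asks.

```python
import sqlite3

# In-memory stand-ins for the question's tables, filled with its sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_action_url (id INTEGER, action TEXT, url TEXT);
CREATE TABLE table_duration (id INTEGER, duration TEXT);
INSERT INTO table_action_url VALUES
  (1, 'Midpoint', 'https://www.mywebpage_fashion'),
  (1, 'Complete', 'https://www.mywebpage_fashion'),
  (2, 'First quartile', 'https://www.mywebpage_home'),
  (2, 'Midpoint', 'https://www.mywebpage_home');
INSERT INTO table_duration VALUES (1, '15000'), (2, '15000');
""")

QUERY = """
SELECT id, duration, action, url
FROM (
  SELECT d.id, d.duration, c.action, c.url,
         -- number the rows of each (id, url) group, best priority first
         ROW_NUMBER() OVER (
           PARTITION BY d.id, c.url
           ORDER BY CASE c.action
             WHEN 'Complete' THEN 1
             WHEN 'Third quartile' THEN 2
             WHEN 'Midpoint' THEN 3
             WHEN 'First quartile' THEN 4
           END
         ) AS rn
  FROM table_action_url c
  INNER JOIN table_duration d ON d.id = c.id
  WHERE c.url LIKE 'https://www.mywebpage%'
    AND d.duration = '15000'
    AND c.action IN ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
) t
WHERE rn = 1  -- keep only the top-ranked row per group
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```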
In BigQuery, you can do this with aggregation:
SELECT d.id, d.duration,
-- ord = 1 is Complete, so ascending order returns the highest-priority row first
( ARRAY_AGG(c.action ORDER BY ao.ord LIMIT 1) )[ORDINAL(1)] as action,
( ARRAY_AGG(c.url ORDER BY ao.ord LIMIT 1) )[ORDINAL(1)] as url
FROM `table_action_url` c JOIN
`table_duration` d
ON d.id = c.id JOIN
(SELECT 'Complete' as action, 1 as ord UNION ALL
SELECT 'Third quartile' as action, 2 as ord UNION ALL
SELECT 'Midpoint' as action, 3 as ord UNION ALL
SELECT 'First quartile' as action, 4 as ord
) ao
ON c.action = ao.action
WHERE c.url LIKE 'https://www.mywebpage%' AND
d.duration = '15000'
GROUP BY d.id, d.duration;
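The key detail in the lookup-table approach is the sort direction on `ord`. Because Complete has `ord = 1`, ascending order puts the highest-priority action first, and descending order returns the lowest-priority one. SQLite has no ARRAY_AGG, so this sketch swaps in ROW_NUMBER over the same lookup-table join. It still shows which row each direction selects. It assumes SQLite 3.25+ and the question's sample data, and `pick` is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_action_url (id INTEGER, action TEXT, url TEXT);
CREATE TABLE table_duration (id INTEGER, duration TEXT);
INSERT INTO table_action_url VALUES
  (1, 'Midpoint', 'https://www.mywebpage_fashion'),
  (1, 'Complete', 'https://www.mywebpage_fashion'),
  (2, 'First quartile', 'https://www.mywebpage_home'),
  (2, 'Midpoint', 'https://www.mywebpage_home');
INSERT INTO table_duration VALUES (1, '15000'), (2, '15000');
""")


def pick(direction):
    """Return the row chosen per (id, duration) when sorting ao.ord by direction."""
    assert direction in ("ASC", "DESC")  # only fixed literals are interpolated
    query = f"""
    WITH ao(action, ord) AS (
      VALUES ('Complete', 1), ('Third quartile', 2),
             ('Midpoint', 3), ('First quartile', 4)
    )
    SELECT id, duration, action, url FROM (
      SELECT d.id, d.duration, c.action, c.url,
             ROW_NUMBER() OVER (PARTITION BY d.id, d.duration
                                ORDER BY ao.ord {direction}) AS rn
      FROM table_action_url c
      JOIN table_duration d ON d.id = c.id
      JOIN ao ON c.action = ao.action   -- lookup table supplies the priority
      WHERE c.url LIKE 'https://www.mywebpage%' AND d.duration = '15000'
    ) t
    WHERE rn = 1
    ORDER BY id
    """
    return conn.execute(query).fetchall()


print(pick("ASC"))   # Complete for id 1, Midpoint for id 2
print(pick("DESC"))  # Midpoint for id 1, First quartile for id 2
```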
The simplest and most generic approach I see here is to wrap your existing query with the code below:
#standardSQL
SELECT AS VALUE
ARRAY_AGG(current_query_result
ORDER BY CASE action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END
LIMIT 1
)[OFFSET(0)]
FROM (
SELECT
d.id,
d.duration,
c.action,
c.url
FROM `table_action_url` c
INNER JOIN `table_duration` d USING(id)
WHERE c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile', 'Midpoint', 'Third quartile', 'Complete')
) current_query_result
GROUP BY id, url
with output:
Row  id  duration  action    url
1    1   15000     Complete  https://www.mywebpage_fashion
2    2   15000     Midpoint  https://www.mywebpage_home
As you can see, ranking the candidates and selecting one of them is done by the snippet below:
ORDER BY CASE action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END
LIMIT 1
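This snippet's CASE has no ELSE branch, so any action it does not list ranks as NULL. In ascending order, both BigQuery and SQLite sort NULLs first. The query above is safe because its WHERE clause keeps only the four listed actions. The sketch below uses Python's sqlite3 and a hypothetical extra action `'Start'` to show what would happen without that filter, and how an ELSE branch guards against it. The `candidates` table and `top_action` helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (action TEXT)")
# 'Start' is a hypothetical action that the CASE below does not list.
conn.executemany("INSERT INTO candidates VALUES (?)",
                 [("Midpoint",), ("Complete",), ("Start",)])

RANK = """CASE action
  WHEN 'Complete' THEN 1
  WHEN 'Third quartile' THEN 2
  WHEN 'Midpoint' THEN 3
  WHEN 'First quartile' THEN 4
  {else_clause}
END"""


def top_action(else_clause=""):
    """Same ORDER BY CASE ... LIMIT 1 pattern, optionally with an ELSE branch."""
    query = f"SELECT action FROM candidates ORDER BY {RANK.format(else_clause=else_clause)} LIMIT 1"
    return conn.execute(query).fetchone()[0]


print(top_action())          # prints Start: its NULL rank sorts first
print(top_action("ELSE 5"))  # prints Complete: unlisted actions now rank last
```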
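There is another way to achieve the same result with less verbose, more manageable, and possibly more efficient code (this is not verified, just my feeling).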
Thanks for your reply, but this query only returns "Complete" results…
Thank you, Mikhail, this works perfectly. I've accepted the answer.