SQL nested data: running total until a value is reached
I'm trying to work with a complex nested dataset on Google BigQuery, following this tutorial:
The data, exported as JSON, looks like this:
The working example I'm using is:
However, I want to output the bin.end value (the time) at the point where SUM(bin.density) reaches 0.7. My expected output looks like this:
Row  ect      end
1    4G       1000
2    3G       50000
3    slow-2G  null
4    2G       null
5    offline  null
This means that on a 4G connection, 70% (0.7) of page loads take less than 1.5 seconds. I tried modifying the script to:
SELECT
SUM(bin.density)
WHERE
SUM(bin.density) <= 0.7
But that isn't allowed, so I tried:
SELECT
SUM(bin.density) AS RunningTotal
WHERE
RunningTotal <= 0.7
I also tried:
SELECT
SUM(bin.density) OVER() AS RunningTotal
WHERE
RunningTotal <= 0.7
But that doesn't work either! How can I compute a running total over a nested dataset and then have it output the bin.end time?
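If I can't get the nested dataset to work with an SQL running total, my only other option is to flatten the dataset and loop over every row in Python to compute the result, which performs far worse.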
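The flatten-and-loop fallback is worth sketching, if only to pin down the intended logic. The following is a minimal pure-Python version under a few assumptions. The nested rows are taken to be already flattened into hypothetical `(ect, bin_start_ms, bin_end_ms, density)` tuples, and `end_at_threshold` is an illustrative name, not an existing API. For each connection type, it returns the end of the first bin at which the running total of density reaches the threshold, or `None` if the total never gets there.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical flattened row: (ect, bin_start_ms, bin_end_ms, density).
Row = Tuple[str, int, int, float]


def end_at_threshold(rows: List[Row], threshold: float = 0.7) -> Dict[str, Optional[int]]:
    """Per connection type, return the end of the first bin (in time order) at
    which the running total of density reaches `threshold`, or None if it never does."""
    bins_by_ect: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        bins_by_ect[row[0]].append(row)

    result: Dict[str, Optional[int]] = {}
    for ect, bins in bins_by_ect.items():
        result[ect] = None
        running_total = 0.0
        # Walk the bins in time order, accumulating density until the threshold.
        for _, _, end, density in sorted(bins, key=lambda r: r[2]):
            running_total += density
            if running_total >= threshold:
                result[ect] = end
                break
    return result


if __name__ == "__main__":
    # Toy numbers for illustration only, not real CrUX data.
    sample = [
        ("4G", 0, 200, 0.30), ("4G", 200, 400, 0.25),
        ("4G", 400, 1000, 0.20), ("4G", 1000, 3000, 0.25),
        ("slow-2G", 0, 5000, 0.05), ("slow-2G", 5000, 20000, 0.15),
    ]
    print(end_at_threshold(sample))  # {'4G': 1000, 'slow-2G': None}
```

This does the same work a window function would do in SQL, just row by row on the client.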
UPDATE: I have a solution based on Felipe Hoffa's answer; the final query appears after the comments below.
Answer: I think you want a subquery and a cumulative sum:
with cte as (
<your query here>
)
select x.*
from (select cte.*, sum(density) over (order by density desc) as runningtotal
from cte
) x
where runningtotal - density < 0.7 and runningtotal >= 0.7;
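The condition `runningtotal - density < 0.7 and runningtotal >= 0.7` keeps only the row in which the running total crosses 0.7. To see that in action, here is a small sketch that runs the pattern in SQLite, which needs version 3.25 or later for window functions. Several things are assumptions, not part of the answer:
- a hypothetical flattened table `bins` stands in for `<your query here>`;
- the column is named `bin_end` because `end` is a keyword;
- the running total is ordered by `bin_end` within each connection type, whereas the answer orders by `density desc`;
- the data are toy values.

```python
import sqlite3

# Hypothetical flattened histogram standing in for "<your query here>":
# (ect, bin_start_ms, bin_end_ms, density). Toy numbers, not CrUX data.
ROWS = [
    ("4G", 0, 200, 0.30), ("4G", 200, 400, 0.25),
    ("4G", 400, 1000, 0.20), ("4G", 1000, 3000, 0.25),
    ("slow-2G", 0, 5000, 0.05), ("slow-2G", 5000, 20000, 0.15),
]

QUERY = """
WITH cte AS (SELECT ect, bin_end, density FROM bins)
SELECT ect, bin_end, runningtotal
FROM (
  SELECT cte.*,
         -- Running total per connection type in time order (an assumption;
         -- the answer orders by density instead).
         SUM(density) OVER (PARTITION BY ect ORDER BY bin_end) AS runningtotal
  FROM cte
) x
-- Keep only the bin in which the running total crosses the threshold.
WHERE runningtotal - density < :threshold AND runningtotal >= :threshold
ORDER BY ect
"""


def crossing_bins(threshold=0.7):
    """Return (ect, bin_end, runningtotal) for each bin where the threshold is crossed."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    try:
        conn.execute(
            "CREATE TABLE bins (ect TEXT, bin_start INTEGER, bin_end INTEGER, density REAL)"
        )
        conn.executemany("INSERT INTO bins VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(crossing_bins())
```

With these toy rows only one row comes back: 4G, with `bin_end` 1000. slow-2G never reaches 0.7, so it is missing from the result rather than reported as `null`.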
Another answer: Use a cumulative sum, computed separately for each connection type:
#standardSQL
SELECT ect, `end`, density
FROM (
SELECT ect, `end`, density, ROW_NUMBER() OVER(PARTITION BY ect ORDER BY `end` DESC) rn
FROM (
SELECT ect, bin.end, SUM(bin.density) OVER(PARTITION BY ect ORDER BY `end`) AS density
FROM (
SELECT effective_connection_type.name ect, first_contentful_paint.histogram
FROM `chrome-ux-report.chrome_ux_report.201710`
WHERE origin = 'http://example.com'
) , UNNEST(histogram.bin) AS bin
)
WHERE density < 0.7
)
WHERE rn=1
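The query above does three things in order:
1. It computes a running total of density per connection type.
2. It drops bins whose running total has reached 0.7.
3. It keeps the latest remaining bin with `ROW_NUMBER()`.

Here is a rough SQLite translation, assuming a hypothetical flattened table `bins` (SQLite 3.25+ for window functions). Because `end` is a keyword, the column is called `bin_end`, and the running total gets its own name, `cum_density`, instead of reusing `density`. The data are toy values.

```python
import sqlite3

# Hypothetical flattened histogram: (ect, bin_start_ms, bin_end_ms, density).
# Toy numbers chosen for illustration, not real CrUX data.
ROWS = [
    ("4G", 0, 200, 0.30), ("4G", 200, 400, 0.25),
    ("4G", 400, 1000, 0.20), ("4G", 1000, 3000, 0.25),
    ("slow-2G", 0, 5000, 0.05), ("slow-2G", 5000, 20000, 0.15),
]

QUERY = """
SELECT ect, bin_end, cum_density
FROM (
  SELECT ect, bin_end, cum_density,
         -- Number the remaining bins from latest to earliest per connection.
         ROW_NUMBER() OVER (PARTITION BY ect ORDER BY bin_end DESC) AS rn
  FROM (
    -- Running total of density per connection type, in time order.
    SELECT ect, bin_end,
           SUM(density) OVER (PARTITION BY ect ORDER BY bin_end) AS cum_density
    FROM bins
  )
  WHERE cum_density < :threshold
)
WHERE rn = 1
ORDER BY ect
"""


def last_bins_below(threshold=0.7):
    """Return (ect, bin_end, cum_density) for the last bin still below the threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE bins (ect TEXT, bin_start INTEGER, bin_end INTEGER, density REAL)"
        )
        conn.executemany("INSERT INTO bins VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_bins_below():
        print(row)
```

On these toy rows, the sketch returns 4G at `bin_end` 400 (running total about 0.55). That is the last bin before the total reaches 0.7, not the bin in which it crosses. slow-2G, which never reaches 0.7, still gets a row (its last bin, 20000) instead of `null`.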
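Comments:
What do you mean by outputting the bin.end value? For each ect, do you want to return an array of some subset of bin.end values if the group's density is at least 0.7? It would help if you gave an example of the desired output in the question.
Sorry, I've updated the question with the expected result and an explanation of what the bin.end time is. The dataset is a bit messy; it's Google's, not mine.
Your question shouldn't contain the solution. If you want to share your solution, remove it from the question and post it as an answer.
How do you see this outputting bin.end, though? Assuming the CTE doesn't first work out which histogram bin meets the threshold?
@ElliottBrossard. The assumption is that everything is done in the CTE.
Wow, this is exactly what I needed! Although when I calculate it by hand, the numbers don't add up: summing all the 4G values up to 1000 gives 0.1504 + 0.1415 + 0.1262 + 0.0697 + 0.0473.
Ah, I believe this is adding desktop and mobile together. That's easy to fix. Thanks for your help!
Updated with my final solution. See the link here: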
#standardSQL
SELECT origin, form, ect, `end`, density
FROM (
SELECT origin, form, ect, `end`, density, ROW_NUMBER() OVER(PARTITION BY ect ORDER BY `end` DESC) rn
FROM (
SELECT origin, form, ect, bin.end, SUM(bin.density) OVER(PARTITION BY ect ORDER BY `end`) AS density
FROM (
SELECT origin, form_factor.name form, effective_connection_type.name ect, first_contentful_paint.histogram
FROM `chrome-ux-report.chrome_ux_report.201710`
WHERE origin = 'http://example.com' AND form_factor.name = 'phone'
) , UNNEST(histogram.bin) AS bin
)
WHERE density < 0.7
)
WHERE rn=1
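The final query avoids adding desktop and phone densities together by filtering to `form_factor.name = 'phone'`. If both form factors are needed in one pass, one alternative (not the asker's query) is to add the form to both `PARTITION BY` clauses. Each (form, ect) pair then gets its own running total. The sketch below tries this in SQLite 3.25+ on a hypothetical flattened table with toy values; `bin_end`, `cum_density` and `last_bins_by_form` are illustrative names.

```python
import sqlite3

# Hypothetical flattened rows: (form, ect, bin_end_ms, density). Toy numbers only.
ROWS = [
    ("phone", "4G", 400, 0.40), ("phone", "4G", 1000, 0.35), ("phone", "4G", 3000, 0.25),
    ("desktop", "4G", 400, 0.20), ("desktop", "4G", 1000, 0.30), ("desktop", "4G", 3000, 0.50),
]

QUERY = """
SELECT form, ect, bin_end, cum_density
FROM (
  SELECT form, ect, bin_end, cum_density,
         ROW_NUMBER() OVER (PARTITION BY form, ect ORDER BY bin_end DESC) AS rn
  FROM (
    -- Separate running total for every (form, ect) pair.
    SELECT form, ect, bin_end,
           SUM(density) OVER (PARTITION BY form, ect ORDER BY bin_end) AS cum_density
    FROM bins
  )
  WHERE cum_density < :threshold
)
WHERE rn = 1
ORDER BY form, ect
"""


def last_bins_by_form(threshold=0.7):
    """Return (form, ect, bin_end, cum_density): last bin below the threshold per pair."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE bins (form TEXT, ect TEXT, bin_end INTEGER, density REAL)")
        conn.executemany("INSERT INTO bins VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_bins_by_form():
        print(row)
```

Partitioning only by `ect` here would merge both forms' bins into a single running total, which is the double counting described in the comments.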