Google BigQuery: threshold-based sum aggregation across two tables
The following table shows hourly energy usage per device:
+--------------+-----------+---------------------+
| energy_usage | device_id | timestamp           |
+--------------+-----------+---------------------+
| 10           | 1         | 2019-02-12T01:00:00 |
| 16           | 2         | 2019-02-12T01:00:00 |
| 26           | 1         | 2019-03-12T02:00:00 |
| 24           | 2         | 2019-03-12T02:00:00 |
+--------------+-----------+---------------------+
I aggregated this data to get the daytime and nighttime energy usage per device and date:
+--------------+------------------+--------------------+-----------+------------+
| energy_usage | energy_usage_day | energy_usage_night | device_id | date       |
+--------------+------------------+--------------------+-----------+------------+
| 80           | 30               | 50                 | 1         | 2019-06-02 |
| 130          | 60               | 70                 | 2         | 2019-06-03 |
+--------------+------------------+--------------------+-----------+------------+
I am only interested in energy usage above a certain threshold. The following query works for me:
WITH temp AS (
  SELECT *,
    -- TRUE once the device's running monthly usage exceeds the threshold of 50
    SUM(usage) OVER(win) > 50 qualified,
    -- running monthly usage in excess of the threshold
    SUM(usage) OVER(win) - 50 rolling_sum,
    EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19 day_hour,
    EXTRACT(MONTH FROM timestamp) month,
    FORMAT_TIMESTAMP("%Y-%m-%d", timestamp) date
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
), temp_with_adjustments AS (
  SELECT *,
    -- the first qualified row of a month counts only the excess over 50;
    -- every later row counts its full usage
    IF(
      ROW_NUMBER() OVER(PARTITION BY device_id, month ORDER BY timestamp) = 1,
      rolling_sum,
      usage
    ) AS adjusted_energy_usage
  FROM temp
  WHERE qualified
)
SELECT
  ROUND(SUM(adjusted_energy_usage), 4) energy_usage,
  ROUND(SUM(IF(day_hour, adjusted_energy_usage, 0)), 4) energy_usage_day,
  ROUND(SUM(IF(NOT day_hour, adjusted_energy_usage, 0)), 4) energy_usage_night,
  device_id,
  date
FROM temp_with_adjustments
GROUP BY device_id, date
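To make the threshold logic of the query above concrete, here is a minimal sketch that runs the same window expressions over three made-up rows. The `sample` CTE, its values, and the `running_usage` column are assumptions for illustration only; they are not part of the real table.

```sql
#standardSQL
-- Hypothetical rows for a single device within one month (made-up values).
WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS device_id, TIMESTAMP '2019-02-12 01:00:00' AS timestamp, 30 AS usage),
    STRUCT(1, TIMESTAMP '2019-02-12 09:00:00', 40),
    STRUCT(1, TIMESTAMP '2019-02-12 21:00:00', 20)
  ])
)
SELECT
  device_id,
  timestamp,
  usage,
  SUM(usage) OVER(win) AS running_usage,     -- 30, 70, 90
  SUM(usage) OVER(win) > 50 AS qualified,    -- false, true, true
  SUM(usage) OVER(win) - 50 AS rolling_sum   -- -20, 20, 40
FROM sample
WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
ORDER BY device_id, timestamp
```

With these rows, the 09:00 reading is the first one that pushes the running total past 50, so it contributes only its excess of 20, and the 21:00 reading contributes its full 20. The adjusted total of 40 equals the month's usage of 90 minus the threshold of 50.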
The first table shows energy usage, but I also have a second table with the corresponding usage charges:
+--------------+-----------+---------------------+
| usage_charge | device_id | timestamp           |
+--------------+-----------+---------------------+
| 0.2          | 1         | 2019-02-12T01:00:00 |
| 0.6          | 2         | 2019-02-12T01:00:00 |
| 0.1          | 1         | 2019-03-12T02:00:00 |
| 1.2          | 2         | 2019-03-12T02:00:00 |
+--------------+-----------+---------------------+
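I would like to know the daytime and nighttime usage charges for devices whose energy usage exceeds 50, broken down by device and date. The result might look like this: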
+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+
| energy_usage | energy_usage_day | energy_usage_night | usage_charge | usage_charge_day | usage_charge_night | device_id | date       |
+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+
| 80           | 30               | 50                 | 1.2          | 0.4              | 0.8                | 1         | 2019-06-02 |
| 130          | 60               | 70                 | 2.5          | 1                | 1.5                | 2         | 2019-06-03 |
+--------------+------------------+--------------------+--------------+------------------+--------------------+-----------+------------+
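My first idea was to run exactly the same query against the usage charges. However, while a threshold of 50 works for energy usage, I cannot specify a fixed threshold for the usage charge, because the charge calculation differs per device. So I first have to find the rows where energy usage exceeds 50 and then use their timestamps to aggregate the usage charges. Is there a way to do this in BigQuery? Is it even possible?

The following is for BigQuery Standard SQL. It is based only on the pattern I can see in your initial query, so I cannot be 100% sure it is exactly what you need, but it should be a good starting point: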
#standardSQL
WITH temp AS (
  SELECT *,
    -- running total of the charges on rows that passed the usage threshold
    SUM(IF(qualified, usage_charge, 0)) OVER(win) rolling_charge
  FROM (
    SELECT *,
      SUM(usage) OVER(win) > 50 qualified,
      SUM(usage) OVER(win) - 50 rolling_sum,
      EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19 day_hour,
      EXTRACT(MONTH FROM timestamp) month,
      FORMAT_TIMESTAMP("%Y-%m-%d", timestamp) date
    FROM `project.dataset.usage`
    -- pair each hourly usage row with its charge row
    JOIN `project.dataset.charges` USING(device_id, timestamp)
    WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
  )
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
), temp_with_adjustments AS (
  SELECT *,
    -- the first qualified row of a month counts only the excess over 50
    IF(
      ROW_NUMBER() OVER(PARTITION BY device_id, month ORDER BY timestamp) = 1,
      rolling_sum,
      usage
    ) AS adjusted_energy_usage
  FROM temp
  WHERE qualified
)
SELECT
  ROUND(SUM(adjusted_energy_usage), 4) energy_usage,
  ROUND(SUM(IF(day_hour, adjusted_energy_usage, 0)), 4) energy_usage_day,
  ROUND(SUM(IF(NOT day_hour, adjusted_energy_usage, 0)), 4) energy_usage_night,
  ROUND(SUM(rolling_charge), 4) usage_charge,
  ROUND(SUM(IF(day_hour, rolling_charge, 0)), 4) usage_charge_day,
  ROUND(SUM(IF(NOT day_hour, rolling_charge, 0)), 4) usage_charge_night,
  device_id,
  date
FROM temp_with_adjustments
GROUP BY device_id, date
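Note that `rolling_charge` in the query above is itself a running total, so summing it per day adds up cumulative values. If what is wanted is simply the sum of the hourly charges on the rows that passed the usage threshold, a variant could look like the sketch below. It is only a sketch under assumptions: the `usage_sample` and `charge_sample` CTEs and their values are made up, the `flagged` CTE name is hypothetical, and the charge of the first qualifying hour is counted in full because the question does not say how it should be prorated.

```sql
#standardSQL
-- Hypothetical usage rows for one device in one month (made-up values).
WITH usage_sample AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS device_id, TIMESTAMP '2019-02-12 01:00:00' AS timestamp, 30 AS usage),
    STRUCT(1, TIMESTAMP '2019-02-12 09:00:00', 40),
    STRUCT(1, TIMESTAMP '2019-02-12 21:00:00', 20)
  ])
), charge_sample AS (
  -- Hypothetical charge rows with matching device_id and timestamp.
  SELECT * FROM UNNEST([
    STRUCT(1 AS device_id, TIMESTAMP '2019-02-12 01:00:00' AS timestamp, 0.2 AS usage_charge),
    STRUCT(1, TIMESTAMP '2019-02-12 09:00:00', 0.4),
    STRUCT(1, TIMESTAMP '2019-02-12 21:00:00', 0.1)
  ])
), flagged AS (
  -- Decide qualification from usage alone, carrying the charge along.
  SELECT
    device_id,
    timestamp,
    usage_charge,
    SUM(usage) OVER(win) > 50 AS qualified,
    EXTRACT(HOUR FROM timestamp) BETWEEN 8 AND 19 AS day_hour
  FROM usage_sample
  JOIN charge_sample USING(device_id, timestamp)
  WINDOW win AS (PARTITION BY device_id, TIMESTAMP_TRUNC(timestamp, MONTH) ORDER BY timestamp)
)
SELECT
  device_id,
  FORMAT_TIMESTAMP("%Y-%m-%d", timestamp) AS date,
  ROUND(SUM(usage_charge), 4) AS usage_charge,
  ROUND(SUM(IF(day_hour, usage_charge, 0)), 4) AS usage_charge_day,
  ROUND(SUM(IF(NOT day_hour, usage_charge, 0)), 4) AS usage_charge_night
FROM flagged
WHERE qualified
GROUP BY device_id, date
```

With the sample rows, only the 09:00 and 21:00 readings qualify, so the single output row for device 1 on 2019-02-12 has a usage charge of 0.5, split into 0.4 for the day and 0.1 for the night. The threshold is applied to usage only, which avoids having to choose a per-device threshold for the charges.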