Google BigQuery: associate usage records with the corresponding usage plan in BigQuery
Resource usage per customer:
+-------+-------------+-----------------------+
| usage | customer_id | timestamp |
+-------+-------------+-----------------------+
| 10 | 1 | 2019-01-12T01:00:00 |
| 16 | 1 | 2019-02-12T02:00:00 |
| 26 | 1 | 2019-03-12T03:00:00 |
| 24 | 1 | 2019-04-12T04:00:00 |
| 4 | 1 | 2019-05-15T01:00:00 |
+-------+-------------+-----------------------+
This table shows the usage each customer reports every hour. Minutes and seconds are always zero.
Plan change log per customer:
+--------+-------------+-----------------------+
| plan | customer_id | timestamp |
+--------+-------------+-----------------------+
| A | 1 | 2018-12-12T01:24:00 |
| B | 1 | 2019-01-12T02:31:00 |
| C | 1 | 2019-03-12T03:53:00 |
+--------+-------------+-----------------------+
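When a customer changes their usage plan, the change is recorded in the change log.
Desired result: each usage record associated with the plan in effect at that time: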
+-------+-------------+--------+-----------------------+
| usage | customer_id | plan | timestamp |
+-------+-------------+--------+-----------------------+
| 10 | 1 | A | 2019-01-05T01:00:00 |
| 16 | 1 | B | 2019-02-12T02:00:00 |
| 26 | 1 | C | 2019-03-10T03:00:00 |
| 24 | 1 | C | 2019-04-12T04:00:00 |
| 4 | 1 | C | 2019-05-15T01:00:00 |
+-------+-------------+--------+-----------------------+
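What I tried: to determine the plan for a given usage record, I take that record's timestamp and look up the most recent plan change at or before it in the plan change log:
SELECT
  customer_id,
  plan,
  timestamp
FROM
  `project.dataset.plan_change_log`
-- @usage_timestamp is a hypothetical query parameter standing in for the usage record's timestamp
WHERE timestamp <= @usage_timestamp
-- A window result cannot be filtered in WHERE; QUALIFY keeps the latest change per customer
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) = 1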
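The per-record lookup described above (the latest plan change at or before a usage timestamp) can be sketched outside BigQuery with Python's standard `bisect` module. This is an illustration of the logic only, not BigQuery code: it assumes a single customer's change log already sorted by timestamp, and the names `PLAN_LOG` and `plan_at` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Plan change log for customer 1, copied from the question's table.
# bisect requires this list to be sorted by timestamp.
PLAN_LOG = [
    (datetime(2018, 12, 12, 1, 24), "A"),
    (datetime(2019, 1, 12, 2, 31), "B"),
    (datetime(2019, 3, 12, 3, 53), "C"),
]
CHANGE_TIMES = [ts for ts, _ in PLAN_LOG]


def plan_at(usage_ts):
    """Return the plan of the latest change at or before usage_ts, or None."""
    # bisect_right returns how many change timestamps are <= usage_ts
    i = bisect_right(CHANGE_TIMES, usage_ts)
    return PLAN_LOG[i - 1][1] if i > 0 else None


for ts in [datetime(2019, 1, 12, 1), datetime(2019, 3, 12, 3), datetime(2019, 4, 12, 4)]:
    print(ts.isoformat(), plan_at(ts))
```

A usage timestamp earlier than every plan change yields `None`, which corresponds to a NULL plan in SQL.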
However, I'm not sure how to combine this with the usage table. I tried:
WITH log AS (
SELECT
customer_id,
plan,
timestamp,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
FROM
`project.dataset.plan_change_log`
)
SELECT
t1.customer_id,
log.plan,
t1.usage,
t1.timestamp
FROM
`project.dataset.usage` t1
FULL JOIN log
ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp AND seqnum = 1
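Because of the join condition, the result has fewer rows than the original usage table, but the row count should stay the same. Is there a way to fix this?
Although the sample data doesn't quite match rows 1 and 3 of the expected result (for example, the usage at 2019-03-12T03:00:00 precedes the switch to plan C at 03:53:00, so it falls under plan B), you're on the right track: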
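WITH data AS (
  SELECT
    t1.customer_id,
    log.plan,
    t1.usage,
    t1.timestamp,
    log.timestamp AS logt,
    -- Rank the candidate plan changes for each usage row, newest first
    ROW_NUMBER() OVER (PARTITION BY t1.customer_id, t1.timestamp ORDER BY log.timestamp DESC) AS seqnum
  FROM
    `project.dataset.usage` t1
  -- LEFT JOIN keeps every usage row, even one that predates all plan changes
  LEFT JOIN `project.dataset.plan_change_log` log
    ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp
)
-- Keep only the most recent plan change at or before each usage record
SELECT * FROM data WHERE seqnum = 1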
You want to compute the sequence number on the result of the join, not before it. In your version, ROW_NUMBER() runs over the whole change log, so seqnum = 1 always selects each customer's most recent plan change, and only usage records after that change can match it. Ranking after the join instead selects, for each usage record, the latest plan change at or before its timestamp.
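To check the join-then-rank approach on the sample data, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. It assumes SQLite behaves like BigQuery for this query (it needs SQLite 3.25 or later for window functions), and the helper `assign_plans` is hypothetical.

```python
import sqlite3

# Sample data from the question (customer 1). ISO-8601 strings compare
# correctly as text, so they can be used directly in <= and ORDER BY.
USAGE = [
    (10, 1, "2019-01-12T01:00:00"),
    (16, 1, "2019-02-12T02:00:00"),
    (26, 1, "2019-03-12T03:00:00"),
    (24, 1, "2019-04-12T04:00:00"),
    (4, 1, "2019-05-15T01:00:00"),
]
PLAN_LOG = [
    ("A", 1, "2018-12-12T01:24:00"),
    ("B", 1, "2019-01-12T02:31:00"),
    ("C", 1, "2019-03-12T03:53:00"),
]

# Join first, then rank the candidate plan changes per usage row.
QUERY = """
WITH data AS (
  SELECT
    u.customer_id,
    p.plan,
    u.usage,
    u.timestamp,
    ROW_NUMBER() OVER (
      PARTITION BY u.customer_id, u.timestamp
      ORDER BY p.timestamp DESC
    ) AS seqnum
  FROM usage AS u
  LEFT JOIN plan_change_log AS p
    ON p.customer_id = u.customer_id AND p.timestamp <= u.timestamp
)
SELECT usage, customer_id, plan, timestamp
FROM data
WHERE seqnum = 1
ORDER BY timestamp
"""


def assign_plans():
    """Load the sample tables into SQLite and return (usage, customer_id, plan, timestamp) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage (usage INTEGER, customer_id INTEGER, timestamp TEXT)")
    conn.execute("CREATE TABLE plan_change_log (plan TEXT, customer_id INTEGER, timestamp TEXT)")
    conn.executemany("INSERT INTO usage VALUES (?, ?, ?)", USAGE)
    conn.executemany("INSERT INTO plan_change_log VALUES (?, ?, ?)", PLAN_LOG)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in assign_plans():
    print(row)
```

All five usage rows are kept, and the third maps to plan B, as its timestamp implies.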