Google BigQuery: associate usage records with the corresponding usage plan in BigQuery

Customer resource usage:

+-------+-------------+-----------------------+
| usage | customer_id |  timestamp            |
+-------+-------------+-----------------------+
| 10    | 1           |  2019-01-12T01:00:00  |
| 16    | 1           |  2019-02-12T02:00:00  |
| 26    | 1           |  2019-03-12T03:00:00  |
| 24    | 1           |  2019-04-12T04:00:00  |
| 4     | 1           |  2019-05-15T01:00:00  |
+-------+-------------+-----------------------+
This table shows the usage reported hourly for each customer. Minutes and seconds are always zero.

Customer plan change log:

+--------+-------------+-----------------------+
| plan   | customer_id |  timestamp            |
+--------+-------------+-----------------------+
| A      | 1           |  2018-12-12T01:24:00  |
| B      | 1           |  2019-01-12T02:31:00  |
| C      | 1           |  2019-03-12T03:53:00  |
+--------+-------------+-----------------------+
When a customer changes their usage plan, the change is stored in the change log.

Desired result: each usage record associated with its usage plan

+-------+-------------+--------+-----------------------+
| usage | customer_id |  plan  |  timestamp            |
+-------+-------------+--------+-----------------------+
| 10    | 1           |  A     |  2019-01-05T01:00:00  |
| 16    | 1           |  B     |  2019-02-12T02:00:00  |
| 26    | 1           |  C     |  2019-03-10T03:00:00  |
| 24    | 1           |  C     |  2019-04-12T04:00:00  |
| 4     | 1           |  C     |  2019-05-15T01:00:00  |
+-------+-------------+--------+-----------------------+
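What I tried: to determine the plan for a given usage record, I take that record's timestamp and look up the most recent plan change at or before it in the plan change log: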

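SELECT
  customer_id,
  plan,
  timestamp
FROM
  `project.dataset.table`
-- @usage_timestamp: query parameter holding the timestamp of the usage record
WHERE timestamp <= @usage_timestamp
-- Filter first, then keep only the latest remaining change per customer
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) = 1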
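The lookup itself, finding the latest plan change at or before a given timestamp, can be sketched outside SQL. This is a minimal Python illustration that assumes one customer's log is already sorted by timestamp. The helper name `plan_at` is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Plan change log from the question for customer 1, sorted by timestamp.
plan_changes = [
    (datetime(2018, 12, 12, 1, 24), "A"),
    (datetime(2019, 1, 12, 2, 31), "B"),
    (datetime(2019, 3, 12, 3, 53), "C"),
]

def plan_at(changes, ts):
    """Return the plan of the latest change at or before ts, or None if ts precedes every change."""
    times = [t for t, _ in changes]
    # bisect_right returns how many changes have a timestamp <= ts
    i = bisect_right(times, ts)
    return changes[i - 1][1] if i > 0 else None

print(plan_at(plan_changes, datetime(2019, 2, 12, 2)))  # B
```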
However, I'm not sure how to combine this with the usage table. I tried:

WITH log AS (
  SELECT
      customer_id,
      plan,
      timestamp,
      ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) seqnum
    FROM
      `project.dataset.plan_change_log`
)
SELECT
  t1.customer_id,
  log.plan,
  t1.usage,
  t1.timestamp
FROM
  `project.dataset.usage` t1
FULL JOIN log
ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp AND seqnum = 1

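Because of the join condition, the resulting table has fewer rows than the original usage table, but the number of rows should stay the same. Is there a way to fix this?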

The timestamps in the first and third rows of the expected result don't quite match the sample data, but you're on the right track:

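WITH data AS (
  SELECT
    t1.customer_id,
    log.plan,
    t1.usage,
    t1.timestamp,
    log.timestamp AS logt,
    -- Number the matching plan changes per usage row, latest first
    ROW_NUMBER() OVER (PARTITION BY t1.customer_id, t1.timestamp ORDER BY log.timestamp DESC) AS seqnum
  FROM
    `project.dataset.usage` t1
  -- LEFT JOIN keeps every usage row (plan is NULL if no earlier change exists)
  -- and, unlike FULL JOIN, adds no rows for plan changes without usage
  LEFT JOIN `project.dataset.plan_change_log` log
  ON log.customer_id = t1.customer_id AND log.timestamp <= t1.timestamp
)
SELECT * FROM data WHERE seqnum = 1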

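You want to number the rows on the result of the join, not before it. Numbering the log first ranks each customer's plan changes only once. As a result, `seqnum = 1` always means the customer's latest change overall, whatever the usage timestamp. Numbering after the join ranks, for each usage row, only the changes at or before that row's timestamp.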
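To check locally that numbering after the join keeps every usage row, here is a minimal sketch that uses Python's built-in `sqlite3` module on the sample data from the question. Window functions require SQLite 3.25 or later. Several details are assumptions for illustration: running in SQLite instead of BigQuery, the table names `usage_log` and `plan_change_log`, and the column names `ts` and `plan_name`. The query shape mirrors the answer, with a LEFT JOIN so no usage row is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_log (usage INTEGER, customer_id INTEGER, ts TEXT);
CREATE TABLE plan_change_log (plan_name TEXT, customer_id INTEGER, ts TEXT);
INSERT INTO usage_log VALUES
  (10, 1, '2019-01-12T01:00:00'),
  (16, 1, '2019-02-12T02:00:00'),
  (26, 1, '2019-03-12T03:00:00'),
  (24, 1, '2019-04-12T04:00:00'),
  (4,  1, '2019-05-15T01:00:00');
INSERT INTO plan_change_log VALUES
  ('A', 1, '2018-12-12T01:24:00'),
  ('B', 1, '2019-01-12T02:31:00'),
  ('C', 1, '2019-03-12T03:53:00');
""")

# Join each usage row to every plan change at or before it, then keep only the
# latest such change per usage row. ISO-8601 strings compare correctly as text.
rows = conn.execute("""
WITH data AS (
  SELECT u.usage, u.customer_id, p.plan_name, u.ts,
         ROW_NUMBER() OVER (PARTITION BY u.customer_id, u.ts
                            ORDER BY p.ts DESC) AS seqnum
  FROM usage_log u
  LEFT JOIN plan_change_log p
    ON p.customer_id = u.customer_id AND p.ts <= u.ts
)
SELECT usage, customer_id, plan_name, ts FROM data WHERE seqnum = 1 ORDER BY ts
""").fetchall()

for row in rows:
    print(row)
```

Each of the five usage rows appears exactly once, with plans A, B, B, C, C in timestamp order.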