Google BigQuery: query (with FLATTEN) when a record has many repeated values


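This concerns the solution given to a related question. I created a test table and tried the suggested query, but it does not actually select the people who have lived in both New York and Chicago. The test data is as follows: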

{"fullname": "John Smith", "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "seattle"}]}
{"fullname": "Adam Smith", "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "phil"}]}
{"fullname": "Adam Jefferson", "citiesLived": [{"place": "boston"}, {"place": "chicago"}, {"place": "seattle"}]}

The query is as follows:

SELECT
  *
FROM (
  SELECT
    fullname,
    IF (citiesLived.place == 'newyork', 1, 0) AS ny,
    IF (citiesLived.place == 'chicago', 1, 0) AS chi
  FROM (FLATTEN(tester.citiesLived, citiesLived))
  OMIT
    RECORD IF citiesLived.place = 'seattle')
WHERE
  ny == 1
  AND chi == 1

You don't need to flatten (in general, flattening is rarely needed in BigQuery queries). Just use OMIT RECORD IF, as follows:

SELECT fullname FROM tester.citiesLived
OMIT RECORD IF NOT (
  SOME(citiesLived.place = "newyork") AND
  SOME(citiesLived.place = "chicago"))

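The OMIT RECORD IF condition says that a record meets your criteria if some city lived in is newyork and some city lived in is chicago. If either of those does not hold, the record should be omitted (hence the NOT predicate).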
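
To see why the flattened query in the question returns no rows while the SOME version does, the logic can be modelled in plain Python. This is only a sketch of the semantics, not BigQuery code: the helpers `flatten` and `some` are hypothetical names chosen to mirror the legacy SQL constructs, and the data is the three test records from the question.

```python
# Plain-Python model of the two approaches (an illustration, not BigQuery code).
# `flatten` and `some` are hypothetical helpers named after the legacy SQL
# constructs they imitate.

records = [
    {"fullname": "John Smith",
     "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "seattle"}]},
    {"fullname": "Adam Smith",
     "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "phil"}]},
    {"fullname": "Adam Jefferson",
     "citiesLived": [{"place": "boston"}, {"place": "chicago"}, {"place": "seattle"}]},
]


def flatten(rows):
    """Yield one (fullname, place) row per repeated value, like FLATTEN."""
    for row in rows:
        for city in row["citiesLived"]:
            yield row["fullname"], city["place"]


def some(row, place):
    """True if any repeated value in the record equals `place`, like SOME(...)."""
    return any(city["place"] == place for city in row["citiesLived"])


# Question's approach: each flattened row carries exactly one place, so the
# ny and chi flags can never both be 1 on the same row.
flattened_matches = []
for name, place in flatten(records):
    if place == "seattle":          # OMIT RECORD IF citiesLived.place = 'seattle'
        continue
    ny = 1 if place == "newyork" else 0
    chi = 1 if place == "chicago" else 0
    if ny == 1 and chi == 1:        # outer WHERE clause
        flattened_matches.append(name)

# Answer's approach: test both conditions across the whole record and omit
# the record if NOT (some newyork AND some chicago).
some_matches = []
for row in records:
    omit = not (some(row, "newyork") and some(row, "chicago"))
    if not omit:
        some_matches.append(row["fullname"])

print(flattened_matches)  # []
print(some_matches)       # ['John Smith', 'Adam Smith']
```

In this model the flattened version finds nobody, because a single flattened row cannot be both newyork and chicago, whereas the record-level check keeps John Smith and Adam Smith.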

I think this would be a more complete rewrite of the originally intended query:

SELECT
  *
FROM (
  SELECT
    fullname,
    SOME(citiesLived.place == 'newyork') WITHIN RECORD AS ny,
    SOME(citiesLived.place == 'chicago') WITHIN RECORD AS chi
  FROM tester.citiesLived
  OMIT
    RECORD IF SOME(citiesLived.place = 'seattle'))
WHERE
  ny == true
  AND chi == true
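
As a rough check of what this longer rewrite selects on the test data, here is the same kind of plain-Python model. It is a sketch under the assumption that SOME(...) inside OMIT RECORD IF and SOME(...) WITHIN RECORD both act over all repeated values of one record; `some` is a hypothetical helper, not a BigQuery API.

```python
# Plain-Python model of the longer rewrite (an illustration, not BigQuery code).
# `some` is a hypothetical helper imitating SOME(...) over one record.

records = [
    {"fullname": "John Smith",
     "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "seattle"}]},
    {"fullname": "Adam Smith",
     "citiesLived": [{"place": "newyork"}, {"place": "chicago"}, {"place": "phil"}]},
    {"fullname": "Adam Jefferson",
     "citiesLived": [{"place": "boston"}, {"place": "chicago"}, {"place": "seattle"}]},
]


def some(row, place):
    """True if any repeated value in the record equals `place`."""
    return any(city["place"] == place for city in row["citiesLived"])


result = []
for row in records:
    # OMIT RECORD IF SOME(citiesLived.place = 'seattle')
    if some(row, "seattle"):
        continue
    ny = some(row, "newyork")    # SOME(...) WITHIN RECORD AS ny
    chi = some(row, "chicago")   # SOME(...) WITHIN RECORD AS chi
    if ny and chi:               # outer WHERE ny == true AND chi == true
        result.append({"fullname": row["fullname"], "ny": ny, "chi": chi})

print(result)  # [{'fullname': 'Adam Smith', 'ny': True, 'chi': True}]
```

Under this model only Adam Smith is returned: John Smith has lived in both newyork and chicago, but his record is dropped because it also contains seattle.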