
SQL: avoiding multiple subqueries in BigQuery


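We are developing an application that stores requests in one table and responses in another (naturally). Each request can have multiple responses, and the request ID is stored in both tables.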

At first I thought we could use a LEFT JOIN from requests to responses to count the totals matching each condition:

SELECT source, COUNT(*) as requests, COUNT(responses.request_id) as responses
FROM DATASET.requests
LEFT JOIN DATASET.responses ON requests.id = responses.request_id
WHERE source = "source1"
GROUP BY source
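There are 70 requests matching the WHERE criteria and 30 responses matching them, so the expected output is "source1, 70, 30". I have since learned more about how joins behave: instead I get "source1, 259, 207", because there are duplicate IDs on both sides.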
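The inflated numbers are join fan-out: each request row pairs with every response row that shares its ID, so an ID that appears twice in requests and three times in responses yields six joined rows, and COUNT then counts joined rows rather than table rows. A minimal sketch of the effect, assuming Python's built-in sqlite3 as a local stand-in for BigQuery and tiny hypothetical tables with only the id, source, and request_id columns from the question:

```python
import sqlite3


def make_db():
    """Build tiny hypothetical requests/responses tables in memory."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE requests (id TEXT, source TEXT);
        CREATE TABLE responses (request_id TEXT);
        -- 'a' is duplicated in requests; 'b' has no responses
        INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'), ('b', 'source1');
        INSERT INTO responses VALUES ('a'), ('a'), ('a');
    """)
    return con


def joined_counts(con):
    """The question's LEFT JOIN query: it counts joined rows, not table rows."""
    return con.execute("""
        SELECT source, COUNT(*) AS requests, COUNT(responses.request_id) AS responses
        FROM requests
        LEFT JOIN responses ON requests.id = responses.request_id
        WHERE source = 'source1'
        GROUP BY source
    """).fetchone()


con = make_db()
print(joined_counts(con))  # ('source1', 7, 6)
```

The tables hold three requests and three responses, yet the query reports 7 and 6: both 'a' request rows match all three responses (6 rows), plus one unmatched row for 'b'.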

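The only way I have found to get the result I want is to build a huge query with several complete subqueries that each match against the ID set, filtered by the given criteria, and then use that filtered ID set to actually pull our fields, statistics, and so on: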

SELECT * FROM
  (SELECT COUNT(*) as responses FROM DATASET.responses
  WHERE request_id IN (SELECT id FROM DATASET.requests WHERE source = 
  "source1"))
  ,
 (SELECT source, COUNT(*) as requests
  FROM DATASET.requests
  WHERE id IN (SELECT id FROM DATASET.requests WHERE source = "source1")
  GROUP BY source)
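This looks terrible. I tried using a CTE to collect the list of IDs we want and then filtering with WHERE id IN (CTE.id) (or request_id IN ...), but that apparently isn't possible unless we JOIN on the CTE, which again produces wrong, multiplied results.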
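IN (CTE.id) is not valid syntax, but a CTE can be used through a subquery, IN (SELECT id FROM cte), which filters without joining and therefore cannot multiply rows; BigQuery accepts this form too. A minimal sketch, assuming Python's built-in sqlite3 as a local stand-in for BigQuery, hypothetical data, and a hypothetical CTE name filtered_ids; in BigQuery the tables would be DATASET.requests and DATASET.responses:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (request_id TEXT);
    INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'),
                                ('b', 'source1'), ('c', 'source2');
    INSERT INTO responses VALUES ('a'), ('a'), ('a'), ('c');
""")

# The ID set is defined once in the CTE. Each count filters its own table
# with IN (SELECT ...), so no join happens and neither count can fan out.
QUERY = """
WITH filtered_ids AS (
    SELECT DISTINCT id FROM requests WHERE source = 'source1'
)
SELECT
    (SELECT COUNT(*) FROM requests
      WHERE id IN (SELECT id FROM filtered_ids)) AS requests,
    (SELECT COUNT(*) FROM responses
      WHERE request_id IN (SELECT id FROM filtered_ids)) AS responses
"""
print(con.execute(QUERY).fetchone())  # (3, 3)
```

Each count scans only its own table, so requests is the table's row count for the filtered IDs (duplicates included, like the "70 entries" in the question) and responses is 3, not a joined-row count.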

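Because we want to add more statistics to the query, each needing further WHERE clauses, I worry this monster will keep growing and become hard to maintain.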
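To keep each new statistic from adding another subquery, one option is conditional aggregation: aggregate each table once over the filtered ID set and express every statistic as an aggregate over a condition. A sketch assuming Python's built-in sqlite3 as a local stand-in for BigQuery, hypothetical data, and the is_billed and price_charged columns from the question's example schema; in BigQuery, COUNTIF(is_billed) is a shorter equivalent of the SUM(CASE ...) form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (request_id TEXT, is_billed INTEGER, price_charged REAL);
    INSERT INTO requests VALUES ('a', 'source1'), ('b', 'source1'), ('c', 'source2');
    INSERT INTO responses VALUES
        ('a', 1, 2.5), ('a', 0, NULL), ('b', 1, 1.0), ('c', 1, 9.0);
""")

QUERY = """
WITH filtered_ids AS (
    SELECT DISTINCT id FROM requests WHERE source = 'source1'
),
req_stats AS (    -- one pass over requests
    SELECT COUNT(*) AS requests
    FROM requests WHERE id IN (SELECT id FROM filtered_ids)
),
resp_stats AS (   -- one pass over responses, several statistics at once
    SELECT COUNT(*) AS responses,
           SUM(CASE WHEN is_billed THEN 1 ELSE 0 END) AS billed,
           SUM(price_charged) AS revenue   -- SUM skips the NULL prices
    FROM responses WHERE request_id IN (SELECT id FROM filtered_ids)
)
SELECT * FROM req_stats CROSS JOIN resp_stats   -- 1 row x 1 row
"""
print(con.execute(QUERY).fetchone())  # (2, 3, 2, 3.5)
```

A further statistic on responses becomes one more line in resp_stats rather than one more subquery, and the ID filter stays in a single place.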

If there is a better way, please let me know. Thanks!

Edit: example schema, as requested.

Requests

id (String), source (String), partner_ids (Integer array), user_agent (String), timestamp (Timestamp), ...

Responses

request_id (String, from requests.id), partner_id (Integer), is_billed (boolean), price_charged (float, null if is_billed = false), response_categories (String array, not from requests), ...

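The challenge is that we mainly have to query the Requests table to get the list of ID values that match our criteria, and then query statistics on each table (e.g. counts, how many were billed, etc.) for one combined report. We may also need to draw the ID pool from criteria on both tables (e.g. where requests.source = 'source1' and 'action' is in responses.response_categories).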
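When the ID pool depends on criteria from both tables, one way is to compute each side's matching IDs separately, intersect them, and then count each table against the pool. A sketch assuming Python's built-in sqlite3 as a local stand-in for BigQuery with hypothetical data; SQLite has no ARRAY type, so a hypothetical response_categories table keyed by a hypothetical response_id stands in for the array column. In BigQuery the response-side filter would instead be WHERE 'action' IN UNNEST(response_categories), and the set operator is written INTERSECT DISTINCT:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (response_id INTEGER, request_id TEXT);
    -- Hypothetical stand-in for the ARRAY column responses.response_categories
    CREATE TABLE response_categories (response_id INTEGER, category TEXT);
    INSERT INTO requests VALUES ('a', 'source1'), ('b', 'source1'), ('c', 'source2');
    INSERT INTO responses VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'c');
    INSERT INTO response_categories VALUES
        (1, 'action'), (2, 'other'), (3, 'other'), (4, 'action');
""")

QUERY = """
WITH id_pool AS (
    -- criterion on requests ...
    SELECT id FROM requests WHERE source = 'source1'
    INTERSECT
    -- ... combined with a criterion on responses
    SELECT r.request_id
    FROM responses AS r
    JOIN response_categories AS c ON c.response_id = r.response_id
    WHERE c.category = 'action'
)
SELECT
    (SELECT COUNT(*) FROM requests WHERE id IN (SELECT id FROM id_pool)) AS requests,
    (SELECT COUNT(*) FROM responses WHERE request_id IN (SELECT id FROM id_pool)) AS responses
"""
print(con.execute(QUERY).fetchone())  # (1, 2)
```

Only request 'a' satisfies both criteria, so the result counts one request and both of its responses, including the one not tagged 'action'; restricting the response count to tagged rows would be an extra condition inside that count.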

I think you can do what you want with UNION ALL and GROUP BY:

select source, sum(requests) as requests, sum(responses) as responses
from ((select source, count(*) as requests, 0 as responses
       from dataset.requests
       group by source
      ) union all
      (select source, 0 as requests, count(*) as responses
       from dataset.responses
       group by source
      )
     ) rr
group by source;
This computes the counts for all sources.
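A local check of this UNION ALL pattern, assuming (as this first version does) that responses also has a source column, with Python's built-in sqlite3 standing in for BigQuery and hypothetical data. SQLite does not accept parentheses around UNION ALL branches, so they are dropped here. Each branch aggregates its own table before the rows are stacked, so neither count is multiplied. A UNION ALL takes its column names from the first branch, which is why the placeholder column there must be named responses to match the outer SUM(responses):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    -- Assumption of this first version: responses has its own source column
    CREATE TABLE responses (request_id TEXT, source TEXT);
    INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'), ('b', 'source2');
    INSERT INTO responses VALUES ('a', 'source1'), ('a', 'source1'), ('a', 'source1');
""")

QUERY = """
SELECT source, SUM(requests) AS requests, SUM(responses) AS responses
FROM (SELECT source, COUNT(*) AS requests, 0 AS responses
      FROM requests GROUP BY source
      UNION ALL
      SELECT source, 0 AS requests, COUNT(*) AS responses
      FROM responses GROUP BY source) AS rr
GROUP BY source
ORDER BY source
"""
print(con.execute(QUERY).fetchall())  # [('source1', 2, 3), ('source2', 1, 0)]
```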

Edit:

For the revised question, just add a join:

select source, sum(requests) as requests, sum(responses) as responses
from ((select rq.source, count(*) as requests, 0 as responses
       from dataset.requests rq
       group by rq.source
      ) union all
      (select rq.source, 0 as requests, count(*) as responses
       from dataset.responses r join
            (select distinct id, source
             from dataset.requests
            ) rq
            on r.request_id = rq.id
       group by rq.source
      )
     ) rr
group by source;
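The DISTINCT subquery is what keeps duplicate request IDs, which the question says exist, from multiplying the responses; it assumes each ID maps to a single source. A local check assuming Python's built-in sqlite3 as a stand-in for BigQuery, hypothetical data, and a responses table with no source column (parentheses around the UNION ALL branches are dropped because SQLite rejects them):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (request_id TEXT);   -- no source column here
    INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'),
                                ('b', 'source1'), ('c', 'source2');
    INSERT INTO responses VALUES ('a'), ('a'), ('a'), ('c');
""")

QUERY = """
SELECT source, SUM(requests) AS requests, SUM(responses) AS responses
FROM (SELECT source, COUNT(*) AS requests, 0 AS responses
      FROM requests GROUP BY source
      UNION ALL
      -- DISTINCT collapses duplicate request IDs, so each response matches once
      SELECT rq.source, 0 AS requests, COUNT(*) AS responses
      FROM responses AS r
      JOIN (SELECT DISTINCT id, source FROM requests) AS rq
        ON r.request_id = rq.id
      GROUP BY rq.source) AS rr
GROUP BY source
ORDER BY source
"""
print(con.execute(QUERY).fetchall())  # [('source1', 3, 3), ('source2', 1, 1)]
```

source1 has three request rows but only two distinct IDs, and its three responses are each counted once.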
If each request has at most one response, you can shorten the query to:

select rq.source, count(*) as requests, count(r.request_id) as responses
from dataset.requests rq left join
     dataset.responses r
     on r.request_id = rq.id
group by rq.source
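The shortened LEFT JOIN is only correct under its precondition. A sketch assuming Python's built-in sqlite3 as a stand-in for BigQuery and a hypothetical helper run(), showing that the counts are right with at most one response per request and that COUNT(*) inflates the request count once a request has several responses:

```python
import sqlite3

QUERY = """
SELECT rq.source, COUNT(*) AS requests, COUNT(r.request_id) AS responses
FROM requests AS rq
LEFT JOIN responses AS r ON r.request_id = rq.id
GROUP BY rq.source
"""


def run(request_rows, response_ids):
    """Run the shortened query against hypothetical in-memory tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE requests (id TEXT, source TEXT)")
    con.execute("CREATE TABLE responses (request_id TEXT)")
    con.executemany("INSERT INTO requests VALUES (?, ?)", request_rows)
    con.executemany("INSERT INTO responses VALUES (?)", [(i,) for i in response_ids])
    return con.execute(QUERY).fetchall()


reqs = [("a", "source1"), ("b", "source1")]
print(run(reqs, ["a"]))            # [('source1', 2, 1)]  correct
print(run(reqs, ["a", "a", "a"]))  # [('source1', 4, 3)]  2 requests reported as 4
```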

Maybe I'm misunderstanding something, but why not count per ID first and then join on id?

WITH
    sources
    AS
        (  SELECT COUNT (*) source_cnt, id
             FROM dataset.requests
         GROUP BY id),
    responses
    AS
        (  SELECT COUNT (*) AS response_cnt, request_id AS id
             FROM dataset.responses
         GROUP BY request_id)
SELECT source_cnt, response_cnt, sources.id
  FROM sources INNER JOIN responses ON sources.id = responses.id;
If you want to keep all records, you can change the query to a FULL OUTER JOIN:

WITH
    sources
    AS
        (  SELECT COUNT (*) source_cnt, id
             FROM dataset.requests
         GROUP BY id),
    responses
    AS
        (  SELECT COUNT (*) AS response_cnt, request_id AS id
             FROM dataset.responses
         GROUP BY request_id)
SELECT COALESCE (sources.id, responses.id) AS id, source_cnt, response_cnt
  FROM sources FULL OUTER JOIN responses ON sources.id = responses.id
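Both CTE versions aggregate each table down to one row per ID before joining, so the join is one-to-one and cannot fan out. A local check of the INNER JOIN variant, assuming Python's built-in sqlite3 as a stand-in for BigQuery and hypothetical data; the responses CTE is renamed to the hypothetical responses_per_id so it does not share a name with the responses table, and the per-ID counts are summed in Python at the end:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (request_id TEXT);
    INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'), ('b', 'source1');
    INSERT INTO responses VALUES ('a'), ('a'), ('a'), ('b');
""")

QUERY = """
WITH sources AS (           -- one row per request ID
    SELECT COUNT(*) AS source_cnt, id FROM requests GROUP BY id
),
responses_per_id AS (       -- one row per request ID
    SELECT COUNT(*) AS response_cnt, request_id AS id
    FROM responses GROUP BY request_id
)
SELECT sources.id, source_cnt, response_cnt
FROM sources
INNER JOIN responses_per_id ON sources.id = responses_per_id.id
ORDER BY sources.id
"""

rows = con.execute(QUERY).fetchall()
print(rows)  # [('a', 2, 3), ('b', 1, 1)]

# The join is one-to-one on ID, so the per-ID counts can be summed into totals.
total_requests = sum(row[1] for row in rows)
total_responses = sum(row[2] for row in rows)
print(total_requests, total_responses)  # 3 4
```

The INNER JOIN drops IDs present in only one table; that is the case the FULL OUTER JOIN version covers.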

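Honestly, I'm a bit confused about what you want to see in the end, and I don't fully understand why you have 70 requests but only 30 responses if one request can have multiple responses. Do you mean that some requests can have 0 responses? Or are you counting distinct responses?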

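If you want the total number of requests plus the total number of responses associated with those specific requests, I believe this small modification to your code should work: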

SELECT source, COUNT(DISTINCT requests.id) as requests, COUNT(responses.request_id) as responses
FROM `dataset.requests` as requests
LEFT JOIN `dataset.responses` as responses ON requests.id = responses.request_id
WHERE source = "source1"
GROUP BY source
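COUNT(DISTINCT ...) repairs only the request side, and it counts distinct IDs rather than table rows; the joined response rows are still counted once per duplicate request row, which is the multiplication described in the comments below. A local check assuming Python's built-in sqlite3 as a stand-in for BigQuery and hypothetical data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id TEXT, source TEXT);
    CREATE TABLE responses (request_id TEXT);
    -- request ID 'a' appears twice, as the question says IDs can
    INSERT INTO requests VALUES ('a', 'source1'), ('a', 'source1'), ('b', 'source1');
    INSERT INTO responses VALUES ('a'), ('a'), ('a');
""")

QUERY = """
SELECT source, COUNT(DISTINCT requests.id) AS requests,
       COUNT(responses.request_id) AS responses
FROM requests
LEFT JOIN responses ON requests.id = responses.request_id
WHERE source = 'source1'
GROUP BY source
"""
print(con.execute(QUERY).fetchone())  # ('source1', 2, 6)
```

The tables hold three requests and three responses, but the query reports 2 and 6.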

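Sample data and desired results would help.

Thanks, but I don't think this will work either. I added a clarification: the source field exists only in the requests table, and the ID column is the only common link between the two tables. So it is still multiplying the response count, which should be 30.

@EvanTestvoid. You should clarify the question with sample data. You aren't looking for the number of responses; you are looking for the number of requests that have responses. That is a completely different calculation.

That's not correct. We need independent counts of requests and responses whose ID values fall within a pre-filtered ID set, then sorted by ID and its related column values in either table. To reiterate: there are 70 entries in requests in total and 30 in responses. That is the size of each table.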