Google BigQuery: using the CORR() function on a query with a table join


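When I use the CORR() function on a query with a table join, I get null for the correlation, whereas on a query without a join CORR() returns a value. The other fields all come back with values. I have tried both aliasing the fields and leaving them unaliased, but I can't get a correlation value out of Query 2.

Thanks in advance.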


Query 1 returns a value for the correlation. The query and a link to the result JSON are below.

SELECT DATE(Time) AS date,
       ROUND(AVG(Price), 2) AS price,
       ROUND(SUM(amount), 2) AS volume,
       CORR(price, amount) AS correlation
FROM ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967
WHERE YEAR(Time) = 2014
GROUP BY date
ORDER BY date ASC
Query 1 result JSON:


Query 2 returns null for the correlation. The query and a link to the result JSON are below.

SELECT bitcoin.date AS date,
       bitcoin.btcprice,
       blockchain.trans_vol,
       CORR(bitcoin.btcprice, blockchain.trans_vol) AS correlation
FROM
  (SELECT DATE(time) AS date, AVG(price) AS btcprice
   FROM ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967
   WHERE YEAR(Time) = 2014
   GROUP BY date) AS bitcoin
JOIN
  (SELECT DATE(blocktime) AS date, SUM(vout.value) AS trans_vol
   FROM ds_14.tb_7917, ds_14.tb_7918, ds_14.tb_7919, ds_14.tb_7920, ds_14.tb_7921, ds_14.tb_7922, ds_14.tb_7923, ds_14.tb_7924, ds_14.tb_7925, ds_14.tb_7926, ds_14.tb_7927, ds_14.tb_7928, ds_14.tb_7934, ds_14.tb_7972, ds_14.tb_8016, ds_14.tb_8086, ds_14.tb_9743, ds_14.tb_9888, ds_14.tb_10084, ds_14.tb_10136, ds_14.tb_10500, ds_14.tb_10601
   WHERE YEAR(blocktime) = 2014
   GROUP BY Date) AS blockchain
ON bitcoin.date = blockchain.date
GROUP EACH BY date, bitcoin.btcprice, blockchain.trans_vol
ORDER BY date ASC

Query 2 result JSON:

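I took the CSV you linked and left it here:

(I'm not sure why you prefer to share a CSV through a file rather than creating a public dataset in BigQuery and sharing a link.)

So this works: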

SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv] 

-0.004957046970769512   
But this doesn't:

SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv]
GROUP BY date

null
null
...
null
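That's expected.

Reason: computing a correlation requires more than two pairs of numbers. In the second query, grouping by date leaves n groups of one element each, so the correlation cannot be computed for any of them.

(Side note: the correlation between 2 elements is always 1 or -1. We really need at least 3 elements, and many more for the result to be significant.)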
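To make the reasoning above concrete, here is a minimal Python sketch of the same effect outside BigQuery. The `pearson` helper and the four sample rows are hypothetical illustrations, not taken from the shared CSV, and returning `None` is only meant to mimic the null that CORR() produced here; it is not a statement about how BigQuery computes CORR() internally.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None  # a single point has no variance
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column would mean dividing by zero
    return cov / sqrt(var_x * var_y)


# Hypothetical daily rows (date, btc_price, trans_vol): one row per date,
# which is the shape the join of the two per-day subqueries produces.
rows = [
    ("2014-01-01", 770.0, 1200.0),
    ("2014-01-02", 808.0, 1500.0),
    ("2014-01-03", 830.0, 1100.0),
    ("2014-01-04", 858.0, 1700.0),
]

# Analogue of GROUP BY date: every group holds exactly one row.
groups = defaultdict(list)
for date, price, vol in rows:
    groups[date].append((price, vol))
per_date = {
    date: pearson([p for p, _ in pts], [v for _, v in pts])
    for date, pts in groups.items()
}

# Analogue of dropping the GROUP BY: one correlation over all rows.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# Two points with distinct x and y values always correlate at +1 or -1.
two_points = pearson([770.0, 808.0], [1500.0, 1200.0])

print(per_date)    # every date maps to None
print(overall)     # a single number between -1 and 1
print(two_points)
```

The practical consequence for Query 2 is that the outer grouping by date has to go (or be replaced by a coarser grouping such as month) before CORR() has more than one row per group to work with.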



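Can you write the same query against a public dataset, or make your queries public? It would be much easier for anyone trying to help if they could run the problematic query.

Yes, sorry, Felipe. I was wondering whether my syntax and/or the function is unsupported. I have shared the dataset with you, if that's OK. In future I'll use public datasets when asking questions.

I'm still traveling. Could you make a sample public so that others can answer this question?

A quick question: are you sure the join query returns rows? That is, if you remove CORR and retry the outer query, does it return rows? Can you also verify that both fields being correlated have non-null values?

Aha, I see. Thanks for pointing that out! Sorry for not sharing via BQ, I didn't think of it. It makes sense to share it so that others can see the example too. Thanks again, Felipe :)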