Google BigQuery: using the CORR() function on a query with table joins
When I use the CORR() function on a query with a table join, I get a null value. For a query without a join, however, CORR() returns a value. I do get values for the other fields. I have tried with and without aliases for the fields, but I can't seem to get a correlation value in Query 2. Thanks in advance.
Query 1: returns a value for the correlation. Query and result JSON link below.
select DATE(Time ) as date, ROUND(AVG(Price),2) as price, ROUND(SUM(amount),2) as volume, CORR(price, amount) as correlation
from
ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967
where YEAR(Time) = 2014
group by date
order by date ASC
Query 1 result JSON:
Query 2: null value for the correlation. Query and result JSON link below.
select bitcoin.date as date, bitcoin.btcprice, blockchain.trans_vol, CORR(bitcoin.btcprice,blockchain.trans_vol) as correlation
from
(select DATE(time) as date, AVG(price) as btcprice
from
ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967
where YEAR(Time) = 2014
group by date) as bitcoin
JOIN
(select
DATE(blocktime) as date, SUM(vout.value) as trans_vol
from ds_14.tb_7917, ds_14.tb_7918, ds_14.tb_7919, ds_14.tb_7920, ds_14.tb_7921, ds_14.tb_7922, ds_14.tb_7923, ds_14.tb_7924, ds_14.tb_7925, ds_14.tb_7926, ds_14.tb_7927, ds_14.tb_7928, ds_14.tb_7934, ds_14.tb_7972, ds_14.tb_8016, ds_14.tb_8086, ds_14.tb_9743, ds_14.tb_9888, ds_14.tb_10084, ds_14.tb_10136, ds_14.tb_10500, ds_14.tb_10601
where YEAR(blocktime) = 2014
group by Date) as blockchain
on bitcoin.date = blockchain.date
group each by date, bitcoin.btcprice, blockchain.trans_vol
order by date ASC
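Query 2 result JSON:

I took the CSV you linked and left it here: (I don't know why you'd prefer sharing a CSV through a file instead of creating a public dataset in BigQuery and sharing the link.) So this works: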
SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv]
-0.004957046970769512
But this doesn't:
SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv]
GROUP BY date
null
null
...
null
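This is expected.
The reason: to compute a correlation, each set needs more than one pair of numbers. In the second query, grouping by date leaves us with n groups of 1 element each, so the correlation cannot be computed for any of them.
(Side note: the correlation between 2 elements is always 1 or -1. We really need at least 3 elements, and many more for the result to be significant.)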
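To make the reasoning above concrete, here is a minimal Python sketch. It assumes CORR() behaves like the textbook Pearson coefficient; the `pearson` helper and the sample prices and volumes are made up for illustration and are not taken from the asker's tables. Python's `None` stands in for BigQuery's null.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of paired values, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None  # a single pair has no spread to measure
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column also leaves the result undefined
    return sxy / sqrt(sxx * syy)


# Hypothetical per-day aggregates, standing in for the two subqueries.
btc_price = {"2014-01-01": 770.0, "2014-01-02": 808.0,
             "2014-01-03": 830.0, "2014-01-04": 815.0}
trans_vol = {"2014-01-01": 1200.0, "2014-01-02": 950.0,
             "2014-01-03": 1100.0, "2014-01-04": 1300.0}

# Inner join on date: exactly one row per day.
joined = [(d, btc_price[d], trans_vol[d])
          for d in sorted(btc_price) if d in trans_vol]

# Grouping the joined rows by date again leaves one row per group,
# so the correlation is undefined for every group.
groups = defaultdict(list)
for d, price, vol in joined:
    groups[d].append((price, vol))
per_date = {d: pearson([p for p, _ in rows], [v for _, v in rows])
            for d, rows in groups.items()}

# Without the outer grouping, all rows feed a single correlation.
overall = pearson([p for _, p, _ in joined], [v for _, _, v in joined])

# Two distinct points always lie on a line, so the result is +1 or -1.
two_points = pearson([770.0, 808.0], [1200.0, 950.0])

print(per_date)
print(overall)
print(two_points)
```

Under the same assumption, the equivalent change to Query 2 would be to drop the outer grouping by date and the non-aggregated columns, so that CORR() sees all joined rows at once.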
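Can you write the same query over a public dataset, or open up yours? It would be much easier for anyone trying to help if they could run the problematic query.
Yes, sorry, Felipe. I was wondering whether it was my syntax and/or the function not being supported. I've shared the dataset with you, if that's OK. I'll use public datasets for questions in the future.
I'm still traveling. Could you make a sample public so that others can answer this question?
A quick question: are you sure the join query returns rows? That is, if you remove CORR and retry the outer query, does it return rows? Can you also verify that both fields being correlated have non-null values?
Aha, I see. Thanks for pointing that out! Sorry for not sharing through BQ, it didn't occur to me. It makes sense to share so that others can see the example too. Thanks again, Felipe :)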