Google BigQuery: BigQuery running total
I'm having trouble computing a running total in BigQuery. I found a working example here:
But what I really want to do is count how many of the most popular words together make up 80% of the total word count. So I first tried computing a running total ordered by word_count:
SELECT word, word_count, SUM(word_count) OVER(ORDER BY word_count DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a' LIMIT 30
But I get:
Row  word     word_count  f0_
1    o'er     18          18
2    answer   13          31
3    meet     8           39
4    told     5           44
5    treason  4           **52**
6    quality  4           **52**
7    brave    3           55
The running total doesn't increase from row 5 to row 6, probably because word_count is 4 in both rows.
What am I doing wrong?
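Maybe there is a better way? My plan was to compute the running total, divide it by SUM(word_count) OVER(), keep only the rows below 80%, and then count those rows.

First, remove the LIMIT 30: it will interfere with the OVER() clause.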
You want a proportion? Try RATIO_TO_REPORT:
SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a'
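RATIO_TO_REPORT returns each row's value divided by the sum of that value over the window; with no PARTITION BY, the window here is every row that passes the WHERE clause, so the ratios add up to 1. The plain-Python sketch below models that per-row computation; `ratio_to_report` is a hypothetical helper name, and the input is just the word_count column from the question's output, used as a toy example.

```python
def ratio_to_report(values):
    """Model of RATIO_TO_REPORT(x) OVER (): each value divided by the window total.

    Assumes a single window (no PARTITION BY) containing all the given values.
    """
    total = sum(values)
    return [v / total for v in values]


# Toy input: the word_count values from the question's output (they sum to 55).
counts = [18, 13, 8, 5, 4, 4, 3]
ratios = ratio_to_report(counts)

# Tied counts get identical ratios: both rows with word_count 4 get 4/55.
print([round(r, 3) for r in ratios])
```

Note that the ratio itself does not depend on row order; ordering only starts to matter once the ratios are summed cumulatively.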
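Do you want consecutive rows with the same value to increase anyway? Decide what order those rows should go in, and add it as a secondary sort key: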
SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC, word)
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a'
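Why do treason and quality both show 52 in the question? The output is consistent with the SQL-standard default frame for an aggregate with ORDER BY (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), in which rows that tie on the ORDER BY key are peers and all receive the same cumulative total. The plain-Python sketch below is a model of that frame rule, not BigQuery's implementation; `running_total_with_peers` is a hypothetical helper. It runs on the seven rows from the question, first ordered by word_count DESC alone and then with word as a tie-breaker, which is what the secondary sort above does.

```python
def running_total_with_peers(rows, sort_key):
    """Running total in which rows with equal sort keys are peers.

    Models SUM(...) OVER (ORDER BY ...) with the default RANGE frame:
    each row gets the sum over every row whose key is <= its own key,
    so tied rows all receive the same total.
    """
    ordered = sorted(rows, key=sort_key)  # stable sort: ties keep input order
    result = []
    for word, count in ordered:
        key = sort_key((word, count))
        total = sum(c for w, c in ordered if sort_key((w, c)) <= key)
        result.append((word, total))
    return result


# The seven rows from the question's output: (word, word_count).
rows = [("o'er", 18), ("answer", 13), ("meet", 8), ("told", 5),
        ("treason", 4), ("quality", 4), ("brave", 3)]

# ORDER BY word_count DESC: "treason" and "quality" are peers and both get 52.
print(running_total_with_peers(rows, lambda r: -r[1]))

# ORDER BY word_count DESC, word: every key is unique, so the total grows row by row
# ("quality" sorts before "treason" and gets 48).
print(running_total_with_peers(rows, lambda r: (-r[1], r[0])))
```

The tie-breaker only needs to make the sort key unique; word works here because each word appears once per corpus in the filtered rows.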
You want the most popular words that cover 80%? Sum the ratios cumulatively and filter out the rest:
SELECT word, word_count, sum_ratio
FROM (
SELECT word, word_count, SUM(ratio) OVER(ORDER BY ratio, word) sum_ratio
FROM (
SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC, word) ratio
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a'
)
)
WHERE sum_ratio>0.8
Row  word  word_count  sum_ratio
1    is    313         0.8125175752219499
2    it    361         0.827019644076648
3    in    400         0.8430884184308841
4    my    441         0.8608042421564295
5    you   499         0.8808500381633391
6    of    630         0.906158357771261
7    to    635         0.9316675370586108
8    and   706         0.9600289237938375
9    the   995         0.9999999999999999
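Note the direction of the cumulative sum in this query: SUM(ratio) OVER(ORDER BY ratio, word) accumulates in ascending order, from the rarest word up to the most frequent one, which is why sum_ratio climbs from `is` to `the`. Since the row just before `is` was filtered out, everything before `is` sums to at most 0.8, so the nine rows kept by sum_ratio>0.8 are the most frequent words covering roughly the last 20% of all occurrences, not the words that reach 80%. Counting the words that reach 80% starting from the top means accumulating in descending order, which is the asker's original plan. The plain-Python sketch below contrasts the two directions on a small list of hypothetical counts (not the Hamlet data); `cumulative_shares` and `words_covering` are illustrative helper names, not BigQuery functions.

```python
def cumulative_shares(counts, descending):
    """(count, cumulative share of the total) in the chosen order.

    Models SUM(ratio) OVER (ORDER BY ...) with ratio = count / total.
    The counts used below are distinct, so no ties/peers arise.
    """
    total = sum(counts)
    shares, running = [], 0
    for c in sorted(counts, reverse=descending):
        running += c
        shares.append((c, running / total))
    return shares


def words_covering(counts, target=0.8):
    """How many of the most frequent words are needed to reach `target` of the total.

    Accumulates from the most frequent word down and includes the row
    whose cumulative share first reaches the target.
    """
    for n, (_, share) in enumerate(cumulative_shares(counts, descending=True), start=1):
        if share >= target:
            return n
    return len(counts)


counts = [30, 25, 15, 12, 7, 5, 4, 2]  # hypothetical word counts; they sum to 100

# Filter as in the query above: ascending accumulation, keep rows with share > 0.8.
ascending_kept = [c for c, share in cumulative_shares(counts, descending=False) if share > 0.8]
print(ascending_kept, sum(ascending_kept) / sum(counts))  # [30] 0.3

# Descending accumulation: number of top words needed to reach 80%.
print(words_covering(counts))  # 4  (30 + 25 + 15 + 12 = 82% of occurrences)
```

One design choice to be aware of: keeping only rows strictly below 80%, as in the asker's plan, stops one word short of the threshold (here three words reach only 70%), so `words_covering` also counts the row that crosses 0.8.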
Thanks! That was a real lesson in window functions.