Google BigQuery: buggy behavior of "ORDER BY" in the ARRAY_AGG function of BigQuery
I think the ARRAY_AGG function in BigQuery has a bug in how it handles ORDER BY. The following SQL illustrates the problem:
#standardSQL
WITH t1 AS (
  SELECT *
  FROM UNNEST ( [
    STRUCT(1 AS user_id, 1 AS team_id, "2018-07-17" AS date_str),
    ( 2, 1, "2018-07-17" ),
    ( 3, 1, "2018-07-17" ),
    ( 4, 1, "2018-07-17" ),
    ( 5, 1, "2018-07-17" ),
    ( 6, 1, "2018-07-17" ),
    ( 7, 1, "2018-07-17" ),
    ( 8, 2, "2018-07-17" ),
    ( 9, 2, "2018-07-17" ),
    ( 10, 2, "2018-07-17" ),
    ( 11, 2, "2018-07-17" ),
    ( 14, 3, "2018-07-17" ),
    ( 15, 3, "2018-07-17" ),
    ( 16, 3, "2018-07-17" ),
    ( 17, 3, "2018-07-17" ),
    ( 1, 1, "2018-07-18" ),
    ( 4, 1, "2018-07-18" ),
    ( 5, 1, "2018-07-18" ),
    ( 6, 1, "2018-07-18" ),
    ( 7, 1, "2018-07-18" ),
    ( 8, 2, "2018-07-18" ),
    ( 9, 2, "2018-07-18" ),
    ( 10, 2, "2018-07-18" ),
    ( 11, 2, "2018-07-18" ),
    ( 12, 2, "2018-07-18" ),
    ( 13, 2, "2018-07-18" ),
    ( 14, 3, "2018-07-18" ),
    ( 15, 3, "2018-07-18" ),
    ( 16, 3, "2018-07-18" ),
    ( 17, 3, "2018-07-18" ),
    ( 18, 3, "2018-07-18" ) ] ) )
SELECT
  date_str,
  ARRAY_AGG(teams ORDER BY users) AS a1,
  ARRAY_AGG(users ORDER BY users) AS a2,
  ARRAY_AGG(teams ORDER BY teams) AS a3,
  ARRAY_AGG(users ORDER BY teams) AS a4,
  ARRAY_AGG(STRUCT(teams, users) ORDER BY users) AS a5
FROM (
  SELECT
    date_str,
    users,
    COUNT(*) AS teams
  FROM (
    SELECT
      date_str,
      team_id,
      COUNT(*) AS users
    FROM t1
    GROUP BY date_str, team_id
  )
  GROUP BY date_str, users
)
GROUP BY date_str
ORDER BY date_str;
This query returns:
+-----+------------+----+----+----+----+----------+----------+
| Row | date_str | a1 | a2 | a3 | a4 | a5.teams | a5.users |
+-----+------------+----+----+----+----+----------+----------+
| 1 | 2018-07-17 | 1 | 4 | 1 | 4 | 2 | 4 |
| | | 2 | 7 | 2 | 7 | 1 | 7 |
| 2 | 2018-07-18 | 1 | 5 | 1 | 5 | 2 | 5 |
| | | 2 | 6 | 2 | 6 | 1 | 6 |
+-----+------------+----+----+----+----+----------+----------+
But I expected:
+-----+------------+----+----+----+----+----------+----------+
| Row | date_str | a1 | a2 | a3 | a4 | a5.teams | a5.users |
+-----+------------+----+----+----+----+----------+----------+
| 1 | 2018-07-17 | 2 | 4 | 1 | 7 | 2 | 4 |
| | | 1 | 7 | 2 | 4 | 1 | 7 |
| 2 | 2018-07-18 | 2 | 5 | 1 | 6 | 2 | 5 |
| | | 1 | 6 | 2 | 5 | 1 | 6 |
+-----+------------+----+----+----+----+----------+----------+
It seems that the ORDER BY clause inside ARRAY_AGG is not working correctly, because a1 and a4 come out in the wrong order.
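As a cross-check of the expected table above, here is a minimal sketch in Python (not BigQuery) that replays the two GROUP BY steps on the t1 rows and then sorts each date's rows. It assumes ARRAY_AGG(x ORDER BY y) simply emits x from the group's rows sorted by y ascending; the helper names rows_for and array_agg are made up for this illustration.

```python
from collections import Counter

def rows_for(date_str, team_to_users):
    """Build (user_id, team_id, date_str) rows; hypothetical helper for brevity."""
    return [(u, t, date_str) for t, users in team_to_users.items() for u in users]

# Same rows as the t1 CTE in the question.
t1 = (
    rows_for("2018-07-17", {1: range(1, 8), 2: range(8, 12), 3: range(14, 18)})
    + rows_for("2018-07-18", {1: [1, 4, 5, 6, 7], 2: range(8, 14), 3: range(14, 19)})
)

# Inner subquery: COUNT(*) AS users ... GROUP BY date_str, team_id
users_per_team = Counter((d, t) for _, t, d in t1)
# Middle subquery: COUNT(*) AS teams ... GROUP BY date_str, users
teams_per_size = Counter((d, n) for (d, _), n in users_per_team.items())

def array_agg(date_str, value, order_by):
    """Model of ARRAY_AGG(value ORDER BY order_by) for one date_str group."""
    group = [{"users": n, "teams": c}
             for (d, n), c in teams_per_size.items() if d == date_str]
    return [r[value] for r in sorted(group, key=lambda r: r[order_by])]

for d in ("2018-07-17", "2018-07-18"):
    print(d,
          array_agg(d, "teams", "users"),   # a1
          array_agg(d, "users", "users"),   # a2
          array_agg(d, "teams", "teams"),   # a3
          array_agg(d, "users", "teams"))   # a4
```

Under that assumption the sketch gives a1 = [2, 1] and a4 = [7, 4] for 2018-07-17, and a1 = [2, 1] and a4 = [6, 5] for 2018-07-18, which agrees with the expected table rather than with what the query returned.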
What is more, hard as it is to believe, when I replace either of the two COUNT(*) parts with COUNT(user_id) or COUNT(team_id), the query works exactly as expected. For instance:
SELECT
  date_str,
  ARRAY_AGG(teams ORDER BY users) AS a1,
  ARRAY_AGG(users ORDER BY users) AS a2,
  ARRAY_AGG(teams ORDER BY teams) AS a3,
  ARRAY_AGG(users ORDER BY teams) AS a4,
  ARRAY_AGG(STRUCT(teams, users) ORDER BY users) AS a5
FROM (
  SELECT
    date_str,
    users,
    COUNT(*) AS teams
  FROM (
    SELECT
      date_str,
      team_id,
      COUNT(user_id) AS users
    FROM t1
    GROUP BY date_str, team_id
  )
  GROUP BY date_str, users
)
GROUP BY date_str
ORDER BY date_str;
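or the analogous variant that uses COUNT(team_id) for teams.

As far as I understand, these queries must return the same result as the original query in this case. This confuses me. It may be a bug, or I may be misunderstanding something.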
Some additional information

The inner subquery
SELECT
  date_str,
  users,
  COUNT(*) AS teams
FROM (
  SELECT
    date_str,
    team_id,
    COUNT(*) AS users
  FROM t1
  GROUP BY date_str, team_id
)
GROUP BY date_str, users
returns this:
+-----+------------+-------+-------+
| Row | date_str | users | teams |
+-----+------------+-------+-------+
| 1 | 2018-07-18 | 5 | 2 |
| 2 | 2018-07-17 | 7 | 1 |
| 3 | 2018-07-18 | 6 | 1 |
| 4 | 2018-07-17 | 4 | 2 |
+-----+------------+-------+-------+
So I created this data directly in a WITH clause and ran the same aggregation query:
#standardSQL
WITH t2 AS (
  SELECT *
  FROM UNNEST ( [
    STRUCT("2018-07-18" AS date_str, 5 AS users, 2 AS teams),
    ( "2018-07-17", 7, 1 ),
    ( "2018-07-18", 6, 1 ),
    ( "2018-07-17", 4, 2 ) ] )
)
SELECT
  date_str,
  ARRAY_AGG(teams ORDER BY users) AS a1,
  ARRAY_AGG(users ORDER BY users) AS a2,
  ARRAY_AGG(teams ORDER BY teams) AS a3,
  ARRAY_AGG(users ORDER BY teams) AS a4,
  ARRAY_AGG(STRUCT(teams, users) ORDER BY users) AS a5
FROM t2
GROUP BY date_str
ORDER BY date_str;
This time the result is what I was looking for:
+-----+------------+----+----+----+----+----------+----------+
| Row | date_str | a1 | a2 | a3 | a4 | a5.teams | a5.users |
+-----+------------+----+----+----+----+----------+----------+
| 1 | 2018-07-17 | 2 | 4 | 1 | 7 | 2 | 4 |
| | | 1 | 7 | 2 | 4 | 1 | 7 |
| 2 | 2018-07-18 | 2 | 5 | 1 | 6 | 2 | 5 |
| | | 1 | 6 | 2 | 5 | 1 | 6 |
+-----+------------+----+----+----+----+----------+----------+
I don't understand what causes this. I am completely puzzled.
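Any ideas or suggestions would be greatly appreciated.

Sorry if I have misunderstood, but the default sort order is ascending, so isn't the ordering actually correct?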
ARRAY_AGG(teams ORDER BY users desc) AS a1,
ARRAY_AGG(users ORDER BY users) AS a2,
ARRAY_AGG(teams ORDER BY teams) AS a3,
ARRAY_AGG(users ORDER BY teams desc) AS a4,
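If I change them to sort in descending order, I get the desired result.

Thank you for your answer. Yes, the example itself is a bit tricky and confusing. Take the a1 field as an example. It is an array of teams ordered by users. In this example, teams and users for the same date run in opposite orders, which the a5 field makes clear. So when you sort by users ascending, the teams in a1 come out descending. I hope this explanation makes sense.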
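To make the ascending versus descending point concrete, here is a small Python sketch (again a model, not BigQuery itself) over the four intermediate rows from the t2 CTE. It assumes that ORDER BY users sorts ascending by default and that DESC reverses the order; the function name teams_ordered_by_users is hypothetical.

```python
# Intermediate rows (date_str, users, teams), copied from the t2 CTE.
t2 = [
    ("2018-07-18", 5, 2),
    ("2018-07-17", 7, 1),
    ("2018-07-18", 6, 1),
    ("2018-07-17", 4, 2),
]

def teams_ordered_by_users(date_str, descending=False):
    """Model of ARRAY_AGG(teams ORDER BY users [DESC]) for one date_str group."""
    group = [(users, teams) for d, users, teams in t2 if d == date_str]
    # Tuples sort by users first; users is unique within each date here.
    return [teams for users, teams in sorted(group, reverse=descending)]

for d in ("2018-07-17", "2018-07-18"):
    print(d,
          "ASC:", teams_ordered_by_users(d),
          "DESC:", teams_ordered_by_users(d, descending=True))
```

For both dates the ascending sort gives teams = [2, 1], which is the a1 column of the expected table, while the descending sort gives [1, 2], which is the a1 column the original nested query actually returned.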