SQL: Get the last matching value in a RECORD type by value in BigQuery

I have a data structure in BigQuery that looks like this:

[{
    sessionID: '123456',
    revenue: 100.00,
    pagesViewed: [
      {hit: 1, val: "a.html"}, {hit:3, val: "b.html"}, {hit:3, val: "c.html?test=AAC"}, {hit:10, val:"d.html?test=CCC"}
    ]
},
{
    sessionID: '5555',
    revenue: 50.00,
    pagesViewed: [
      {hit: 1, val: "a.html"}, {hit:3, val: "b.html?test=123"}, {hit:9, val: "c.html"}, {hit:14, val:"d.html"}
    ]
}]
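I am trying to get the last test ID for each session. For session A, the last test ID should be CCC. For session B, it should be 123. From there, I want to get the sum of revenue grouped by that final test value.

The query I have tried is: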

SELECT
  REGEXP_EXTRACT(mnt,r'\?test\=([^&]*)') as TestId,
  SUM(rev) as Revenue
FROM (
  SELECT
    sessionID,
    MAX(CONCAT(CAST(pagesViewed.hit AS string),pagesViewed.val)) AS mnt,
    MAX(revenue) AS rev
  FROM
    `table` AS m,
    UNNEST(m.pagesViewed) AS pagesViewed
  WHERE
    pagesViewed.val LIKE "%test=%"
  GROUP BY
    1
  ORDER BY
    1,
    2 ASC)
GROUP BY
  1
ORDER BY
  2 DESC
However, the output does not match the expected values above. Any help would be appreciated.

Output:

Row TestId  Revenue  
1   AAC     100.0    
2   123     50.0    
Expected:

Row TestId  Revenue  
1   CCC     100.0    
2   123     50.0    

This should work for your purposes:

SELECT
  (SELECT
     ARRAY_AGG(
       REGEXP_EXTRACT(pageViewed.val,r'\?test\=([^&]*)')
       IGNORE NULLS ORDER BY pageViewed.hit DESC LIMIT 1)[OFFSET(0)]
   FROM UNNEST(pagesViewed) AS pageViewed
  ) AS TestId,
  SUM(revenue) AS Revenue
FROM `project.dataset.table`
GROUP BY 1
ORDER BY 2 DESC;
It returns the last matching "test" value in the array, ordered by hit. You can try it on sample data:

WITH `project.dataset.table` AS (
  SELECT '123456' AS sessionId, 100.00 AS revenue, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html'), (3, 'c.html?test=AAC'), (4, 'd.html?test=CCC')] AS pagesViewed UNION ALL
  SELECT '5555', 50.00, ARRAY<STRUCT<hit INT64, val STRING>>[(1, 'a.html'), (2, 'b.html?test=123'), (3, 'c.html'), (4, 'd.html')]
)
SELECT
  (SELECT
     ARRAY_AGG(
       REGEXP_EXTRACT(pageViewed.val,r'\?test\=([^&]*)')
       IGNORE NULLS ORDER BY pageViewed.hit DESC LIMIT 1)[OFFSET(0)]
   FROM UNNEST(pagesViewed) AS pageViewed
  ) AS TestId,
  SUM(revenue) AS Revenue
FROM `project.dataset.table`
GROUP BY 1
ORDER BY 2 DESC;
This gives CCC with 100.0 in one row and 123 with 50.0 in the other as output.
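As for why the query in the question returned AAC: a likely cause is that MAX over the concatenated string compares text character by character, so '3c.html...' sorts after '10d.html...' because '3' is greater than '1'. The standalone check below is a minimal sketch of that behavior; the two inline rows are illustrative values taken from the first session in the question, and no table is needed to run it.

```sql
-- Illustrative rows only: the two "test" pages from session 123456.
SELECT
  -- String MAX is lexicographic: '3...' beats '10...', so this yields '3c.html?test=AAC'.
  MAX(CONCAT(CAST(hit AS STRING), val)) AS string_max,
  -- Ordering by the numeric hit picks the true last page: 'd.html?test=CCC'.
  ARRAY_AGG(val ORDER BY hit DESC LIMIT 1)[OFFSET(0)] AS last_by_hit
FROM UNNEST([
  STRUCT(3 AS hit, 'c.html?test=AAC' AS val),
  STRUCT(10 AS hit, 'd.html?test=CCC' AS val)
]);
```

Ordering by the integer hit, as the answer's ARRAY_AGG does, avoids the problem without having to zero-pad the hit number before concatenating.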