Remove duplicate rows based on an attribute in Google BigQuery SQL


I have a table called result. I use BigQuery to select data from GA:

SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
  date DESC

Some records are duplicated. How can I remove the duplicate records from the table?

I would like to get the following result.

You can use the following analytic function:

SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY transactionId ORDER BY transactionId) AS rownum
  FROM result
) AS xxx
WHERE rownum = 1;
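
To try the ROW_NUMBER() approach without touching real data, here is a minimal sketch in BigQuery Standard SQL. The inline `result` CTE and its three sample rows are made up for illustration, and ordering by revenue to decide which duplicate survives is an assumption; pick whatever ordering fits your data.

```sql
#standardSQL
WITH result AS (
  -- Hypothetical sample rows standing in for the real `result` table
  SELECT 'T1' AS transactionId, 2 AS itemQuantity, 100 AS transactionRevenue UNION ALL
  SELECT 'T1', 2, 100 UNION ALL
  SELECT 'T2', 1, 50
)
SELECT * EXCEPT (rownum)  -- drop the helper column from the output
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY transactionId
      ORDER BY transactionRevenue DESC
    ) AS rownum
  FROM result
)
WHERE rownum = 1;  -- keep exactly one row per transactionId
```

With this sample data the query returns one row for T1 and one for T2. Note that it keeps one row per transactionId even when the other columns differ, which is a stronger filter than removing only fully identical rows.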

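You can keep only the unique rows and discard the rest. BigQuery's DELETE does not accept a JOIN, so one way is to rewrite the table from its distinct rows:

-- Replaces MyTable with its distinct rows.
-- Assumes MyTable has no ARRAY or STRUCT columns, which SELECT DISTINCT cannot return.
CREATE OR REPLACE TABLE MyTable AS
SELECT DISTINCT * FROM MyTable;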

Using GROUP BY with all of the selected columns should eliminate any truly duplicate rows from the result:

SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
GROUP BY
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
ORDER BY
  date  DESC;
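
As a small illustration of the GROUP BY approach, the sketch below groups on every selected column and adds COUNT(*) so you can see how many identical copies were collapsed. The inline `result` CTE and its rows are hypothetical sample data, not taken from the question.

```sql
#standardSQL
WITH result AS (
  -- Hypothetical sample rows: the first two are exact duplicates
  SELECT 'T1' AS transactionId, 2 AS itemQuantity UNION ALL
  SELECT 'T1', 2 UNION ALL
  SELECT 'T2', 1
)
SELECT
  transactionId,
  itemQuantity,
  COUNT(*) AS copies  -- number of identical rows collapsed into this one
FROM result
GROUP BY
  transactionId,
  itemQuantity
ORDER BY
  transactionId;
```

For this sample the result has two rows: T1 with copies = 2 and T2 with copies = 1. Filtering with HAVING COUNT(*) > 1 would list only the rows that were duplicated.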

You can use ROW_NUMBER():

-- This is SQL Server (T-SQL) syntax, not BigQuery syntax.
-- Deleting through the CTE removes every row after the first for each transactionid.
WITH CTE AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY transactionid) AS rn
  FROM [YourTable]
)
DELETE FROM CTE
WHERE rn > 1;
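
The bracketed table names above follow SQL Server conventions. For BigQuery, one possible way to get the same effect is to rewrite the table and keep only the first row per transactionId. This is a sketch under assumptions: `mydataset.YourTable` is a hypothetical name, and the statement replaces the whole table, so it is worth testing on a copy first.

```sql
#standardSQL
-- `mydataset.YourTable` is a hypothetical table name; replace it with your own.
CREATE OR REPLACE TABLE `mydataset.YourTable` AS
SELECT *
FROM `mydataset.YourTable`
WHERE TRUE  -- no-op filter, kept so QUALIFY is accompanied by a WHERE clause
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY transactionId
  ORDER BY transactionId
) = 1;  -- keep one row per transactionId
```

Because the ORDER BY inside the window uses the partition key itself, which of the duplicate rows survives is arbitrary. Ordering by a timestamp or another tie-breaking column makes the choice deterministic.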

The following is for BigQuery Standard SQL:

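#standardSQL
SELECT DISTINCT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
  date DESC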
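
As you can see, I just added DISTINCT to your SELECT. See the BigQuery Standard SQL documentation for more about SELECT DISTINCT.

Cannot use DISTINCT only.

@bob90937, why? You need to use #standardSQL to use SELECT DISTINCT.

Do you actually want to find and delete the rows, or just hide them in the query result? If the latter, use DISTINCT. If the former, it gets a bit more complicated.

How do I select only the distinct rows? Since itemQuantity and revenue are independent of each other, ...

You can mark an answer as accepted using the tick to the left of the posted answer, below the voting. It is also important to vote up answers that are helpful.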