Remove duplicate rows based on an attribute in Google BigQuery SQL
I have a table called result. I use BigQuery to select the data from GA:
SELECT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
date DESC
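Some of the records are duplicated. How can I remove the duplicate records from the table?
I would like to get the following result.
You can use the following analytic function: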
select * from (
select *,
ROW_NUMBER() OVER(PARTITION BY transactionid ORDER BY transactionid) rownum
from result ) xxx
where rownum = 1;
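A minimal sketch of the same ROW_NUMBER() pattern in BigQuery standard SQL, using made-up inline rows in place of the `result` table so it can be run on its own. The column values and the choice of `transactionRevenue` as a tie-breaker are assumptions for illustration. Ordering by the partition key itself leaves the choice of surviving row arbitrary, so a different ORDER BY column makes the pick predictable.

```sql
-- Hypothetical inline data standing in for the `result` table.
WITH result AS (
  SELECT 'T1' AS transactionId, 2 AS itemQuantity, 100 AS transactionRevenue UNION ALL
  SELECT 'T1', 2, 100 UNION ALL
  SELECT 'T2', 1, 50
)
SELECT * EXCEPT (rownum)  -- drop the helper column from the output
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY transactionId
      ORDER BY transactionRevenue DESC  -- assumed tie-breaker, not from the original post
    ) AS rownum
  FROM result
)
WHERE rownum = 1;
```

This keeps one row per transactionId. Note that all rows with a NULL transactionId fall into the same partition, so on GA hit data (where most hits carry no transaction) this pattern would collapse them into a single row unless they are filtered or handled separately.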
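You can select the unique rows and delete the others. Note that BigQuery's DELETE statement does not accept a JOIN clause, so treat the statement below as generic pseudocode that needs adapting: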
DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT DISTINCT * FROM MyTable
) as UniqueRows ON
MyTable.KeyField= UniqueRows.KeyField
WHERE
UniqueRows.KeyField IS NULL;
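One way to get the "keep only the unique rows" effect in BigQuery is to materialize the distinct rows into a table rather than deleting in place. The following is a sketch under assumptions: the table contents, the `label` column, and the name `MyTableDeduped` are hypothetical, and temporary tables are used so the script is self-contained.

```sql
-- Hypothetical stand-in for MyTable, created as a temporary table inside a BigQuery script.
CREATE TEMP TABLE MyTable AS
SELECT 1 AS KeyField, 'a' AS label UNION ALL
SELECT 1, 'a' UNION ALL
SELECT 2, 'b';

-- Keep one copy of each fully identical row in a new table (the name is an assumption).
CREATE TEMP TABLE MyTableDeduped AS
SELECT DISTINCT * FROM MyTable;

SELECT * FROM MyTableDeduped ORDER BY KeyField;
```

For a permanent table, `CREATE OR REPLACE TABLE` with the original table name and the same `SELECT DISTINCT *` body is a common way to overwrite it in place. SELECT DISTINCT does not accept STRUCT or ARRAY columns, so nested GA fields would need to be flattened first.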
Using GROUP BY with all of the selected columns should eliminate any truly duplicate rows from the result:
SELECT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1
YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
GROUP BY
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
ORDER BY
date DESC;
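You can use ROW_NUMBER. The bracketed identifiers below are SQL Server syntax, so this form does not run in BigQuery as written:
WITH CTE AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY
transactionid) AS rn FROM [YourTable])
-- Deleting through the CTE removes only the extra copies (rn > 1)
-- and keeps one row per transactionid.
DELETE FROM CTE
WHERE rn > 1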
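For BigQuery, a comparable result can be sketched with the QUALIFY clause, which filters on a window function without a wrapping subquery or CTE. The inline rows and the `pagePath` ordering column here are assumptions for illustration only. This selects the surviving rows rather than deleting anything.

```sql
-- Hypothetical inline data standing in for [YourTable].
WITH your_table AS (
  SELECT 'T1' AS transactionid, '/cart' AS pagePath UNION ALL
  SELECT 'T1', '/cart' UNION ALL
  SELECT 'T2', '/home'
)
SELECT *
FROM your_table
WHERE TRUE  -- harmless filter, kept because QUALIFY has required a WHERE, GROUP BY, or HAVING alongside it
QUALIFY ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY pagePath) = 1;
```

The query returns one row per transactionid. Writing its output back to a table (for example via a destination table or a CREATE TABLE ... AS statement) would be the step that actually removes the duplicates from storage.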
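Below is for BigQuery Standard SQL:
#standardSQL
SELECT DISTINCT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
Date DESC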
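As you can see, I just added DISTINCT to your SELECT. See the BigQuery documentation for more about SELECT DISTINCT.
Comments:
Standard SQL cannot use DISTINCT only.
@bob90937, why? You need to use standard SQL in order to use SELECT DISTINCT.
Do you actually want to find and delete the rows, or just hide them in the query results? If the latter, use DISTINCT. If the former, it gets a bit more complicated.
How do I select only the distinct rows? Since itemQuantity and revenue are independent of each other...
Consider marking the answer as accepted using the tick to the left of the posted answer, below the voting, and vote up the answers that were helpful.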