SQL BigQuery: find the most common value when another column is the same
I want to add a column New_Family_id and fill it with the most common FamilyId among the rows that share the same title (ProductTitleNL), like this:
Row GlobalId ProductTitleNL FamilyId New_Family_id
1 9200000005045711 ! at Gun Point... 9200000005045710 9200000011427871
2 9200000003809684 ! at Gun Point... 9200000011427871 9200000011427871
3 9200000011427872 ! at Gun Point... 9200000011427871 9200000011427871
4 1001004011099420 Russian Dat 34388968 34388968
5 1001004011099421 Russian Dat 35434738 34388968
6 9200000000530359 !!Nos Vemos! 9200000000530358 9200000000530358
7 9200000000530343 !!Nos Vemos! 9200000000530342 9200000000530358
I tried several GROUP BY approaches, but none of them worked.
This is what I have so far:
SELECT a.GlobalId, a.ProductTitleNL, a.FamilyId, a.Language, b.aantal_T
FROM table1 as a
JOIN (SELECT ProductTitleNL, COUNT(ProductTitleNL) as aantal_T
FROM table1
Group by ProductTitleNL
HAVING aantal_T >= 2) b
ON a.ProductTitleNL = b.ProductTitleNL
Group by a.GlobalId, a.ProductTitleNL, a.FamilyId, a.Language, b.aantal_T
Order by a.ProductTitleNL;
Thanks for your help.

Below is a solution for BigQuery Standard SQL:
#standardSQL
SELECT * EXCEPT(ids),
(SELECT id FROM UNNEST(ids) id GROUP BY id ORDER BY COUNT(1) DESC LIMIT 1) New_Family_id
FROM (
SELECT *, ARRAY_AGG(FamilyId) OVER(PARTITION BY ProductTitleNL) ids
FROM `project.dataset.table`
)
You can test the query from this answer against the dummy data from the question, as shown below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 9200000005045711 GlobalId, '! at Gun Point...' ProductTitleNL, 9200000005045710 FamilyId UNION ALL
SELECT 9200000003809684, '! at Gun Point...', 9200000011427871 UNION ALL
SELECT 9200000011427872, '! at Gun Point...', 9200000011427871 UNION ALL
SELECT 1001004011099420, 'Russian Dat', 34388968 UNION ALL
SELECT 1001004011099421, 'Russian Dat', 35434738 UNION ALL
SELECT 9200000000530359, '!!Nos Vemos!', 9200000000530358 UNION ALL
SELECT 9200000000530343, '!!Nos Vemos!', 9200000000530342
)
SELECT * EXCEPT(ids),
(SELECT id FROM UNNEST(ids) id GROUP BY id ORDER BY COUNT(1) DESC LIMIT 1) New_Family_id
FROM (
SELECT *, ARRAY_AGG(FamilyId) OVER(PARTITION BY ProductTitleNL) ids
FROM `project.dataset.table`
)
Result:
Row GlobalId ProductTitleNL FamilyId New_Family_id
1 9200000005045711 ! at Gun Point... 9200000005045710 9200000011427871
2 9200000003809684 ! at Gun Point... 9200000011427871 9200000011427871
3 9200000011427872 ! at Gun Point... 9200000011427871 9200000011427871
4 9200000000530359 !!Nos Vemos! 9200000000530358 9200000000530358
5 9200000000530343 !!Nos Vemos! 9200000000530342 9200000000530358
6 1001004011099420 Russian Dat 34388968 34388968
7 1001004011099421 Russian Dat 35434738 34388968
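
Note that in the sample data 'Russian Dat' and '!!Nos Vemos!' each have two FamilyId values that occur exactly once, so the counts tie. ORDER BY COUNT(1) DESC LIMIT 1 does not say which of the tied values wins, so the pick may not be stable between runs. Below is a minimal sketch of one way to make it deterministic, assuming that "smallest FamilyId wins on a tie" is an acceptable rule (that rule is my assumption and is not stated in the question). The CTE name `sample` is hypothetical and only holds the rows copied from the question.

```sql
#standardSQL
-- Sample rows copied from the question; `sample` is a hypothetical CTE name.
WITH sample AS (
  SELECT 9200000005045711 AS GlobalId, '! at Gun Point...' AS ProductTitleNL, 9200000005045710 AS FamilyId UNION ALL
  SELECT 9200000003809684, '! at Gun Point...', 9200000011427871 UNION ALL
  SELECT 9200000011427872, '! at Gun Point...', 9200000011427871 UNION ALL
  SELECT 1001004011099420, 'Russian Dat', 34388968 UNION ALL
  SELECT 1001004011099421, 'Russian Dat', 35434738 UNION ALL
  SELECT 9200000000530359, '!!Nos Vemos!', 9200000000530358 UNION ALL
  SELECT 9200000000530343, '!!Nos Vemos!', 9200000000530342
)
SELECT * EXCEPT(ids),
  (SELECT id
   FROM UNNEST(ids) AS id
   GROUP BY id
   -- Most frequent value first; on equal counts the smallest id wins
   -- (assumed tie-break rule, not part of the original answer).
   ORDER BY COUNT(1) DESC, id
   LIMIT 1) AS New_Family_id
FROM (
  -- Collect every FamilyId that shares the row's title into one array per row.
  SELECT *, ARRAY_AGG(FamilyId) OVER (PARTITION BY ProductTitleNL) AS ids
  FROM sample
)
ORDER BY ProductTitleNL, GlobalId
```

With this tie-break, 'Russian Dat' should resolve to 34388968 and '!!Nos Vemos!' to 9200000000530342, which differs from the 9200000000530358 shown in the result above. If a different rule fits better (for example, largest id), only the second ORDER BY key needs to change.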