PostgreSQL: select the most common combination of two columns
I need help querying a table in PostgreSQL.
I have a table with three columns and more than 30,000 rows. Many values repeat across rows.
I have to write a SELECT statement that outputs the most frequent value of the first column, the most frequent value of the second column, and the most common combination of the two columns. The whole thing must be grouped by the third column.
What I have tried:
SELECT *
FROM
    (SELECT column1 AS " Most frequent1", COUNT(column1) AS "occurrence"
     FROM table_name
     GROUP BY column1
     ORDER BY occurrence DESC
     LIMIT 1),
    (SELECT column2 AS "Most frequent2", COUNT(column2) AS "occurrence"
     FROM table_name
     GROUP BY column2
     ORDER BY occurrence DESC
     LIMIT 1),
    (SELECT CONCAT(column1, column2) AS "kombiniert", COUNT(CONCAT(column1, column2)) AS "occurrence"
     FROM table_name
     GROUP BY kombiniert
     ORDER BY occurrence DESC
     LIMIT 1);
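How can I group the whole thing by the third column?
Is there a better way?

In statistics, the most common value is called the mode. This returns the three modes you are looking for: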
select col3,
       max(col1_cnt) as col1_mode_cnt,
       max(case when seqnum_1 = 1 then col1 end) as col1_mode,
       max(col2_cnt) as col2_mode_cnt,
       max(case when seqnum_2 = 1 then col2 end) as col2_mode,
       max(col12_cnt) as col12_mode_cnt,
       max(case when seqnum_12 = 1 then col1 || ':' || col2 end) as col12_mode
from (select t.*,
             -- rank the rows within each col3 group by each count, highest first
             row_number() over (partition by col3 order by col1_cnt desc) as seqnum_1,
             row_number() over (partition by col3 order by col2_cnt desc) as seqnum_2,
             row_number() over (partition by col3 order by col12_cnt desc) as seqnum_12
      from (select col1, col2, col3, count(*) as cnt,
                   -- per-col3 counts of each col1 value, col2 value, and (col1, col2) pair
                   sum(count(*)) over (partition by col3, col1) as col1_cnt,
                   sum(count(*)) over (partition by col3, col2) as col2_cnt,
                   sum(count(*)) over (partition by col3, col1, col2) as col12_cnt
            from t
            group by col1, col2, col3
           ) t
     ) t
group by col3;

What you are looking for is mode(): the statistical mode, i.e. the most frequent value.
SELECT
column3,
MODE() WITHIN GROUP (ORDER BY column1) AS most_frequent_column1,
MODE() WITHIN GROUP (ORDER BY column2) AS most_frequent_column2,
MODE() WITHIN GROUP (ORDER BY column1 || column2) AS most_frequent_pair
FROM table_name
GROUP BY column3
ORDER BY column3;
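Two notes:
- mode() ignores NULL values. If that is not what you want, you have to work around it.
- Simply concatenating the two strings may or may not be what you want: 'AB' || 'CD' = 'ABCD', but 'A' || 'BCD' = 'ABCD' as well.
For the latter, see the comment by a_horse_with_no_name below: MODE() WITHIN GROUP (ORDER BY (column1, column2)) solves the problem of ambiguous pairs perfectly.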
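
The row-value form mentioned in the second note can be sketched as follows. This is only an illustration that reuses the table and column names from the query above. It assumes both columns have types that PostgreSQL can compare and sort. The pair comes back as an anonymous record, which psql displays like (AB,CD).

```sql
SELECT
    column3,
    MODE() WITHIN GROUP (ORDER BY column1) AS most_frequent_column1,
    MODE() WITHIN GROUP (ORDER BY column2) AS most_frequent_column2,
    -- (column1, column2) is a single row-valued expression, so
    -- ('AB', 'CD') and ('A', 'BCD') are counted as different pairs
    MODE() WITHIN GROUP (ORDER BY (column1, column2)) AS most_frequent_pair
FROM table_name
GROUP BY column3
ORDER BY column3;
```

Because the pair is no longer built by string concatenation, this variant should also work for non-text columns.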

Comments:

You can use (ORDER BY (column1, column2)) to avoid treating AB,CD the same as ABC,D. That also makes it work for any data type, not just strings.

@a_horse_with_no_name: No. The ORDER BY clause of mode must contain exactly one argument, because that is the expression whose statistical mode we want. (Yes, it is an odd syntax. I don't know why they didn't simply make it mode(expression).) As for treating AB,CD and ABC,D as the same value: yes, I mentioned that in my answer. That is how the OP concatenates the strings, and I pointed out that this may not be desired.

(column1, column2) is a single expression: an anonymous "row type".

@a_horse_with_no_name: Sorry, you are of course right.
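
On the first note in the answer (mode() ignores NULL values): one possible workaround, shown here only as a sketch, is to map NULL to a sentinel before aggregating and map it back afterwards. It assumes column1 is a text column and that the made-up sentinel '(null)' never occurs as real data. Neither assumption comes from the original question.

```sql
-- Assumption: column1 is text and '(null)' never appears as a real value.
SELECT
    column3,
    NULLIF(
        -- COALESCE lets NULLs compete for "most frequent" like any other value
        MODE() WITHIN GROUP (ORDER BY COALESCE(column1, '(null)')),
        '(null)'  -- turn the sentinel back into NULL in the result
    ) AS most_frequent_column1_incl_null
FROM table_name
GROUP BY column3
ORDER BY column3;
```

With this variant, a NULL in the result means that NULL was the most frequent entry of column1 within that column3 group.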