Selecting the most common combination of two columns in PostgreSQL

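I need help working with a table in PostgreSQL.

I have a table with three columns and more than 30,000 rows. Many values are repeated across rows.

I have to write a SELECT statement that outputs the most frequent value of the first column, the most frequent value of the second column, and the most common combination of the two columns. The whole thing must be grouped by the third column.

What I have tried: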

SELECT * 
FROM 
    (SELECT column1 AS " Most frequent1", COUNT(column1) AS "occurrence"
     FROM table_name
     GROUP BY column1
     ORDER BY occurrence DESC
     LIMIT 1),
    (SELECT column2 AS "Most frequent2", COUNT(column2) AS "occurrence"
     FROM table_name
     GROUP BY column2
     ORDER BY occurrence DESC
     LIMIT 1),
    (SELECT CONCAT(column1, column2) AS "kombiniert", COUNT(CONCAT(column1, column2)) AS "occurrence"
     FROM table_name
     GROUP BY kombiniert
     ORDER BY occurrence DESC
     LIMIT 1);
How can I group the whole thing by the third column?

Is there a better way?

In statistics, the most common value is called the mode. This returns the three modes you are looking for:

select col3,
       max(col1_cnt), max(case when seqnum_1 = 1 then col1 end),
       max(col2_cnt), max(case when seqnum_2 = 1 then col2 end),
       max(col12_cnt), max(case when seqnum_12 = 1 then col1 || ':' || col2 end)
from (select t.*,
             -- rank rows within each col3 by how often their value occurs
             row_number() over (partition by col3 order by col1_cnt desc) as seqnum_1,
             row_number() over (partition by col3 order by col2_cnt desc) as seqnum_2,
             row_number() over (partition by col3 order by col12_cnt desc) as seqnum_12
      from (select col1, col2, col3,
                   -- count of the (col1, col2) pair within this col3
                   count(*) as col12_cnt,
                   -- counts of col1 and of col2 within this col3
                   sum(count(*)) over (partition by col3, col1) as col1_cnt,
                   sum(count(*)) over (partition by col3, col2) as col2_cnt
            from t
            group by col1, col2, col3
           ) t
     ) t
group by col3;
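
The core step in the query above, counting each value per col3 and then keeping only the top-ranked row, may be easier to follow for a single column. Below is a minimal sketch, assuming made-up sample rows in a CTE named t (the data is purely illustrative) and using PostgreSQL's DISTINCT ON in place of row_number():

```sql
-- Hypothetical sample data; the column names follow the answer above.
WITH t(col1, col2, col3) AS (
    VALUES ('a', 'x', 1), ('a', 'y', 1), ('b', 'x', 1),
           ('c', 'z', 2), ('c', 'z', 2), ('d', 'z', 2)
)
SELECT DISTINCT ON (col3)          -- keep one row per col3 ...
       col3,
       col1     AS most_frequent_col1,
       count(*) AS cnt
FROM t
GROUP BY col3, col1                -- count each col1 value within col3
ORDER BY col3, cnt DESC, col1;     -- ... the one with the highest count
```

With these rows it should return a (count 2) for col3 = 1 and c (count 2) for col3 = 2. The trailing col1 in the ORDER BY only makes ties deterministic. Because GROUP BY keeps NULL as a group of its own, this form also counts NULL values.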

What you are looking for is the mode. In statistics, the mode is the most frequent value.

SELECT
  column3,
  MODE() WITHIN GROUP (ORDER BY column1) AS most_frequent_column1,
  MODE() WITHIN GROUP (ORDER BY column2) AS most_frequent_column2,
  MODE() WITHIN GROUP (ORDER BY column1 || column2) AS most_frequent_pair
FROM table_name
GROUP BY column3
ORDER BY column3;
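
Two remarks:

  • MODE ignores NULLs. If that is not what you want, you have to work around it.
  • Simply gluing the two strings together may or may not be what you want: 'AB' || 'CD' = 'ABCD' and 'A' || 'BCD' = 'ABCD'.

On the latter, see the comment below by a_horse_with_no_name: MODE() WITHIN GROUP (ORDER BY (column1, column2)) solves the ambiguous-pair problem perfectly.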
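
To make the ambiguity concrete, here is a small sketch comparing the concatenated form with the row-valued form suggested in the comment. The sample rows are invented for illustration, and the query assumes your PostgreSQL version accepts a row value as the ORDER BY expression of MODE(), as the comment states:

```sql
-- Hypothetical sample data: ('AB','CD') and ('A','BCD') both concatenate
-- to 'ABCD', so concatenation merges two different pairs into one value.
WITH table_name(column1, column2, column3) AS (
    VALUES ('AB', 'CD', 1), ('AB', 'CD', 1),
           ('A', 'BCD', 1), ('A', 'BCD', 1),
           ('X', 'Y', 1), ('X', 'Y', 1), ('X', 'Y', 1)
)
SELECT
  column3,
  -- string concatenation: 'ABCD' occurs 4 times, 'XY' 3 times
  MODE() WITHIN GROUP (ORDER BY column1 || column2)   AS most_frequent_concat,
  -- row value: (X,Y) occurs 3 times, the other pairs 2 times each
  MODE() WITHIN GROUP (ORDER BY (column1, column2))   AS most_frequent_pair
FROM table_name
GROUP BY column3
ORDER BY column3;
```

On this data the concatenated column should report ABCD while the row-valued column reports the pair (X,Y), which is the actual most common combination. The second column comes back as a composite (record) value rather than a string.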

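You can use (ORDER BY (column1, column2)) to avoid treating AB,CD the same as ABC,D. This also makes it work with any data type, not just strings.

@a_horse_with_no_name: No. There must be exactly one argument in MODE's ORDER BY clause, because that is the expression whose statistical mode we want. (Yes, it is an odd syntax. I don't know why they didn't simply make this MODE(expression).) As for treating AB,CD and ABC,D as the same value: yes, I mention this in my answer. That is how the OP concatenates the strings, and I pointed out that this may not be desired.

(column1, column2) is a single expression: an anonymous "row type".

@a_horse_with_no_name: Sorry, you are right of course.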