SQL to count the frequency of items using multiple/related columns?

Tags: sql, oracle

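I'm completely new to SQL. I have read StackOverflow posts about SQL and other sources trying to figure this out, but I have not managed to do it in SQL. Here is the problem.

I have a table with 3 columns and thousands of rows, with data in the first 2 columns. The third column is currently empty, and I need to populate it based on the data already in the first and second columns.

Say I have states in the first column and fruit entries in the second. I need to write an SQL statement that counts the number of distinct states each fruit comes from, and then writes this popularity number into the third column of every row. A popularity of 1 in a row means the fruit comes from only one state; a popularity of 4 means the fruit comes from 4 states. So my table currently looks like this: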

state     fruit     popularity

hawaii    apple     
hawaii    apple     
hawaii    banana       
hawaii    kiwi      
hawaii    kiwi      
hawaii    mango        
florida   apple      
florida   apple        
florida   apple        
florida   orange      
michigan  apple     
michigan  apple     
michigan  apricot   
michigan  orange    
michigan  pear      
michigan  pear      
michigan  pear      
texas     apple     
texas     banana    
texas     banana    
texas     banana    
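texas     grape

I need to figure out how to calculate and update the third column, named popularity, which is the number of states that export that fruit. The goal is to produce the table below (sorry, bad pun). Based on the table above, "apple" appears in all 4 states, orange and banana appear in 2 states, and kiwi, mango, apricot, pear and grape appear in only 1 state, hence their corresponding popularity numbers: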

state     fruit     popularity

hawaii    apple     4
hawaii    apple     4
hawaii    banana    2   
hawaii    kiwi      1
hawaii    kiwi      1
hawaii    mango     1   
florida   apple     4 
florida   apple     4   
florida   apple     4   
florida   orange    2  
michigan  apple     4
michigan  apple     4
michigan  apricot   1
michigan  orange    2
michigan  pear      1
michigan  pear      1
michigan  pear      1
texas     apple     4
texas     banana    2
texas     banana    2
texas     banana    2
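texas     grape     1

My little programmer brain tells me to find a way to loop through the data with some kind of script, but from reading a bit about SQL and databases, it seems you don't write long, slow looping scripts in SQL. I'm not even sure you can? Instead, there are apparently better/faster ways to do this in SQL.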

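Does anyone know how to calculate and update, in an SQL statement, the third column of every row (called popularity here) so that it holds the number of states each fruit comes from? Thanks for reading, any help is much appreciated.

So far I have tried the SQL statements below, but their output does not give me what I need: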

--outputs those fruits appearing multiple times in the table
SELECT fruit, COUNT(*)
  FROM table 
 GROUP BY fruit
HAVING COUNT(*) > 1
 ORDER BY COUNT(*) DESC

--outputs those fruits appearing only once in the table
SELECT fruit, COUNT(*)
  FROM table 
 GROUP BY fruit
HAVING COUNT(*) = 1

--outputs the number of unique fruits in the table
SELECT COUNT (DISTINCT(fruit))
  FROM table

If your table is #fruit

Count the distinct states for each fruit

select fruit, COUNT(distinct state) statecount from #fruit group by fruit

Update the table with these values

update #fruit
set popularity
    = statecount
from
 #fruit
    inner join 
      (select fruit, COUNT(distinct state) statecount from #fruit group by fruit) sc
        on #fruit.fruit = sc.fruit

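This should get you there. Basically, you want to get the count of distinct states each fruit appears in, and then use that count by joining it back to the original table.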

-- T-SQL style UPDATE ... FROM; my_table stands in for the real table name
update t
set popularity = cnts.cnt
from 
  (
    select fruit, count(distinct state) as cnt 
    from my_table
    group by fruit) cnts
  inner join my_table t
    on cnts.fruit = t.fruit

If you just want to update the table with the popularity, it would look like this:

update my_table x
   set popularity = ( select count(distinct state) 
                        from my_table
                       where fruit = x.fruit )

If you want to select the data, you can use an analytic query:

select state, fruit
     , count(distinct state) over ( partition by fruit ) as popularity
  from my_table

This gives the number of distinct states for each fruit.
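For anyone who wants to try the correlated UPDATE from this answer on the question's sample rows without an Oracle instance at hand, here is a minimal sketch using Python's built-in `sqlite3` module. Using SQLite is purely an assumption for convenience, and the function name `populate_popularity` is made up for illustration. The statement has the same shape as the Oracle one, except that the inner copy of the table is aliased rather than the UPDATE target.

```python
import sqlite3

# Sample rows from the question: (state, fruit)
ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]


def populate_popularity(rows):
    """Load rows into an in-memory table, run the correlated UPDATE,
    and return a {fruit: popularity} mapping (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)"
    )
    conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", rows)
    # For every row, count the distinct states that share the row's fruit.
    # The inner scan is aliased so my_table.fruit refers to the row being updated.
    conn.execute(
        """
        UPDATE my_table
           SET popularity = (SELECT COUNT(DISTINCT inner_t.state)
                               FROM my_table AS inner_t
                              WHERE inner_t.fruit = my_table.fruit)
        """
    )
    result = dict(conn.execute("SELECT DISTINCT fruit, popularity FROM my_table"))
    conn.close()
    return result


if __name__ == "__main__":
    for fruit, popularity in sorted(populate_popularity(ROWS).items()):
        print(fruit, popularity)
```

On the sample data this should assign 4 to apple, 2 to banana and orange, and 1 to the remaining fruits, matching the target table in the question.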

Another option:

SELECT fruit
,      COUNT(*)
FROM
(
SELECT state
,      fruit
,      ROW_NUMBER() OVER (PARTITION BY state, fruit ORDER BY NULL) rn
FROM   t
)
WHERE rn = 1
GROUP BY fruit
ORDER BY fruit;

I ran this and got (I think) what you want:

WITH t
  AS (SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'mango' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'orange' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apricot' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'orange' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'grape' as fruit FROM dual)
SELECT state,
       fruit,
       count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
  FROM t;

which returns

florida     apple   4
florida     apple   4
florida     apple   4
hawaii      apple   4
hawaii      apple   4
michigan    apple   4
michigan    apple   4
texas       apple   4
michigan    apricot 1
hawaii      banana  2
texas       banana  2
texas       banana  2
texas       banana  2
texas       grape   1
hawaii      kiwi    1
hawaii      kiwi    1
hawaii      mango   1
florida     orange  2
michigan    orange  2
michigan    pear    1
michigan    pear    1

Obviously, you would just need to run:

SELECT state,
       fruit,
       count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
  FROM table_name;

Hope it helps...

Try this:

select a.*,b.total
from [table] as a
left join 
(
SELECT fruit,count(distinct [state]) as total
  FROM [table]
  group by fruit
) as b
on a.fruit = b.fruit

Note that this is SQL Server code; adjust it yourself if needed.

Try this

create table states([state] varchar(10),fruit varchar(10),popularity int)
INSERT INTO states([state],fruit) 
VALUES('hawaii','apple'),
('hawaii','apple'),     
('hawaii','banana'),       
('hawaii','kiwi'),      
('hawaii','kiwi'),      
('hawaii','mango'),        
('florida','apple'),      
('florida','apple'),        
('florida','apple'),        
('florida','orange'),      
('michigan','apple'),     
('michigan','apple'),     
('michigan','apricot'),   
('michigan','orange'),    
('michigan','pear'),      
('michigan','pear'),      
('michigan','pear'),      
('texas','apple'),     
('texas','banana'),    
('texas','banana'),    
('texas','banana'),
('texas','grape')

update t set t.popularity=a.cnt
from states t inner join
(SELECT fruit,count(distinct [state]) as cnt
  FROM states
  group by fruit) a
on t.fruit =a.fruit 

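This is a very poor data model. Aggregates (such as popularity in your example) should be calculated on demand or, if data volumes and performance requirements deem it necessary, held in a separate data structure (such as a materialized view). To understand why it is a bad data model, consider this scenario. You delete a row {hawaii, kiwi}. Simultaneously, in another session, I insert two rows {minnesota, kiwi} and {new york, kiwi}. What is the correct value of popularity for kiwi? And what happens when we both try to update all the other rows?

@yaroslav - the use of letter case in SQL statements is a matter of personal taste (or organizational standards). The code as originally provided was perfectly valid.

@APC, I don't understand, which case are you talking about? I did not see any CASE in the original code, and I double-checked just to be sure. When I edit code I don't delete anything, I only improve the formatting.

@APC This can be, and has been, discussed many times on every T-SQL forum, blog, etc. In my opinion and that of some others, uppercase keywords are more readable. But thanks for pointing it out.

I'm fairly sure this is not Oracle syntax.

This only returns the fruit column, and then only one randomly chosen row.

Then turn it into an update:
UPDATE t SET popularity = (SELECT x.qty FROM (SELECT fruit, COUNT(*) qty FROM (SELECT state, fruit, ROW_NUMBER() OVER (PARTITION BY state, fruit ORDER BY NULL) rn FROM t) WHERE rn = 1 GROUP BY fruit) x WHERE x.fruit = t.fruit)

This version works for me.

I'm pretty sure this is not Oracle syntax.

Thank you so much!!! The UPDATE solution you provided works great on my small practice (fruit) table. However, when I tried it on my actual table with millions of rows, it ran for over an hour and still had not finished. I guess the per-row update is quite expensive? The second SELECT solution you provided works great and is much faster when used after CREATE TABLE new_table AS (SELECT ...). I guess creating a new table is bad practice in SQL, but on my table it is far faster than the UPDATE option.

That's a good idea!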
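Two suggestions from these comments can be sketched together: computing the aggregate on demand, and rebuilding the data with CREATE TABLE new_table AS (SELECT ...) instead of updating in place. The sketch below uses Python's built-in `sqlite3` module, which is an assumption made only so the example runs anywhere. SQLite has no materialized views, so a plain view stands in for the on-demand structure, and the names `fruit_popularity` and `build_popularity_table` are hypothetical.

```python
import sqlite3


def build_popularity_table(rows):
    """Create my_table from (state, fruit) rows, expose popularity through a
    view, and materialize new_table with the popularity column filled in.
    Returns the open connection (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT)")
    conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", rows)
    # On-demand aggregate: one row per fruit, recomputed whenever it is queried.
    conn.execute(
        """
        CREATE VIEW fruit_popularity AS
        SELECT fruit, COUNT(DISTINCT state) AS popularity
          FROM my_table
         GROUP BY fruit
        """
    )
    # Rebuild instead of UPDATE: join every row to its fruit's count once.
    conn.execute(
        """
        CREATE TABLE new_table AS
        SELECT t.state, t.fruit, p.popularity
          FROM my_table AS t
          JOIN fruit_popularity AS p ON p.fruit = t.fruit
        """
    )
    return conn


if __name__ == "__main__":
    sample = [("hawaii", "kiwi"), ("hawaii", "apple"),
              ("texas", "apple"), ("texas", "apple")]
    conn = build_popularity_table(sample)
    for row in conn.execute("SELECT state, fruit, popularity FROM new_table"):
        print(row)
    conn.close()
```

The view keeps the stored rows free of derived values, which sidesteps the concurrent-update question raised in the first comment. The rebuilt table is a snapshot that goes stale as soon as my_table changes. The sketch only shows the shape of both approaches and does not measure whether a rebuild beats an in-place UPDATE on any particular database.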