SQL to calculate item frequency using multiple/dependent columns?
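Tags: sql, oracle
I'm completely new to SQL. I have read StackOverflow posts about SQL and other sources trying to figure this out, but I haven't been able to do it in SQL. Here goes:
I have a table with 3 columns and thousands of rows, with data in the first 2 columns. The third column is currently empty, and I need to populate it based on the data already in the first and second columns.
Say I have states in the first column and fruit entries in the second column. I need to write a SQL statement that counts the number of different states each fruit comes from, and then inserts that popularity number into the third column of every row. A popularity number of 1 in a row means the fruit comes from only one state; a popularity number of 4 means the fruit comes from 4 states. So my table currently looks like this: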
state fruit popularity
hawaii apple
hawaii apple
hawaii banana
hawaii kiwi
hawaii kiwi
hawaii mango
florida apple
florida apple
florida apple
florida orange
michigan apple
michigan apple
michigan apricot
michigan orange
michigan pear
michigan pear
michigan pear
texas apple
texas banana
texas banana
texas banana
texas grape
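I need to figure out how to calculate and update the third column, named popularity, which is the number of states that export each fruit. The goal is to produce the table below (sorry for the bad pun). Based on the table above, "apple" appears in all 4 states, orange and banana appear in 2 states, and kiwi, mango, pear, apricot and grape appear in only 1 state, hence their corresponding popularity values: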
state fruit popularity
hawaii apple 4
hawaii apple 4
hawaii banana 2
hawaii kiwi 1
hawaii kiwi 1
hawaii mango 1
florida apple 4
florida apple 4
florida apple 4
florida orange 2
michigan apple 4
michigan apple 4
michigan apricot 1
michigan orange 2
michigan pear 1
michigan pear 1
michigan pear 1
texas apple 4
texas banana 2
texas banana 2
texas banana 2
texas grape 1
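To pin down exactly what popularity means, here is a minimal Python sketch (not SQL, and not from the original post) that computes it from the sample rows: collect the set of states per fruit, then take each set's size. Repeated (state, fruit) rows, such as the two hawaii/apple rows, count only once. It is only a reference check of the definition; the SQL answers in this thread do the same thing set-based, without an explicit loop.

```python
from collections import defaultdict

# Sample rows from the question: (state, fruit)
rows = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]


def popularity_by_fruit(rows):
    """Return {fruit: number of distinct states the fruit appears in}."""
    states_per_fruit = defaultdict(set)
    for state, fruit in rows:
        states_per_fruit[fruit].add(state)  # a set ignores repeated states
    return {fruit: len(states) for fruit, states in states_per_fruit.items()}


popularity = popularity_by_fruit(rows)
# Fill the third column: every row gets its fruit's popularity.
filled = [(state, fruit, popularity[fruit]) for state, fruit in rows]
print(popularity["apple"], popularity["banana"], popularity["kiwi"])  # 4 2 1
```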
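My little programmer brain tells me to find some way to loop through the data with a script, but after reading a bit about SQL and databases, it seems you don't write long, slow looping scripts in SQL. I'm not even sure you can? There must be a better/faster way to do this in SQL.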
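Does anyone know how to calculate and update, in a SQL statement, the third column of every row (here called popularity) with the number of states each fruit comes from? Thanks for reading, any help is greatly appreciated.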
So far I have tried the SQL statements below, but their output doesn't give me what I need:
-- my_table stands in for the real table name (TABLE is a reserved word)
--outputs those fruits appearing multiple times in the table
SELECT fruit, COUNT(*)
FROM my_table
GROUP BY fruit
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
--outputs those fruits appearing only once in the table
SELECT fruit, COUNT(*)
FROM my_table
GROUP BY fruit
HAVING COUNT(*) = 1;
--outputs the number of distinct fruits in the table
SELECT COUNT(DISTINCT fruit)
FROM my_table;
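The missing piece in the attempts above is DISTINCT inside the aggregate: COUNT(*) counts rows, while COUNT(DISTINCT state) counts each state once per fruit. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for Oracle (the table name my_table is hypothetical; SQLite accepts both queries unchanged), shows the difference on the sample data:

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# COUNT(*) counts rows, so repeated (state, fruit) pairs inflate the number.
row_counts = dict(conn.execute(
    "SELECT fruit, COUNT(*) FROM my_table GROUP BY fruit"))
# COUNT(DISTINCT state) counts each state once per fruit: the popularity.
state_counts = dict(conn.execute(
    "SELECT fruit, COUNT(DISTINCT state) FROM my_table GROUP BY fruit"))

print(row_counts["apple"], state_counts["apple"])  # 8 4
```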
If your table is #fruit,
count the distinct states for each fruit:
select fruit, COUNT(distinct state) statecount from #fruit group by fruit
then update the table with those values:
update #fruit
set popularity
= statecount
from
#fruit
inner join
(select fruit, COUNT(distinct state) statecount from #fruit group by fruit) sc
on #fruit.fruit = sc.fruit
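The answer above uses SQL Server's #temp-table and UPDATE ... FROM syntax. Its underlying idea, aggregating once per fruit and joining the result back to every row, can be checked with a plain SELECT. A minimal sketch on SQLite via Python's sqlite3, assuming a hypothetical my_table loaded with the sample data:

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# Derived table sc holds one row per fruit with its distinct-state count;
# the join copies that count onto every original row of the same fruit.
result = conn.execute("""
    SELECT t.state, t.fruit, sc.statecount AS popularity
    FROM my_table t
    INNER JOIN (SELECT fruit, COUNT(DISTINCT state) AS statecount
                FROM my_table
                GROUP BY fruit) sc
            ON t.fruit = sc.fruit
""").fetchall()
```

Every original row survives the join (22 rows here), each carrying its fruit's count.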
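This should get you there. Basically, you want to get the count of distinct states each fruit is in, and then join that count back to the original table: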
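-- T-SQL (SQL Server) UPDATE ... FROM syntax; my_table stands in for the real table name
update t
set popularity = cnts.cnt
from
(
select fruit, count(distinct state) as cnt
from my_table
group by fruit) cnts
inner join my_table t
on cnts.fruit = t.fruit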
If you just want to update the table with the popularity, it would look like this:
update my_table x
set popularity = ( select count(distinct state)
from my_table
where fruit = x.fruit )
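The correlated-subquery UPDATE above is portable: for each row, the subquery recounts the distinct states for that row's fruit. A minimal sketch running the same statement on SQLite through Python's sqlite3 (hypothetical table my_table with the sample data; the outer table is referenced by name instead of the alias x, and the inner one is aliased s):

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# Correlated subquery: for each row being updated, count the distinct
# states of rows (alias s) that share the outer row's fruit.
conn.execute("""
    UPDATE my_table
    SET popularity = (SELECT COUNT(DISTINCT s.state)
                      FROM my_table s
                      WHERE s.fruit = my_table.fruit)
""")
conn.commit()

updated = conn.execute(
    "SELECT state, fruit, popularity FROM my_table ORDER BY rowid").fetchall()
print(updated[0])  # ('hawaii', 'apple', 4)
```

Because the subquery is logically evaluated once per updated row, this form may be slow on very large tables; an index on fruit may help.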
If you want to select the data instead, you can use an analytic query:
select state, fruit
, count(distinct state) over ( partition by fruit ) as popularity
from my_table
This gives the number of distinct states for each fruit.
Another option:
SELECT fruit
, COUNT(*)
FROM
(
SELECT state
, fruit
, ROW_NUMBER() OVER (PARTITION BY state, fruit ORDER BY NULL) rn
FROM t
)
WHERE rn = 1
GROUP BY fruit
ORDER BY fruit;
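The ROW_NUMBER() approach above deduplicates (state, fruit) pairs first and then counts rows per fruit, which equals the distinct-state count. A minimal sketch of the same query on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later; my_table is a hypothetical name). Note that it returns one row per fruit, not one per original row, so filling the popularity column still needs a join back or an UPDATE:

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# Inner query numbers the duplicates of each (state, fruit) pair; keeping
# rn = 1 leaves one row per pair, so COUNT(*) per fruit counts distinct states.
# No ORDER BY inside OVER: which duplicate gets rn = 1 does not matter here.
result = conn.execute("""
    SELECT fruit, COUNT(*) AS popularity
    FROM (SELECT state, fruit,
                 ROW_NUMBER() OVER (PARTITION BY state, fruit) AS rn
          FROM my_table)
    WHERE rn = 1
    GROUP BY fruit
    ORDER BY fruit
""").fetchall()
print(result)
# [('apple', 4), ('apricot', 1), ('banana', 2), ('grape', 1),
#  ('kiwi', 1), ('mango', 1), ('orange', 2), ('pear', 1)]
```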
I ran this and got (what I think) you want:
WITH t
AS (SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'mango' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'orange' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apricot' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'orange' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'grape' as fruit FROM dual)
SELECT state,
fruit,
count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
FROM t;
which returns:
florida apple 4
florida apple 4
florida apple 4
hawaii apple 4
hawaii apple 4
michigan apple 4
michigan apple 4
texas apple 4
michigan apricot 1
hawaii banana 2
texas banana 2
texas banana 2
texas banana 2
texas grape 1
hawaii kiwi 1
hawaii kiwi 1
hawaii mango 1
florida orange 2
michigan orange 2
michigan pear 1
michigan pear 1
michigan pear 1
Obviously, against your own table you would just need to run:
SELECT state,
fruit,
count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
FROM table_name;
Hope it helps...
Try this:
select a.*,b.total
from [table] as a
left join
(
SELECT fruit,count(distinct [state]) as total
FROM [table]
group by fruit
) as b
on a.fruit = b.fruit
Note that this is SQL Server code; adjust it yourself if needed.
Try this:
create table states([state] varchar(10),fruit varchar(10),popularity int)
INSERT INTO states([state],fruit)
VALUES('hawaii','apple'),
('hawaii','apple'),
('hawaii','banana'),
('hawaii','kiwi'),
('hawaii','kiwi'),
('hawaii','mango'),
('florida','apple'),
('florida','apple'),
('florida','apple'),
('florida','orange'),
('michigan','apple'),
('michigan','apple'),
('michigan','apricot'),
('michigan','orange'),
('michigan','pear'),
('michigan','pear'),
('michigan','pear'),
('texas','apple'),
('texas','banana'),
('texas','banana'),
('texas','banana'),
('texas','grape')
update t set t.popularity=a.cnt
from states t inner join
(SELECT fruit,count(distinct [state]) as cnt
FROM states
group by fruit) a
on t.fruit =a.fruit
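The SQL Server answers above aggregate in a derived table and join back. Where COUNT(DISTINCT ...) OVER (PARTITION BY ...) is not accepted (SQL Server does not allow DISTINCT with OVER, and SQLite rejects DISTINCT in window functions), a commonly used workaround, not taken from this thread, adds an ascending and a descending DENSE_RANK and subtracts one. It assumes state is never NULL. A minimal sketch on SQLite 3.25+ via Python's sqlite3, with a hypothetical my_table:

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# Within a fruit, a state's ascending rank plus its descending rank always
# equals (number of distinct states + 1); duplicate rows share the same rank.
result = conn.execute("""
    SELECT state, fruit,
           DENSE_RANK() OVER (PARTITION BY fruit ORDER BY state)
         + DENSE_RANK() OVER (PARTITION BY fruit ORDER BY state DESC)
         - 1 AS popularity
    FROM my_table
    ORDER BY fruit, state
""").fetchall()
print(result[0])  # ('florida', 'apple', 4)
```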
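This is a really bad data model. Aggregates (such as popularity in your example) should be calculated on demand or, if data volumes and performance requirements make it necessary, held in a separate data structure such as a materialized view. To understand why it is a bad data model, consider this scenario: you delete a row {hawaii, kiwi}. Meanwhile, in another session, I insert two rows, {minnesota, kiwi} and {new york, kiwi}. What is the correct value of kiwi's popularity? And what happens when we both try to update all the other rows?
@yaroslav - the use of upper or lower case in SQL statements is a matter of personal taste (or organizational standards). The code as originally supplied was perfectly valid.
@APC, I don't understand, which case are you talking about? I don't see any CASE in the original code; I double-checked just in case. When I edit code I don't remove anything, I just improve the formatting.
@APC This can be, and has been, discussed many times on every T-SQL forum, blog, etc. In my opinion, and that of some others, uppercase keywords are more readable. But thanks for pointing it out.
I'm fairly sure this is not Oracle syntax.
This only returns the fruit column, and then only one randomly chosen row.
Then turn it into an update: UPDATE t SET popularity = (SELECT x.qty FROM (SELECT fruit, COUNT(*) qty FROM (SELECT state, fruit, ROW_NUMBER() OVER (PARTITION BY state, fruit ORDER BY NULL) rn FROM t) WHERE rn = 1 GROUP BY fruit) x WHERE x.fruit = t.fruit)
This version works for me.
I'm pretty sure this is not Oracle syntax.
Thank you so much!!! The update solution you provided works great on my small practice (fruit) table, but when I tried it on my actual table with millions of rows, it ran for over an hour and still hadn't finished. I guess updating every row is fairly expensive? The second, SELECT solution you provided works great and is much faster when I use it with CREATE TABLE new_table AS (SELECT …). I guess creating a new table is bad practice in SQL, but on my table it is much faster than the update option.
That's a good idea!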
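The trade-off discussed in these comments can be seen in a small sketch. A view (standing in for on-demand calculation; SQLite has no materialized views) always reflects the current rows, while a table built with CREATE TABLE ... AS SELECT, as the asker did, is a snapshot that goes stale when rows change. The names my_table, fruit_popularity and new_table are hypothetical, and the aggregate-and-join SELECT is used because SQLite does not accept COUNT(DISTINCT ...) OVER. It replays the insert half of the kiwi scenario from the first comment:

```python
import sqlite3

ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)

# On-demand option: the view recomputes popularity every time it is queried,
# so no stored aggregate has to be kept in sync by concurrent sessions.
conn.execute("""
    CREATE VIEW fruit_popularity AS
    SELECT t.state, t.fruit, sc.cnt AS popularity
    FROM my_table t
    JOIN (SELECT fruit, COUNT(DISTINCT state) AS cnt
          FROM my_table GROUP BY fruit) sc
      ON sc.fruit = t.fruit
""")

# Snapshot option: materialize the current result once as a new table.
conn.execute("CREATE TABLE new_table AS SELECT * FROM fruit_popularity")

# Two new kiwi rows arrive after the snapshot was taken.
conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)",
                 [("minnesota", "kiwi"), ("new york", "kiwi")])

view_kiwi = conn.execute(
    "SELECT DISTINCT popularity FROM fruit_popularity WHERE fruit = 'kiwi'").fetchall()
snap_kiwi = conn.execute(
    "SELECT DISTINCT popularity FROM new_table WHERE fruit = 'kiwi'").fetchall()
print(view_kiwi, snap_kiwi)  # [(3,)] [(1,)]
```

The snapshot is cheap to build and fast to read, which matches the asker's experience, but it must be rebuilt whenever the underlying rows change.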