Updating MySQL using a self-referencing query
I have a surveys table that includes (among others) the following columns:
survey_id - unique id
user_id - the id of the person the survey relates to
created - datetime
ip_address - of the submission
ip_count - the number of duplicates
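Because the record set is large, running this query on the fly is impractical, so I am trying to write an UPDATE statement that periodically stores the "cached" result in ip_count.
The purpose of ip_count is to show the number of duplicate ip_address survey submissions received for the same user_id within a 12-month period (created date +/- 6 months).
Using the following dataset, this is the expected result: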
survey_id  user_id  created    ip_address   ip_count  # counted duplicates (survey_id)
1          1        01-Jan-12  123.132.123  1         # 2
2          1        01-Apr-12  123.132.123  2         # 1, 3
3          2        01-Jul-12  123.132.123  0         #
4          1        01-Aug-12  123.132.123  3         # 2, 6
6          1        01-Dec-12  123.132.123  1         # 4
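For reference, the counting rule described above can be written out as a small Python sketch. This is only one reading of the rule: it assumes the +/- 6 month window is inclusive at both ends and that the dates are in 2012. The helper names `add_months` and `ip_counts` are made up for illustration. Under that reading the sample rows give 2 for survey 4 (matching the two ids, 2 and 6, listed next to it) rather than the 3 shown in the table.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def ip_counts(rows):
    """rows: (survey_id, user_id, created, ip_address) tuples -> {survey_id: ip_count}."""
    counts = {}
    for sid, uid, created, ip in rows:
        lo, hi = add_months(created, -6), add_months(created, 6)
        # Count the *other* surveys with the same user and IP inside the window.
        counts[sid] = sum(
            1
            for other_sid, other_uid, other_created, other_ip in rows
            if other_sid != sid
            and other_uid == uid
            and other_ip == ip
            and lo <= other_created <= hi
        )
    return counts


# The sample rows from the table above (year assumed to be 2012).
SAMPLE = [
    (1, 1, date(2012, 1, 1), "123.132.123"),
    (2, 1, date(2012, 4, 1), "123.132.123"),
    (3, 2, date(2012, 7, 1), "123.132.123"),
    (4, 1, date(2012, 8, 1), "123.132.123"),
    (6, 1, date(2012, 12, 1), "123.132.123"),
]

print(ip_counts(SAMPLE))  # {1: 1, 2: 2, 3: 0, 4: 2, 6: 1}
```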
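This is the closest solution I have come up with so far, but this query does not take the date restriction into account, and I am struggling to find an alternative approach.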
UPDATE surveys
JOIN (
    SELECT ip_address, created, user_id, COUNT(*) AS total
    FROM surveys
    WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
    GROUP BY ip_address, user_id
) AS ipCount
ON (
    ipCount.ip_address = surveys.ip_address
    AND ipCount.user_id = surveys.user_id
    AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
)
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
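Thanks in advance for your help :)
I don't have your table, so it is hard for me to write exactly the right SQL, but I can give it a try and hopefully it helps.
First, I would take the Cartesian product of surveys with itself and filter out the rows I don't want: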
SELECT s1.survey_id x, s2.survey_id y
FROM surveys s1, surveys s2
WHERE s1.survey_id != s2.survey_id
AND s1.ip_address = s2.ip_address
AND s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH) # s1.created and s2.created fall within 6 months of each other
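The output of this should contain every pair of matching surveys (according to your rules) twice: once with each id in the first position and once with each id in the second position.
We can then group this output by x to get a table that essentially gives the correct ip count for each survey id: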
SELECT x, COUNT(*) c
FROM (
    SELECT s1.survey_id x, s2.survey_id y
    FROM surveys s1, surveys s2
    WHERE s1.survey_id != s2.survey_id
    AND s1.ip_address = s2.ip_address
    AND s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH)
) pairs # MySQL requires an alias on every derived table
GROUP BY x
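To try the grouped self-join without a MySQL server, here is a rough sketch using Python's built-in `sqlite3` module on the sample rows from the question. Several things here are assumptions rather than part of the answer: SQLite's `date(..., '-6 months')` modifiers stand in for MySQL's `INTERVAL 6 MONTH`, dates are stored as ISO strings with the year taken to be 2012, the `user_id` condition from the question is included, and the alias `pairs` is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surveys ("
    "survey_id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT, ip_address TEXT)"
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-01-01", "123.132.123"),
        (2, 1, "2012-04-01", "123.132.123"),
        (3, 2, "2012-07-01", "123.132.123"),
        (4, 1, "2012-08-01", "123.132.123"),
        (6, 1, "2012-12-01", "123.132.123"),
    ],
)

# Inner query: every ordered pair (x, y) of distinct matching surveys.
# Outer query: number of partners y for each x.
PAIR_COUNT_SQL = """
SELECT x, COUNT(*) AS c
FROM (
    SELECT s1.survey_id AS x, s2.survey_id AS y
    FROM surveys AS s1
    JOIN surveys AS s2
      ON s1.survey_id != s2.survey_id
     AND s1.ip_address = s2.ip_address
     AND s1.user_id = s2.user_id
     AND s2.created BETWEEN date(s1.created, '-6 months')
                        AND date(s1.created, '+6 months')
) AS pairs
GROUP BY x
ORDER BY x
"""

pair_counts = conn.execute(PAIR_COUNT_SQL).fetchall()
print(pair_counts)  # [(1, 1), (2, 2), (4, 2), (6, 1)]
```

Survey 3 has no matching partner, so it does not appear in the grouped output at all. An UPDATE that inner-joins against this result would therefore leave that row's ip_count untouched instead of setting it to 0.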
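Now we have a table that maps each survey id to its correct ip count. To update the original table, we need to join it against this result and copy the values over.
So it should look something like this: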
UPDATE surveys s
INNER JOIN (ABOVE QUERY) n ON s.survey_id = n.x
SET s.ip_count = n.c
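There is some pseudocode in here, but I think the general idea should work.
I have never updated a table based on the output of another query before.. I tried to guess the correct syntax from this question -
Also, if I needed to do something like this for my own work, I would not try to do it in a single query.. It would be a pain to maintain and could run into memory/performance problems. It would be better to have a script walk through the table row by row, updating one row in a transaction before moving on to the next. Much slower, but easier to understand, and probably lighter on the database.
Made a few (very) minor adjustments to what was shown above. Thanks again.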
UPDATE surveys AS s
INNER JOIN (
    SELECT x, count(*) c
    FROM (
        SELECT s1.id AS x, s2.id AS y
        FROM surveys AS s1, surveys AS s2
        WHERE s1.state IN (1, 3) # completed and verified
        AND s1.id != s2.id # don't self join
        AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
        AND s1.ip_address = s2.ip_address
        AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
        AND s1.user_id = s2.user_id # where completed for the same user
    ) AS ipCount
    GROUP BY x
) n ON s.id = n.x
SET s.ip_count = n.c
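Thank you very much! Before I saw this answer I had started messing around with temporary tables and so on. I have posted my final query for anyone else in a similar situation.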
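The row-by-row alternative suggested in the answer could look roughly like the sketch below. It is an illustration under several assumptions, not the poster's script: it uses Python's built-in `sqlite3` as a stand-in for a MySQL connection, the function name `refresh_ip_counts` is made up, the `state` filter is left out, and SQLite's `date(..., '-6 months')` replaces MySQL's `INTERVAL 6 MONTH`. Each row is counted and updated in its own transaction.

```python
import sqlite3

# Counts other surveys for the same user and IP within +/- 6 months of a date.
COUNT_SQL = """
SELECT COUNT(*)
FROM surveys
WHERE survey_id != ?
  AND user_id = ?
  AND ip_address = ?
  AND created BETWEEN date(?, '-6 months') AND date(?, '+6 months')
"""


def refresh_ip_counts(conn):
    """Recompute ip_count one survey at a time, committing after each row."""
    rows = conn.execute(
        "SELECT survey_id, user_id, ip_address, created FROM surveys "
        "WHERE ip_address IS NOT NULL AND ip_address != ''"
    ).fetchall()
    for survey_id, user_id, ip_address, created in rows:
        with conn:  # commits on success, rolls back if an exception is raised
            (count,) = conn.execute(
                COUNT_SQL, (survey_id, user_id, ip_address, created, created)
            ).fetchone()
            conn.execute(
                "UPDATE surveys SET ip_count = ? WHERE survey_id = ?",
                (count, survey_id),
            )


# Demo on the sample rows from the question (year assumed to be 2012).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created TEXT, ip_address TEXT, ip_count INTEGER)"
)
conn.executemany(
    "INSERT INTO surveys (survey_id, user_id, created, ip_address) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-01-01", "123.132.123"),
        (2, 1, "2012-04-01", "123.132.123"),
        (3, 2, "2012-07-01", "123.132.123"),
        (4, 1, "2012-08-01", "123.132.123"),
        (6, 1, "2012-12-01", "123.132.123"),
    ],
)
conn.commit()

refresh_ip_counts(conn)
result = conn.execute(
    "SELECT survey_id, ip_count FROM surveys ORDER BY survey_id"
).fetchall()
print(result)  # [(1, 1), (2, 2), (3, 0), (4, 2), (6, 1)]
```

Unlike the inner-join UPDATE, this version also writes 0 for surveys that have no duplicates, at the cost of one count query per row.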