MySQL: trying to optimize a query that lists voters by voting activity
I am building a query that lists voters from the voters table (1 million records) based on their activity in the votes table (7 million records). The criteria are as follows:
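General elections (GEs) are held only once a year, and only GEs held in or after 2004 should be counted.
Of those GEs, only the ones in which 10% to 50% of voters voted should be counted.
Some less important information:
The schema cannot be changed. It is delivered to us as fixed-width text files, uploaded via script, and used for other purposes.
Only the current list of active voters and their voting history is available. In my query below I added a formula that lowers the upper limit by 10,000 voters for each year between the election and today. It isn't perfect, but it seems to filter out the unwanted GEs while keeping the valid ones.
For example, if between 100,000 and 500,000 voters voted in 2005, 2006, 2007, 2009, 2010 and 2011, then I want only the voters who voted in those years to be listed.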
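To make the sliding upper limit concrete, the arithmetic for a single election year can be sketched as follows. The formula is the one used in the query further down: the upper bound is 50% of the voter count minus 10,000 for each year elapsed since the election, and the lower bound is 10% of the voter count. The figures here are assumptions for illustration only: a round 1,000,000 voters and a query run in 2012.

```sql
-- Assumed inputs (illustrative, not taken from the real data):
--   1,000,000 rows in voters, current year 2012, election year 2005.
SELECT 2005                                  AS election_year,
       0.1 * 1000000                         AS lower_bound,  -- 100000.0
       0.5 * 1000000 - (2012 - 2005) * 10000 AS upper_bound;  -- 430000.0
```

Under these assumptions, the 2005 GE would count only if more than 100,000 and fewer than 430,000 votes were recorded for it.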
The schema is as follows:
CREATE TABLE IF NOT EXISTS `voters` (
`CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
`LastName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`FirstName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`MiddleInitial` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`NameSuffix` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`HouseNumber` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`HouseNumberSuffix` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`ApartmentNumber` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`StreetName` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`City` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
`Zip` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
`ZipCode4` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress1` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress2` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress3` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`MailingAddress4` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`DOBY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`DOBM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`DOBD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`Gender` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`Other` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CO` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`SD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`CC` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`JD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`RegY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`RegM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`RegD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`Status` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`StatusChangeD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`LastVoted` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`Telephone` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
`County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
KEY `districts` (`CountyEMSID`,`ED`,`AD`,`CD`,`CO`,`SD`,`CC`,`JD`),
KEY `vsn` (`CountyEMSID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `votes` (
`CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
`County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ElectionDateD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`ElectionType` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
`VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
KEY `CountyEMSID` (`CountyEMSID`),
KEY `perfect` (`CountyEMSID`,`ElectionDateY`,`ElectionType`),
KEY `CountyEMSID_2` (`CountyEMSID`,`ElectionDateY`,`ElectionType`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
So far I have the query below, which should list only the unique voter IDs (CountyEMSID) from the votes table. It works in a MySQL fiddle but hangs in phpMyAdmin.
SELECT DISTINCT CountyEMSID
FROM `votes`
WHERE ElectionDateY IN
(
SELECT ElectionDateY
FROM `votes`
WHERE ElectionType = 'GE'
AND ElectionDateY >= 2004
GROUP BY ElectionDateY
HAVING COUNT(*) < ((0.5 * (SELECT COUNT(*) FROM `voters`)) - ((YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000))
AND COUNT(*) > (0.1 * (SELECT COUNT(*) FROM `voters`))
)
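Any help optimizing this query, and modifying it so that it returns all of the corresponding voter information, would be much appreciated.
MySQL does a very poor job of optimizing IN with a subquery. Essentially, it reruns the subquery for every row it processes. You should move the calculation into the FROM clause. Here is my attempt: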
-- Compute the per-year vote counts once, in a derived table, instead of per row
select distinct v.*
from votes v join
     (select ElectionDateY, count(*) as NumYVotes
      from votes
      where ElectionType = 'GE' and ElectionDateY >= 2004
      group by ElectionDateY
     ) ey
     on v.ElectionDateY = ey.ElectionDateY cross join
     (select count(*) as numvoters from voters) as const
where (ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.ElectionDateY) * 10000) and
      (ey.NumYVotes > 0.1 * const.numvoters)
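Note: I haven't tested it, so it may have syntax errors.
You really need to learn about normalization. Your voters table is horribly denormalized. 4 mailing addresses? 3 columns for DOB?
I know, it's a PITA, but I can't change the schema. I mentioned the reason in the post.
Agree with njk about the denormalized data. Anyway, can you tell us what your expected result is, based on your fiddle?
@bonCodigo Based on the fiddle, the result should be a list of unique CountyEMSIDs corresponding to the list of years returned by the subquery, i.e. the years in which a general election was held with turnout between 10% and 50%. Ideally there should be between 100,000 and 500,000 records.
Thanks for your effort, I can definitely use this information to get closer to what I need.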
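For the second half of the question (returning the full voter rows rather than only the IDs), one possible extension of the derived-table approach is sketched below. It is untested against the real data and rests on several assumptions that are not in the original post: the index name `ge_year` is hypothetical and optional, the outer `ElectionType = 'GE'` filter assumes that only general-election votes should qualify a voter, and the year is compared as the string `'2004'` so that MySQL does not have to convert the varchar column to a number.

```sql
-- Hypothetical, optional index (not part of the original schema): lets the
-- yearly counts and the ID lookup be served from the index alone.
ALTER TABLE votes ADD INDEX ge_year (ElectionType, ElectionDateY, CountyEMSID);

SELECT vr.*
FROM voters vr
JOIN (
    -- One row per voter who voted in at least one qualifying GE year
    SELECT DISTINCT v.CountyEMSID
    FROM votes v
    JOIN (
        -- Votes cast per general-election year, 2004 onward
        SELECT ElectionDateY, COUNT(*) AS NumYVotes
        FROM votes
        WHERE ElectionType = 'GE' AND ElectionDateY >= '2004'
        GROUP BY ElectionDateY
    ) ey ON ey.ElectionDateY = v.ElectionDateY
    CROSS JOIN (SELECT COUNT(*) AS numvoters FROM voters) const
    WHERE v.ElectionType = 'GE'   -- assumption; drop to match the original query
      AND ey.NumYVotes > 0.1 * const.numvoters
      AND ey.NumYVotes < 0.5 * const.numvoters
                         - (YEAR(NOW()) - ey.ElectionDateY) * 10000
) active ON active.CountyEMSID = vr.CountyEMSID;
```

Running `EXPLAIN` on the SELECT before and after adding the index should show whether the aggregation still scans the whole votes table. The join back to voters can use the existing `vsn` key on `CountyEMSID`.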