MySQL: trying to optimize a query that lists voters by voting activity


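I'm building a query that lists voters from the voters table (1 million records) based on their activity in the votes table (7 million records). The criteria are as follows: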

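General elections (GEs) are held only once a year, and only GEs held in or after 2004 should be counted.

Of those GEs, only the ones in which 10% to 50% of the voters voted should be counted.

Some less important information:

The schema cannot be changed. It is delivered to us as fixed-width text files, loaded by a script, and used for other purposes.

Only the current list of active voters and their voting history is available. In my query below I included an adjustment that lowers the upper limit by 10,000 voters for each year the election lies in the past. It isn't perfect, but it seems to filter out the unwanted GEs while keeping the valid ones.

For example, if between 100,000 and 500,000 voters voted in 2005, 2006, 2007, 2009, 2010, and 2011, then I only want to list the voters who voted in those years.

The schema is as follows: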

CREATE TABLE IF NOT EXISTS `voters` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `FirstName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `MiddleInitial` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `NameSuffix` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumber` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumberSuffix` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `ApartmentNumber` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
  `StreetName` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `City` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
  `Zip` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
  `ZipCode4` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress1` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress2` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress3` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress4` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `DOBY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `DOBM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `DOBD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Gender` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Other` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CO` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `SD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CC` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `JD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `RegM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Status` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `LastVoted` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `Telephone` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  KEY `districts` (`CountyEMSID`,`ED`,`AD`,`CD`,`CO`,`SD`,`CC`,`JD`),
  KEY `vsn` (`CountyEMSID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `votes` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionType` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  KEY `CountyEMSID` (`CountyEMSID`),
  KEY `perfect` (`CountyEMSID`,`ElectionDateY`,`ElectionType`),
  KEY `CountyEMSID_2` (`CountyEMSID`,`ElectionDateY`,`ElectionType`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
So far I have the query below, which should list only the unique voter IDs (CountyEMSID) from the votes table. It works on SQL Fiddle, but hangs in phpMyAdmin.

SELECT DISTINCT CountyEMSID
FROM `votes` 
WHERE ElectionDateY IN 
(
SELECT ElectionDateY
FROM `votes`
WHERE ElectionType = 'GE' 
AND ElectionDateY >= 2004 
GROUP BY ElectionDateY 
HAVING COUNT(*) < ((0.5 * (SELECT COUNT(*) FROM `voters`)) - ((YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000)) 
AND COUNT(*) > (0.1 * (SELECT COUNT(*) FROM `voters`))
)
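I would appreciate any help optimizing this query and modifying it so that it returns all of the corresponding voter information from the votes table.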

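MySQL optimizes IN with a subquery very poorly (at least in versions before 5.6). Essentially, it re-runs the subquery for every row it processes. You should move the calculation into the FROM clause. Here is my attempt: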

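select distinct v.*
from votes v join
     -- per-year vote counts, computed once; same GE and year filter as the question
     (select ElectionDateY, count(*) as NumYVotes
      from votes
      where ElectionType = 'GE' and ElectionDateY >= 2004
      group by ElectionDateY
     ) ey
     on v.ElectionDateY = ey.ElectionDateY cross join
     -- single-row derived table holding the total number of voters
     (select count(*) as numvoters from voters) as const
where (ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.ElectionDateY) * 10000) and
      (ey.NumYVotes > 0.1 * const.numvoters)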

Note: I haven't tested it, so it may contain syntax errors.
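
The question also asks for the matching voter details. A minimal, untested sketch of that, assuming CountyEMSID identifies the same person in both tables (the schema declares no primary key, so this is an assumption), wraps the ID list in a derived table and joins it to voters. The aliases vr, ids, ey, and total are made up for illustration:

```sql
-- Sketch only: return the voters rows for everyone who voted
-- in a year whose general election passes the turnout filter.
SELECT vr.*
FROM voters vr
JOIN (
    -- distinct IDs of people who voted in a qualifying year
    SELECT DISTINCT v.CountyEMSID
    FROM votes v
    JOIN (
        -- GE vote count per year, computed once
        SELECT ElectionDateY, COUNT(*) AS NumYVotes
        FROM votes
        WHERE ElectionType = 'GE' AND ElectionDateY >= 2004
        GROUP BY ElectionDateY
    ) ey ON v.ElectionDateY = ey.ElectionDateY
    -- single-row derived table with the total number of voters
    CROSS JOIN (SELECT COUNT(*) AS numvoters FROM voters) total
    WHERE ey.NumYVotes < 0.5 * total.numvoters
                         - (YEAR(NOW()) - ey.ElectionDateY) * 10000
      AND ey.NumYVotes > 0.1 * total.numvoters
) ids ON ids.CountyEMSID = vr.CountyEMSID;
```

Deduplicating the IDs inside the derived table keeps the join from multiplying a voters row once per election that person voted in.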

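You really need to learn about normalization. Your voters table is horribly denormalized. 4 mailing addresses? 3 columns for DOB?

I know, it's a PITA, but I can't change the schema. I explained why in the post.

Agree with njk about the denormalized data. Anyway, can you tell us what your expected result is, based on your fiddle?

@bonCodigo Based on the fiddle, the result should be a list of distinct CountyEMSID values corresponding to the list of years returned by the subquery, i.e. the years in which a general election was held with a turnout between 10% and 50%. Ideally there should be between 100,000 and 500,000 records.

Thanks for the effort, I can definitely use this information to get closer to what I need.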
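
To sanity-check which years pass the filter before running the full query, the per-year counts can be listed next to the two bounds. This is only a sketch; LowerBound, UpperBound, and the aliases ey and total are illustrative names, not from the original post:

```sql
-- Sketch only: one row per general-election year since 2004,
-- showing the vote count beside the bounds used by the filter.
SELECT ey.ElectionDateY,
       ey.NumYVotes,
       0.1 * total.numvoters AS LowerBound,
       0.5 * total.numvoters
           - (YEAR(NOW()) - ey.ElectionDateY) * 10000 AS UpperBound
FROM (
    SELECT ElectionDateY, COUNT(*) AS NumYVotes
    FROM votes
    WHERE ElectionType = 'GE' AND ElectionDateY >= 2004
    GROUP BY ElectionDateY
) ey
CROSS JOIN (SELECT COUNT(*) AS numvoters FROM voters) total
ORDER BY ey.ElectionDateY;
```

A year qualifies when NumYVotes lies strictly between LowerBound and UpperBound, which matches the HAVING clause in the question.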