Optimizing a MySQL GROUP BY with a big linking table

Tags: mysql, group-by, many-to-many

I have read a lot about this, but every query still takes more than 30 seconds, and I am sure it could run much faster.

The problem is this:

A big linking table defined as below (40 million rows; the data takes 650 MB and the indexes 1.8 GB):

CREATE TABLE IF NOT EXISTS `glossary_entry_wordList_1` (
  `idTerm` mediumint(8) unsigned NOT NULL,
  `idKeyword` mediumint(8) unsigned NOT NULL,
  `termLength` smallint(6) NOT NULL,
  `termNumberWords` tinyint(4) NOT NULL,
  `termTransliteralRFC` mediumint(9) NOT NULL,
  `keywordLength` tinyint(3) unsigned NOT NULL,
  `termLanguage` tinyint(4) NOT NULL,
  PRIMARY KEY (`idKeyword`,`idTerm`),
  KEY `termTransliteralRFC` (`termTransliteralRFC`),
  KEY `termLength` (`termLength`),
  KEY `secondPrimary` (`idTerm`,`idKeyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
and a small temporary table defined as below:

CREATE TEMPORARY TABLE IF NOT EXISTS `foundIDs` (
  `searchId` int(11) NOT NULL,
  `searchedKeywordId` int(11) NOT NULL,
  `similarKeywordId` mediumint(8) unsigned NOT NULL,
  `partsMatched` tinyint(4) NOT NULL,
  `sumSimliarParts` int(11) NOT NULL,
  `keywordLength` int(11) NOT NULL,
  `fuzzyMark` float NOT NULL,
  `keywordDjb2` bigint(20) NOT NULL,
  `smallKeyword` tinyint(4) NOT NULL,
  PRIMARY KEY (`similarKeywordId`),
  KEY `searchId` (`searchId`),
  KEY `searchedKeywordId` (`searchedKeywordId`),
  KEY `partsMatched` (`partsMatched`),
  KEY `keywordLength` (`keywordLength`),
  KEY `smallKeyword` (`smallKeyword`),
  KEY `keywordDjb2` (`keywordDjb2`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I need to retrieve from glossary_entry_wordList_1 all the idTerm values that are associated with at least 50% of the idKeyword values in the foundIDs table.
In practice, I need to find all the sentences that contain x of the words.

To do this, I use a query like the following (note that the condition values here are only examples):
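The actual query is not reproduced above, so the block below is only a rough sketch of the kind of join-and-group query described, under two assumptions: that foundIDs.similarKeywordId is the column holding the keyword ids to match (its type matches idKeyword), and that the 50% threshold is pre-computed by the application as half the number of rows currently in foundIDs.

SELECT w.idTerm, COUNT(*) AS keywordsMatched
FROM foundIDs AS f
INNER JOIN glossary_entry_wordList_1 AS w
        ON w.idKeyword = f.similarKeywordId  -- assumed join column
GROUP BY w.idTerm
HAVING COUNT(*) >= 2;  -- placeholder threshold: 50% of the rows in foundIDs (e.g. 2 of 4)

Because (idKeyword, idTerm) is the primary key of the linking table, COUNT(*) per idTerm already counts distinct matched keywords, so no DISTINCT is needed in the HAVING clause.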

The engine behaves as follows:

- the shorter the keywords (1-2 letters), the longer the query takes (understandably, since they have many more associations)
- the more words there are in the search table (foundIDs), the longer the query takes

Any ideas on how to improve the query response time?


Thanks,

Can you confirm that the EXPLAIN output matches the query and table specifications described in your question? According to it, MySQL only has to examine 8 x 146 rows. Or am I having trouble interpreting what I am seeing?

I believe you should try a different approach for your project; perhaps MongoDB would help you solve the problem.

Sylvain, that is indeed the result in this case. I forgot to mention that foundIDs only has 4 rows for this particular case (search).
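For reference, row estimates like the ones mentioned in the comments come from EXPLAIN output, obtained by prefixing the query with EXPLAIN (shown here on the illustrative sketch above, not necessarily the asker's exact query):

EXPLAIN
SELECT w.idTerm, COUNT(*) AS keywordsMatched
FROM foundIDs AS f
INNER JOIN glossary_entry_wordList_1 AS w
        ON w.idKeyword = f.similarKeywordId
GROUP BY w.idTerm
HAVING COUNT(*) >= 2;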