SQL: How to create an effective index for the following query (sql, mysql)

I have converted my web server access logs into a MySQL table that looks like this:

CREATE TABLE `access_log` (
`timestamp` int(11) NOT NULL default '0',
`visitorid` int(11) default NULL,
`url` int(11) default NULL,
`params` int(11) default NULL,
`status` smallint(3) NOT NULL default '0',
`bytes` int(20) NOT NULL default '0',
`referrer` int(11) default NULL,
`refparams` int(11) default NULL,
`useragentid` int(11) default NULL,
`keywords` int(11) default NULL,
`country` char(3) default '',
`crawl` int(1) NOT NULL default '0',
`sessionid` int(11) default NULL,
KEY `timestamp` (`timestamp`),
KEY `visitorid` (`visitorid`),
KEY `url` (`url`),
KEY `referrer` (`referrer`),
KEY `keywords` (`keywords`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1;

I have a query that generates a "most popular pages" report for a given date range. An example looks like this:

select url,
count(distinct visitorid) as visitors,
count(*) as hits 
from access_log where 
timestamp >=1270072800 and timestamp <=1272664799 
and crawl=0 
group by url order by visitors desc limit 100;

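This query becomes very slow when the table holds a lot of records.

Depending on how large the timestamp range is relative to the total number of records in the table, the optimizer reports that it will use either the `timestamp` key or the `url` key. In both cases, however, the Extra column always shows 'Using where; Using temporary; Using filesort'.

Is there any way to create a composite index that would improve the execution time of this query?

I have tried the following combinations, but the optimizer seems to ignore them:

idx(timestamp, url, visitorid, crawl)
idx(url, visitorid, crawl, timestamp)

Any suggestions or pointers would be appreciated.

Thanks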
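
One ordering that is not among the combinations listed in the question puts the equality column first, then the range column, then the remaining columns the query reads: (crawl, timestamp, url, visitorid). The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down version of access_log, made-up sample rows, and a hypothetical index name, idx_crawl_ts_url_visitor. SQLite's planner is not MySQL's, so running EXPLAIN against the real table is still the only way to see what MySQL chooses. Because the query sorts by an aggregate, a temporary table and a sort step will probably remain even when the index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of access_log: only the columns the report touches.
conn.execute(
    "CREATE TABLE access_log ("
    " timestamp INTEGER NOT NULL, visitorid INTEGER, url INTEGER,"
    " crawl INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical sample rows: (timestamp, visitorid, url, crawl)
rows = [
    (1270072800, 1, 10, 0),
    (1270072900, 2, 10, 0),
    (1270073000, 1, 10, 0),
    (1270073100, 3, 20, 0),
    (1270073200, 4, 20, 1),  # crawler hit, excluded by crawl = 0
    (1280000000, 5, 10, 0),  # outside the reported date range
]
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)

# Equality column first, range column second, then the columns the query
# reads, so the index alone can answer the query. The name is hypothetical.
conn.execute(
    "CREATE INDEX idx_crawl_ts_url_visitor"
    " ON access_log (crawl, timestamp, url, visitorid)"
)

REPORT = (
    "SELECT url, COUNT(DISTINCT visitorid) AS visitors, COUNT(*) AS hits"
    " FROM access_log"
    " WHERE timestamp >= 1270072800 AND timestamp <= 1272664799 AND crawl = 0"
    " GROUP BY url ORDER BY visitors DESC LIMIT 100"
)
report = conn.execute(REPORT).fetchall()

# SQLite's counterpart of EXPLAIN; the last column holds the plan text.
plan = conn.execute("EXPLAIN QUERY PLAN " + REPORT).fetchall()
print(report)
for step in plan:
    print(step[-1])
```

The same CREATE INDEX statement is valid MySQL syntax; whether the MySQL optimizer prefers it over the existing single-column keys depends on the data distribution.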

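So you want to rank URLs by popularity within a given time period. A composite index on (url, visitorid) gives you the popularity. A composite index on (timestamp, url) gives you the URLs visited during the period. Why not try both indexes and join against an inline view, something like this (I am not sure of the exact syntax for inline views in MySQL):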

       select Log1.url, Log2.DistinctVisitors
       from
       -- URLs visited during the period; x and y are the range bounds
       (select distinct url from log
        where visitdate > x and visitdate < y
       ) as Log1

       join

       -- distinct visitors per URL during the same period
       (select url, count(distinct visitorid) as DistinctVisitors
        from log
        -- try the composite index (url, visitorid, visitdate)
        where visitdate > x and visitdate < y
        group by url
        -- having count(distinct visitorid) > {some cutoff value greater than 1}
       ) as Log2

       on Log1.url = Log2.url

       order by DistinctVisitors desc

Partition the access log into multiple tables, and run this query only against the tables that fall within the date range.
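
As a rough sketch of the multi-table idea, the example below keeps one table per calendar month and builds the report from only the tables that overlap the requested range. Everything specific here is an assumption for illustration: Python's built-in sqlite3 stands in for MySQL, the access_log_YYYYMM naming scheme and the helper names table_for and report are hypothetical, months are cut at UTC boundaries, and the sample rows are made up.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")

def table_for(ts):
    """Hypothetical naming scheme: one table per UTC month, e.g. access_log_201004."""
    month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m")
    return "access_log_" + month

# Made-up sample rows: (timestamp, visitorid, url, crawl)
rows = [
    (1270080000, 1, 10, 0),  # April 2010
    (1270090000, 2, 10, 0),  # April 2010
    (1272700000, 1, 10, 0),  # May 2010
    (1272800000, 3, 20, 0),  # May 2010
    (1272900000, 2, 10, 0),  # May 2010
]
for ts, visitorid, url, crawl in rows:
    name = table_for(ts)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} ("
        " timestamp INTEGER NOT NULL, visitorid INTEGER, url INTEGER,"
        " crawl INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(f"INSERT INTO {name} VALUES (?, ?, ?, ?)",
                 (ts, visitorid, url, crawl))

def report(start, end):
    """Run the report over only the monthly tables that overlap [start, end]."""
    all_tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")
    # YYYYMM suffixes sort chronologically, so string comparison is enough.
    names = sorted(n for (n,) in all_tables
                   if table_for(start) <= n <= table_for(end))
    if not names:
        return names, []
    union = " UNION ALL ".join(
        f"SELECT url, visitorid FROM {n}"
        " WHERE timestamp >= :start AND timestamp <= :end AND crawl = 0"
        for n in names
    )
    sql = (
        "SELECT url, COUNT(DISTINCT visitorid) AS visitors, COUNT(*) AS hits"
        f" FROM ({union}) GROUP BY url ORDER BY visitors DESC LIMIT 100"
    )
    return names, conn.execute(sql, {"start": start, "end": end}).fetchall()

# Report for May 2010 only: the April table is never read.
print(report(1272672000, 1275350399))
```

The distinct count is taken over the union of the raw rows, not over per-table results, so a visitor who appears in two months is still counted once.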

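Build summary tables of daily/weekly/monthly pre-aggregated data to reduce the amount of data needed to generate the report. After importing the day's log file, aggregate the data by truncating the timestamps to hour boundaries, then to day boundaries, and so on.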
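
The hour-then-day rollup could look roughly like the sketch below. It is an illustration under assumptions, not a definitive schema: Python's built-in sqlite3 stands in for MySQL, the table names hits_hourly and hits_daily are hypothetical, boundaries are cut in UTC, and the sample rows are made up. It only rolls up hit counts; distinct visitor counts cannot simply be added up across buckets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log ("
    " timestamp INTEGER NOT NULL, url INTEGER,"
    " crawl INTEGER NOT NULL DEFAULT 0)"
)
# Made-up sample rows: (timestamp, url, crawl)
rows = [
    (1270080000, 10, 0),  # day 1, hour 0
    (1270080100, 10, 0),  # day 1, hour 0
    (1270080200, 10, 1),  # crawler hit, excluded
    (1270083600, 10, 0),  # day 1, hour 1
    (1270083700, 20, 0),  # day 1, hour 1
    (1270166400, 10, 0),  # day 2, hour 0
]
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)

# Truncate each timestamp to the start of its hour and count hits per URL.
conn.execute(
    "CREATE TABLE hits_hourly AS"
    " SELECT timestamp - timestamp % 3600 AS hour_start, url, COUNT(*) AS hits"
    " FROM access_log WHERE crawl = 0"
    " GROUP BY hour_start, url"
)
# Roll the hourly rows up to day boundaries; hit counts are additive.
conn.execute(
    "CREATE TABLE hits_daily AS"
    " SELECT hour_start - hour_start % 86400 AS day_start, url,"
    "        SUM(hits) AS hits"
    " FROM hits_hourly"
    " GROUP BY day_start, url"
)

daily = conn.execute(
    "SELECT day_start, url, hits FROM hits_daily ORDER BY day_start, url"
).fetchall()
print(daily)
```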

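What about a simple index on timestamp?

There is already a simple index on timestamp in the table. I am trying to find something that would improve the speed. Thanks.

Why do you want to create a single composite index? You can prefix the query with EXPLAIN to see which indexes MySQL plans to use and where the bottleneck might be.

@donnie I am trying to get it to use an index instead of 'Using where; Using temporary; Using filesort'.

Unfortunately, this approach does not work. Since the timestamp is not used to restrict the Log2 part of the query, it basically counts visits over the entire table, and because the URLs in Log1 can appear in any time range, the join does not really achieve anything. If I add the timestamp restriction to Log2, I am basically back where I started. Also, I do not know in advance what a reasonable cutoff value would be, which is why I need to sort and limit to 100. Thanks for the idea! Any other suggestions?

Regarding the Log2 inline view: with the composite index (url, visitor, visitdate), restrict it with visitdate > x and visitdate < y, and use a cutoff greater than 1. There are probably many URLs that were visited only once; there is no need to have them take part in this. They are certainly not popular, right? Also, drop those other indexes so that they are not candidates. You can always recreate them.

Thanks for your suggestions. I will definitely try splitting the table into multiple tables. But consider this: say the data is already split into one table per month, and that table still has a huge number of records. Now I want to query a few days from that table. I cannot use per-day aggregates across several days, because over that period sum(daily visitors) is not the same as count(distinct visitors) over the actual period. Without further partitioning or pre-aggregation, do you see any way to speed up the query, i.e. a better index?

I do not think there is a silver-bullet index that will simply make things run beautifully. I certainly cannot see one, which is why I suggested approaching the problem from a different angle. If you aggregate url, visitorid, count(*) as hits per day, then you can use those daily aggregates to some extent, can't you? That is, do not try to pre-aggregate completely; just reduce the amount of detail to the minimum required. Of course, if hits is always just 1 for that combination, it is not much use... Even so, you could greatly reduce the width of the table and get better memory usage, although it looks like you have already made some effort to narrow the table.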
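
To make the last suggestion concrete, here is a minimal sketch of a per-day (url, visitorid, hits) aggregate. It keeps the visitor ID in the summary, so distinct visitors over several days can still be counted correctly, whereas adding up per-day distinct counts overcounts returning visitors. The assumptions are mine: Python's built-in sqlite3 stands in for MySQL, the table name daily_url_visitor is hypothetical, days are cut at UTC boundaries, the report range is assumed to start and end on day boundaries, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log ("
    " timestamp INTEGER NOT NULL, visitorid INTEGER, url INTEGER,"
    " crawl INTEGER NOT NULL DEFAULT 0)"
)
# Made-up sample rows: (timestamp, visitorid, url, crawl)
rows = [
    (1270080000, 1, 10, 0),  # day 1
    (1270080100, 1, 10, 0),  # day 1, same visitor again
    (1270080200, 2, 10, 0),  # day 1
    (1270166400, 1, 10, 0),  # day 2, visitor 1 returns
    (1270166500, 3, 20, 0),  # day 2
]
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)

# One row per (day, url, visitor) instead of one row per request.
conn.execute(
    "CREATE TABLE daily_url_visitor AS"
    " SELECT timestamp - timestamp % 86400 AS day_start, url, visitorid,"
    "        COUNT(*) AS hits"
    " FROM access_log WHERE crawl = 0"
    " GROUP BY day_start, url, visitorid"
)

params = {"start": 1270080000, "end": 1270252799}  # two full UTC days

from_detail = conn.execute(
    "SELECT url, COUNT(DISTINCT visitorid) AS visitors, COUNT(*) AS hits"
    " FROM access_log"
    " WHERE timestamp >= :start AND timestamp <= :end AND crawl = 0"
    " GROUP BY url ORDER BY visitors DESC LIMIT 100", params).fetchall()

from_summary = conn.execute(
    "SELECT url, COUNT(DISTINCT visitorid) AS visitors, SUM(hits) AS hits"
    " FROM daily_url_visitor"
    " WHERE day_start >= :start AND day_start <= :end"
    " GROUP BY url ORDER BY visitors DESC LIMIT 100", params).fetchall()

# What goes wrong when per-day distinct counts are simply added up.
naive = conn.execute(
    "SELECT url, SUM(visitors) FROM ("
    "  SELECT day_start, url, COUNT(DISTINCT visitorid) AS visitors"
    "  FROM daily_url_visitor GROUP BY day_start, url)"
    " GROUP BY url ORDER BY url").fetchall()

print(from_detail, from_summary, naive)
```

In this sample the summary reproduces the detail-table report, while the naive sum counts visitor 1 twice for URL 10. How much the summary shrinks the data depends on how often the same visitor hits the same URL within a day.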