MySQL: How does GROUP BY handle columns without an aggregate function?
I'm a bit confused about how the GROUP BY command works in MySQL.
Suppose I have a table:
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1 |
| 5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1 |
| 6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 9 | 64.68.88.165 | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 12 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1 |
| 13 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1 |
| 14 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 15 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1 |
| 16 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1 |
+----------+-----------------+---------------------+-------------------------------------------------+
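When I execute the statement below, how does it choose which recordID to include, given that there is a range of recordIDs that would be correct? Does it just pick the first one that matches?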
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
For this table, the max(date) and min(date) values seem logical to me, but I'm not clear on how recordID and httpMethod are chosen.
Is it safe to use two aggregate functions in one command?
mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | min(date) | max(date) | httpMethod |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
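Usually, using GROUP BY while listing a field in the select expression without an aggregate function is invalid SQL and should raise an error.
MySQL, however, allows this and simply picks one value at random. Try to avoid it, because it is confusing.
To disallow this, you can say at runtime:
SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,', @@sql_mode);
or use the configuration value and/or command-line option sql-mode.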
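A minimal sketch of what this looks like in practice, assuming MySQL 5.7 or later (where ANY_VALUE() is available). The table log_demo is a hypothetical scratch table holding four of the rows from the question, and the column types are guesses. With ONLY_FULL_GROUP_BY enabled, the original query is rejected, and every non-grouped column has to be either aggregated or explicitly marked as "any value will do":

```sql
-- Hypothetical scratch table with a few rows from the question.
DROP TABLE IF EXISTS log_demo;
CREATE TABLE log_demo (
    recordID   INT PRIMARY KEY,
    IPAddress  VARCHAR(15),
    date       DATETIME,
    httpMethod VARCHAR(100)
);

INSERT INTO log_demo (recordID, IPAddress, date, httpMethod) VALUES
    (8,  '64.68.88.165',   '2003-07-09 00:02:43', 'GET /studentservices/responsible.shtml HTTP/1.0'),
    (9,  '64.68.88.165',   '2003-07-09 00:02:44', 'GET /news/sports/basketball.shtml HTTP/1.0'),
    (11, '129.173.159.98', '2003-07-09 00:03:46', 'GET / HTTP/1.1'),
    (14, '129.173.159.98', '2003-07-09 00:03:48', 'GET /images/bigportaltopbanner.gif HTTP/1.1');

-- Enable strict GROUP BY checking for this session only.
-- CONCAT_WS with NULLIF avoids a stray comma if the current mode is empty.
SET SESSION sql_mode = CONCAT_WS(',', 'ONLY_FULL_GROUP_BY', NULLIF(@@sql_mode, ''));

-- The original form is now rejected, because recordID, date and httpMethod
-- are neither grouped nor aggregated:
-- SELECT recordID, IPAddress, date, httpMethod FROM log_demo GROUP BY IPAddress;

-- Accepted: each selected column is grouped, aggregated, or wrapped in
-- ANY_VALUE() to state that any value from the group is acceptable.
SELECT IPAddress,
       MIN(recordID)         AS firstRecordID,
       MIN(date)             AS firstSeen,
       MAX(date)             AS lastSeen,
       ANY_VALUE(httpMethod) AS someMethod
FROM log_demo
GROUP BY IPAddress;
```

Note that MIN(recordID), MIN(date) and ANY_VALUE(httpMethod) are computed independently of each other, so the values in one output row do not necessarily come from the same source row.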
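Yes, listing two aggregate functions is perfectly valid. I think it takes the first row according to the primary key or some index, because that is how it appears to work, but I tried grouping queries on different tables and could not identify any pattern.
So I would avoid using the value of any non-grouped column.
Since I'm new here, apparently I can't post useful images, so I'll try with text.
I just tested this, and it seems that fields not in the GROUP BY take their values from the first row that matches the GROUP BY condition. This would also explain the "randomness" others perceive when selecting columns that are not in the GROUP BY clause.
For example:
Create a table named "test" with two columns named "col1" and "col2", with data as follows: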
col1 col2
1 2
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
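To reproduce the experiment, the table above could be created along these lines. This is only a sketch: the answer does not state the column types, so INT is an assumption, and no primary key or index is defined.

```sql
-- Column types are assumed; the answer does not specify them.
CREATE TABLE test (
    col1 INT,
    col2 INT
);

-- Nine rows; note that (1, 2) appears twice and (1, 1) does not appear.
INSERT INTO test (col1, col2) VALUES
    (1, 2), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 3);
```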
Then run the following query:
select col1, col2 from test order by col2 desc
You will get the following result:
1 3
2 3
3 3
1 2
1 2
2 2
3 2
2 1
3 1
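Now consider the following query:
select groupTable.col1, groupTable.col2
from (
    select col1, col2
    from test
    order by col2 desc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc
You will get the following result:
3 3
2 3
1 3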
Change the subquery to asc:
select col1, col2 from test order by col2 asc
Result:
2 1
3 1
1 2
1 2
2 2
3 2
1 3
2 3
3 3
Use that as the basis for the subquery again:
select groupTable.col1, groupTable.col2
from (
    select col1, col2
    from test
    order by col2 asc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc
Result:
3 1
2 1
1 2
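Now you should be able to see how the ordering of the subquery affects which values are chosen for fields that are selected but not in the GROUP BY clause. This would explain the perceived "randomness" others have mentioned: if the subquery (or the lack of one) is not combined with an ORDER BY clause, MySQL grabs rows as they come in, but by defining a sort order in the subquery you can control this behavior and get predictable results.
GROUP BY picks the first record according to the index. Suppose the Log_Analysis_Records_dalhousieShort table has recordID as its index. GROUP BY therefore selected recordID 11, out of recordIDs 11 to 16, for IPAddress 129.173.159.98. The min and max values, however, are computed by the group-by operation itself, so those values are logically calculated for you.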
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
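MySQL's documentation describes the value chosen for a non-grouped column as nondeterministic and says that adding ORDER BY does not influence the choice, so "first row by index" is better treated as an observation than as a guarantee. If the first request per IP address is what is actually wanted, it can be asked for explicitly. The following is a sketch under assumptions: log_demo is a hypothetical scratch table with four of the question's rows, and the column types are guesses.

```sql
-- Hypothetical scratch table with a few rows from the question.
DROP TABLE IF EXISTS log_demo;
CREATE TABLE log_demo (
    recordID   INT PRIMARY KEY,
    IPAddress  VARCHAR(15),
    date       DATETIME,
    httpMethod VARCHAR(100)
);

INSERT INTO log_demo (recordID, IPAddress, date, httpMethod) VALUES
    (8,  '64.68.88.165',   '2003-07-09 00:02:43', 'GET /studentservices/responsible.shtml HTTP/1.0'),
    (9,  '64.68.88.165',   '2003-07-09 00:02:44', 'GET /news/sports/basketball.shtml HTTP/1.0'),
    (11, '129.173.159.98', '2003-07-09 00:03:46', 'GET / HTTP/1.1'),
    (14, '129.173.159.98', '2003-07-09 00:03:48', 'GET /images/bigportaltopbanner.gif HTTP/1.1');

-- Step 1 (derived table): find the lowest recordID for each IP address.
-- Step 2 (join): fetch the complete row that carries that recordID,
-- so every column in the output comes from the same source row.
SELECT r.recordID, r.IPAddress, r.date, r.httpMethod
FROM log_demo AS r
JOIN (
    SELECT IPAddress, MIN(recordID) AS firstRecordID
    FROM log_demo
    GROUP BY IPAddress
) AS firstPerIP
  ON firstPerIP.firstRecordID = r.recordID
ORDER BY r.IPAddress;
```

With these four sample rows the query returns recordID 11 for 129.173.159.98 and recordID 8 for 64.68.88.165, the same rows MySQL happened to pick in the question, but now by construction. It also remains valid when ONLY_FULL_GROUP_BY is enabled.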
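To be clear, it's not actually random. If you run the same query repeatedly against an unchanged table, you will get the same result every time. But changing the table (even in ways that are otherwise not visible in the query results) may cause different values to appear.
Is there a way to set MySQL to a strict mode or something similar to prohibit this behavior?
@VoteyDisciple I should have written "undefined".
Yes, it's possible. Edited the answer.
You can also set the SQL mode to "traditional" with SET SESSION sql_mode='traditional', though that mode on its own does not include ONLY_FULL_GROUP_BY.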