MySQL: How does GROUP BY handle columns without aggregate functions?

I'm a bit confused about how the GROUP BY clause works in MySQL.

Suppose I have a table:

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;                   
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1             | 
|        5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1             | 
|        6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
|        7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1                | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        9 | 64.68.88.165    | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0      | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|       12 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1             | 
|       13 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1             | 
|       14 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
|       15 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1                | 
|       16 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1                 | 
+----------+-----------------+---------------------+-------------------------------------------------+

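When I execute the following statement, how does it choose which recordID to include, given that each group has a range of valid recordIDs? Does it simply pick the first match?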

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

For this table, the values of max(date) and min(date) seem logical to me, but I'm not clear on how recordID and httpMethod are chosen.

Is it safe to use two aggregate functions in one statement?

mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | min(date)           | max(date)           | httpMethod                                      |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

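When GROUP BY is used, listing a field in the select expression that is neither in the GROUP BY clause nor wrapped in an aggregate function is normally invalid SQL and should raise an error.

MySQL, however, allows it and picks a value that is effectively random (strictly speaking, which value you get is undefined). Try to avoid it, because it is confusing.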

To disallow this, you can say at runtime:

SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,', @@sql_mode)

or use the config value and/or command-line option sql-mode.
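To illustrate what that mode changes, here is a minimal sketch using the question's table. It assumes ONLY_FULL_GROUP_BY is active in the session (on MySQL 5.7.5 and later it is part of the default sql_mode); the exact error text may vary by version.

```sql
-- Assumption: ONLY_FULL_GROUP_BY is enabled for this session.
-- The statement is then expected to be rejected with
-- ER_WRONG_FIELD_WITH_GROUP (error 1055), because recordID, date and
-- httpMethod are neither in the GROUP BY clause nor aggregated.
SELECT recordID, IPAddress, date, httpMethod
FROM Log_Analysis_Records_dalhousieShort
GROUP BY IPAddress;
```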


Yes, listing two aggregate functions is perfectly valid.
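A small sketch of what that looks like on the question's table. The aliases first_seen, last_seen and requests are made up for illustration; the point is that every selected column is either the grouping key or an aggregate, so each value is well defined.

```sql
-- Each output column is the grouping key or an aggregate over the group.
SELECT IPAddress,
       MIN(date) AS first_seen,   -- earliest request from this address
       MAX(date) AS last_seen,    -- latest request from this address
       COUNT(*)  AS requests      -- number of rows in the group
FROM Log_Analysis_Records_dalhousieShort
GROUP BY IPAddress;
```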

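I think it takes the first row according to the primary key or some index, because that is how it appears to work, but I tried grouped queries on different tables and could not identify any pattern.

So I would avoid using any value from a non-grouped column.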
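If you do need the other columns of one particular row per group, one common approach is to pick that row explicitly rather than rely on whatever MySQL happens to return. This is only a sketch, assuming recordID is unique and that "the row with the smallest recordID per IPAddress" is the one you want; the aliases r, f and firstRecordID are illustrative.

```sql
-- Step 1 (derived table f): find the smallest recordID for each IPAddress.
-- Step 2: join back to the table to fetch the full row for that recordID.
SELECT r.recordID, r.IPAddress, r.date, r.httpMethod
FROM Log_Analysis_Records_dalhousieShort AS r
JOIN (
    SELECT IPAddress, MIN(recordID) AS firstRecordID
    FROM Log_Analysis_Records_dalhousieShort
    GROUP BY IPAddress
) AS f ON f.firstRecordID = r.recordID
ORDER BY r.IPAddress;
```

On the sample data in the question this should return recordIDs 11, 3, 8, 2, 1 and 10, the same rows the original query happened to show, but now by construction.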

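Because I'm new here, I apparently can't post helpful images, so I'll try with text.

I just tested this, and it appears that fields not in the GROUP BY clause take the values of the first row that matches the GROUP BY condition. This would also explain the "randomness" others have perceived when selecting columns that are not in the GROUP BY clause.

For example: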

Create a table named "test" with two columns named "col1" and "col2", with data that looks like this:

Col1 Col2
1    2
1    2
1    3
2    1
2    2
2    3
3    1
3    2
3    3
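For anyone who wants to reproduce the test, the table could be set up roughly as follows. The INT column types are an assumption (the answer does not state them), and the rows read the listing above as (col1, col2) pairs.

```sql
-- Hypothetical schema: the column types are assumed, not given in the answer.
CREATE TABLE test (
    col1 INT NOT NULL,
    col2 INT NOT NULL
);

-- Nine rows, matching the listing above read as (col1, col2) pairs.
INSERT INTO test (col1, col2) VALUES
    (1, 2), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 2), (3, 3);
```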

Then run the following query:

select col1, col2 from test order by col2 desc

You will get this result:

1    3
2    3
3    3
1    2
1    2
2    2
3    2
2    1
3    1

Now consider the following query:

select groupTable.col1, groupTable.col2
from (
    select col1, col2
    from test
    order by col2 desc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

You will get this result:

3    3
2    3
1    3

Change the subquery to asc:

select col1, col2 from test order by col2 asc

Result:

2    1
3    1
1    2
1    2
2    2
3    2
1    3
2    3
3    3

Again, use that as the basis for the subquery:

select groupTable.col1, groupTable.col2
from (
    select col1, col2
    from test
    order by col2 asc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

Result:

3    1
2    1
1    2


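You should now be able to see how the order of the subquery affects which values are chosen for fields that are selected but not in the GROUP BY clause. This would explain the perceived "randomness" others have mentioned: if the subquery (or lack thereof) is not combined with an ORDER BY clause, MySQL grabs rows as they come in, but by defining a sort order in the subquery you can control this behavior and get predictable results.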
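A caveat on relying on this: the ordering effect described above is observed behavior, not something MySQL documents as guaranteed, and newer versions may ignore an ORDER BY inside a derived table. A sketch of a way to get the same values without depending on row order is to ask for them with aggregates; the aliases largest_col2 and smallest_col2 are illustrative.

```sql
-- For each col1, request the largest and smallest col2 explicitly
-- instead of relying on which row GROUP BY happens to pick.
SELECT col1,
       MAX(col2) AS largest_col2,   -- what the "order by col2 desc" variant showed
       MIN(col2) AS smallest_col2   -- what the "order by col2 asc" variant showed
FROM test
GROUP BY col1
ORDER BY col1 DESC;
```

With the nine rows from the example, this gives 3 and 1 for col1 = 3, 3 and 1 for col1 = 2, and 3 and 2 for col1 = 1, which lines up with the two results above.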

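GROUP BY picks up the first record based on the index. Suppose the Log_Analysis_Records_dalhousieShort table has recordID as its index. GROUP BY therefore selected recordID 11, out of recordIDs 11 to 16, for IPAddress 129.173.159.98. min and max, however, are aggregate operations evaluated over the whole group, so those values are logically calculated for you.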

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

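To be clear, it's not actually random. If you run the same query repeatedly against an unchanged table, you will get the same result every time. But changes to the table (even ones that are otherwise not visible in the query result) may cause different values to appear.

Is there a way to set MySQL to a strict mode or something similar to disallow this behavior?

@VoteyDisciple I should have written "undefined". Yes, it is possible. Edited the answer.

You can set the SQL mode to "traditional" using
set SESSION sql_mode='traditional'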
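One caveat worth checking against your own server: the MySQL manual's definition of the TRADITIONAL combination mode does not appear to list ONLY_FULL_GROUP_BY among its members, so setting it alone may not reject non-aggregated columns. A minimal sketch, assuming you want both the traditional strictness and the GROUP BY check:

```sql
-- Combine the TRADITIONAL shorthand with ONLY_FULL_GROUP_BY explicitly.
SET SESSION sql_mode = 'TRADITIONAL,ONLY_FULL_GROUP_BY';

-- Inspect the expanded list of modes now in effect for this session.
SELECT @@SESSION.sql_mode;
```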