MySQL: check whether each group contains differing values


Edit:

Suppose I have the following table in MySQL:

CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
`time_stamp` bigint(20) UNSIGNED NOT NULL,
`value` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
PRIMARY KEY (`pv_name`, `time_stamp`)
) ENGINE=InnoDB;
I can find every pv_name in this table that has more than one distinct value with the following query:

SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING COUNT(DISTINCT events.value) > 1;
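The problem is that this query is inefficient. It counts all the distinct values instead of stopping as soon as it has found more than one.

One suggestion was: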

SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING MIN(events.value) < MAX(events.value);
This would be efficient if an index included value. However, value is a TEXT column, so it cannot be indexed in full.


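Is there another way to make this search more efficient? Perhaps some form of correlated subquery? I would like to stay with MySQL, but if another database server has a feature that helps with this problem, I might consider moving to it.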

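To answer your question: it is best to avoid GROUP BY or DISTINCT. First, though, I would suggest adding an auto-incrementing event_id to the table. That makes it possible to tell whether two rows are the same row.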
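Adding such a column might look like the sketch below. The key name uk_events_event_id is an assumption chosen for illustration; MySQL requires an AUTO_INCREMENT column to be indexed, so a unique key is added next to the existing primary key.

```sql
-- Sketch only: add a surrogate id alongside the existing (pv_name, time_stamp) primary key.
-- The key name uk_events_event_id is hypothetical.
ALTER TABLE events
    ADD COLUMN event_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD UNIQUE KEY uk_events_event_id (event_id);
```

On a large table this statement rebuilds the table, so it is worth trying on a copy first.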

So I would suggest the following query:

select e.*
from events e
where e.time_stamp between $ts1 and $ts2 and
      exists (select 1
              from events e2
              where e2.pv_name = e.pv_name and
                    e2.time_stamp between $ts1 and $ts2 and
                    e2.event_id < e.event_id
             );
You also want indexes on events(time_stamp, pv_name, event_id) and events(pv_name, time_stamp, event_id).
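The two indexes could be created as in the sketch below. The index names are assumptions, and the statements presume that the event_id column already exists.

```sql
-- Sketch only: index names are hypothetical.
-- Supports the outer scan by time range.
CREATE INDEX idx_events_ts_pv_id ON events (time_stamp, pv_name, event_id);

-- Supports the correlated EXISTS lookup by pv_name within the time range.
CREATE INDEX idx_events_pv_ts_id ON events (pv_name, time_stamp, event_id);
```

Running EXPLAIN on the query afterwards shows whether the optimizer actually picks them.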

This finds pairs of events. You could use select distinct pv_name instead. However, that requires a lot of extra processing to remove the duplicates.
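For reference, the select distinct variant would look roughly like this. The user variables @ts1 and @ts2 and their values are placeholders standing in for the $ts1 and $ts2 parameters; the rest follows the query above.

```sql
-- Sketch only: @ts1 / @ts2 are placeholder bounds for the time range.
SET @ts1 = 1000;
SET @ts2 = 2000;

-- Returns each pv_name once if it has at least two events in the range.
SELECT DISTINCT e.pv_name
FROM events e
WHERE e.time_stamp BETWEEN @ts1 AND @ts2
  AND EXISTS (SELECT 1
              FROM events e2
              WHERE e2.pv_name = e.pv_name
                AND e2.time_stamp BETWEEN @ts1 AND @ts2
                AND e2.event_id < e.event_id);
```

Note that, like the query it is derived from, this detects multiple events per pv_name, not multiple distinct values.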

SELECT * FROM Customers WHERE pv_name IN
(SELECT pv_name FROM Customers GROUP BY pv_name HAVING COUNT(*) > 1) AND
 time_stamp BETWEEN 'start_time' AND 'end_time'

SELECT * FROM Customers GROUP BY pv_name HAVING MIN(time_stamp)
This might work.

I believe the following works. Can it be improved?

-- Chooses a single non null `value` from the `events` table for each `pv_name`.
CREATE TEMPORARY TABLE single_values ( PRIMARY KEY (pv_name) ) ENGINE=Memory AS (
SELECT events.pv_name, events.value
FROM events
WHERE events.value IS NOT NULL
GROUP BY events.pv_name );

-- Finds each `pv_name` that has a `value` different than the one for it in `single_values`.
-- This is a correlated subquery.
SELECT single_values.pv_name
FROM single_values
WHERE 1 = (
SELECT 1
FROM events
WHERE events.pv_name = single_values.pv_name
AND events.value <> single_values.value
AND events.value IS NOT NULL
LIMIT 1 );
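If the temporary table is not wanted, the same idea can probably be written as a single statement. The sketch below is an assumption rather than a tested solution: it takes the earliest non-NULL value of each pv_name as the reference and uses EXISTS so that the search can stop at the first differing row. The aliases p, e and e0 are illustrative.

```sql
-- Sketch only: lists each pv_name that has at least two different non-NULL values.
SELECT p.pv_name
FROM (SELECT DISTINCT pv_name FROM events) AS p
WHERE EXISTS (
    SELECT 1
    FROM events e
    WHERE e.pv_name = p.pv_name
      -- Compare against one reference value for this pv_name.
      -- A NULL e.value makes the comparison NULL, so NULLs are ignored.
      AND e.value <> (SELECT e0.value
                      FROM events e0
                      WHERE e0.pv_name = p.pv_name
                        AND e0.value IS NOT NULL
                      ORDER BY e0.time_stamp
                      LIMIT 1)
);
```

Both lookups by pv_name can use the existing primary key. Whether the optimizer really stops early is best confirmed with EXPLAIN on real data.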

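How often do inserts happen? What are the requirements for timeliness and accuracy?

I don't have a good fixed number for the insert rate. It could run as fast as the database allows. I'm not sure what you mean by timeliness and accuracy?

Can the statistics lag behind in accuracy? This kind of information rules out, or makes possible, different strategies.

by HAVING MIN(time_stamp)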