Select the first and last row per group in a MySQL table, with a nuance in the groups

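My research team collects information on micromobility bikes/scooters once a minute (the actual feed updates every 3-5 minutes), giving the location of each bike/scooter. Every non-duplicate record is stored in the "freeBikeStatus" table: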

CREATE TABLE `freeBikeStatus` (
  `bike_id` varchar(255) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `lon` double DEFAULT NULL,
  `lat` double DEFAULT NULL,
  `is_reserved` bigint(20) DEFAULT NULL,
  `is_disabled` bigint(20) DEFAULT NULL,
  `soc` double DEFAULT NULL,
  `provider` varchar(255) DEFAULT NULL,
  `system_name` varchar(255) NOT NULL,
  `timestamp` bigint(20) NOT NULL,
  `vehicle_type` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`bike_id`,`system_name`,`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
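However, there is a lot of redundant information: a bike may not move at all, yet it is still recorded in many rows with different timestamps. This can inflate the database by a factor of 40-50, so we need a query that reduces the size and eliminates the redundant rows. For example: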

==============================================
| row | bikeid | lat | lon | timestamp | ... |
==============================================
|  1  |   a    |  X  |  Y  |   1:01    | ... |
|  2  |   a    |  X  |  Y  |   1:03    | ... |
|  3  |   a    |  X  |  Y  |   1:05    | ... |
|  4  |   a    |  X  |  Y  |   1:08    | ... |
|  5  |   a    |  Z  |  Y  |   1:12    | ... |
|  6  |   a    |  Z  |  Y  |   1:15    | ... |
|  7  |   a    |  Z  |  Y  |   1:17    | ... |
|  8  |   a    |  Z  |  Y  |   1:19    | ... |
|  9  |   a    |  X  |  Y  |   1:22    | ... |
| 10  |   a    |  X  |  Y  |   1:25    | ... |
| 11  |   a    |  X  |  Y  |   1:27    | ... |
| 12  |   a    |  X  |  Y  |   1:29    | ... |
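Since the bike did not actually move from 1:01 to 1:08, from 1:12 to 1:19, or from 1:22 to 1:29, the rows in between are not needed. We would therefore like to reduce the table above to the following: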

==============================================
| row | bikeid | lat | lon | timestamp | ... |
==============================================
|  1  |   a    |  X  |  Y  |   1:01    | ... |
|  4  |   a    |  X  |  Y  |   1:08    | ... |
|  5  |   a    |  Z  |  Y  |   1:12    | ... |
|  8  |   a    |  Z  |  Y  |   1:19    | ... |
|  9  |   a    |  X  |  Y  |   1:22    | ... |
| 12  |   a    |  X  |  Y  |   1:29    | ... |
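
To experiment with the example, the simplified table can be recreated as scratch data. This is only a sketch: the table name `bike_sample`, the `CHAR(1)` placeholders for lat/lon, and the `TIME` column are assumptions made for illustration and do not match the real `freeBikeStatus` schema.

```sql
-- Hypothetical scratch table mirroring the simplified example above.
-- `row` is a reserved word in MySQL 8.0, hence the backticks.
CREATE TABLE bike_sample (
  `row`       INT          NOT NULL PRIMARY KEY,
  bikeid      VARCHAR(255) NOT NULL,
  lat         CHAR(1)      DEFAULT NULL,  -- placeholder for a latitude value
  lon         CHAR(1)      DEFAULT NULL,  -- placeholder for a longitude value
  `timestamp` TIME         NOT NULL
);

-- The twelve rows from the example: X,Y then Z,Y then back to X,Y.
INSERT INTO bike_sample (`row`, bikeid, lat, lon, `timestamp`) VALUES
  ( 1, 'a', 'X', 'Y', '01:01:00'),
  ( 2, 'a', 'X', 'Y', '01:03:00'),
  ( 3, 'a', 'X', 'Y', '01:05:00'),
  ( 4, 'a', 'X', 'Y', '01:08:00'),
  ( 5, 'a', 'Z', 'Y', '01:12:00'),
  ( 6, 'a', 'Z', 'Y', '01:15:00'),
  ( 7, 'a', 'Z', 'Y', '01:17:00'),
  ( 8, 'a', 'Z', 'Y', '01:19:00'),
  ( 9, 'a', 'X', 'Y', '01:22:00'),
  (10, 'a', 'X', 'Y', '01:25:00'),
  (11, 'a', 'X', 'Y', '01:27:00'),
  (12, 'a', 'X', 'Y', '01:29:00');
```
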
Based on a similar Stack Overflow question, I came up with the following query ()

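However, this case seems to differ slightly from that example (which led to the query above). If a bike returns to the same lat/lon, I believe the query would eliminate every time point in between: even though the bike moved A -> B -> A, one end point and one start point would be dropped. Is there a way to modify the query to take this into account?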
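
The query referred to above is not reproduced here, so purely as an illustration, here is a hypothetical query of the kind this concern applies to (an assumption about its general shape, not necessarily the query in question). It forms one group per position and keeps only the earliest and latest timestamp of each group.

```sql
-- Hypothetical illustration only: one group per (bike, system, position).
-- Every visit to the same position falls into the same group, no matter
-- how many other positions the bike visited in between.
SELECT bike_id,
       system_name,
       lat,
       lon,
       MIN(`timestamp`) AS first_seen,
       MAX(`timestamp`) AS last_seen
FROM freeBikeStatus
GROUP BY bike_id, system_name, lat, lon;
```

Applied to the example, bike a at (X, Y) would form a single group spanning 1:01 to 1:29, so the end of the first stay (1:08) and the start of the second stay (1:22) would be lost.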

Use the LAG() and LEAD() window functions to check the previous and next values of lat and lon for each row:

with cte as (
  select *,
    lag(lat) over (partition by bikeid order by timestamp) prev_lat,
    lead(lat) over (partition by bikeid order by timestamp) next_lat,
    lag(lon) over (partition by bikeid order by timestamp) prev_lon,
    lead(lon) over (partition by bikeid order by timestamp) next_lon
  from freeBikeStatus  
)
select `row`, bikeid, lat, lon, timestamp
from cte
where 
     (lat, lon) <> (prev_lat, prev_lon) 
  or (lat, lon) <> (next_lat, next_lon)
  or coalesce(prev_lat, prev_lon) is null
  or coalesce(next_lat, next_lon) is null
order by `row`

Just curious: what values do is_reserved and is_disabled hold?

For the sample data, the query above returns:

| row | bikeid | lat | lon | timestamp |
| --- | ------ | --- | --- | --------- |
| 1   | a      | X   | Y   | 1:01      |
| 4   | a      | X   | Y   | 1:08      |
| 5   | a      | Z   | Y   | 1:12      |
| 8   | a      | Z   | Y   | 1:19      |
| 9   | a      | X   | Y   | 1:22      |
| 12  | a      | X   | Y   | 1:29      |
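
The query above uses the simplified column names from the example (`bikeid`, `row`). A minimal sketch of the same idea against the actual `freeBikeStatus` schema might look like the following. It assumes MySQL 8.0 or later (window functions and CTEs), that a bike is identified by (`bike_id`, `system_name`) as the primary key suggests, and that two NULL coordinates should count as "the same position". Those choices are assumptions, not part of the original answer.

```sql
WITH cte AS (
  SELECT
    f.*,
    -- `timestamp` is NOT NULL, so a NULL here means "no neighbouring row".
    LAG(`timestamp`)  OVER w AS prev_ts,
    LEAD(`timestamp`) OVER w AS next_ts,
    LAG(lat)  OVER w AS prev_lat,
    LAG(lon)  OVER w AS prev_lon,
    LEAD(lat) OVER w AS next_lat,
    LEAD(lon) OVER w AS next_lon
  FROM freeBikeStatus AS f
  -- One ordered sequence per bike within each system (assumed grouping).
  WINDOW w AS (PARTITION BY bike_id, system_name ORDER BY `timestamp`)
)
SELECT bike_id, system_name, `timestamp`, lat, lon
FROM cte
WHERE prev_ts IS NULL                                  -- first row of the bike
   OR next_ts IS NULL                                  -- last row of the bike
   OR NOT (lat <=> prev_lat AND lon <=> prev_lon)      -- position just changed
   OR NOT (lat <=> next_lat AND lon <=> next_lon)      -- position about to change
ORDER BY bike_id, system_name, `timestamp`;
```

The null-safe operator `<=>` is used because `lat` and `lon` are nullable in the real table, and a plain `<>` comparison against NULL yields NULL rather than true. The comparison on `DOUBLE` coordinates is exact, so small GPS jitter between readings would count as movement. To actually shrink the table, the same SELECT could feed an `INSERT ... SELECT` into a copy of the table; that step is not shown here and is best tried on a backup first.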