SQL: find duplicate orders by time proximity
I have a table of orders that I know contains duplicates:
customer   order_number order_date
---------- ------------ -------------------
1          1            2012-03-01 01:58:00
1          2            2012-03-01 02:01:00
1          3            2012-03-01 02:03:00
2          4            2012-03-01 02:15:00
3          5            2012-03-01 02:18:00
3          6            2012-03-01 04:30:00
4          7            2012-03-01 04:35:00
5          8            2012-03-01 04:38:00
6          9            2012-03-01 04:58:00
6          10           2012-03-01 04:59:00
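I want to find all duplicate orders placed by the same customer within 60 minutes of each other. The result can be either a result set made up of the "duplicate" rows, or a set of all customers with a count of how many duplicates each one has.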
Here is what I tried:
SELECT
customer,
count(*)
FROM
orders
GROUP BY
customer,
DATEPART(HOUR, order_date)
HAVING (count(*) > 1)
This doesn't work when two duplicates are less than 60 minutes apart but fall in different clock hours (e.g. 1:58 and 2:02).
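A minimal sketch of why hour buckets miss these duplicates, using two timestamps from the sample data. The literals are written in ISO 8601 form with a 'T', an assumption made so they parse identically under any language setting:

```sql
-- Two of customer 1's orders from the sample data, 3 minutes apart.
DECLARE @a DATETIME = '2012-03-01T01:58:00';
DECLARE @b DATETIME = '2012-03-01T02:01:00';

-- GROUP BY DATEPART(HOUR, order_date) would place these in different groups,
-- even though they are only a few minutes apart.
SELECT
    DATEPART(HOUR, @a)       AS hour_a,        -- 1
    DATEPART(HOUR, @b)       AS hour_b,        -- 2
    DATEDIFF(MINUTE, @a, @b) AS minutes_apart; -- 3
```

Any fixed bucketing of time has the same boundary problem, which is why comparing orders pairwise, as in the second attempt, is the usual direction to take.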
I also tried this:
SELECT
o1.customer,
o1.order_number,
o2.order_number,
DATEDIFF(MINUTE,o1.order_date, o2.order_date) AS [diff]
FROM
orders o1 LEFT OUTER JOIN
orders o2 ON o1.customer = o2.customer AND o1.order_number <> o2.order_number
WHERE
ABS(DATEDIFF(MINUTE,o1.order_date, o2.order_date)) < 60
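This gives me all the duplicates, but it also gives me multiple rows per duplicate order, i.e. (o1, o2) and (o2, o1). That wouldn't be so bad if there were never more than two duplicate orders, but when there are, I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2) and so on: every permutation.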
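A common fix for the mirrored rows is to join on o1.order_number < o2.order_number instead of <>, so each unordered pair is returned once. A minimal sketch against the sample data, loaded into a table variable @orders with the question's column names (the table variable and the @pairs result table are assumptions that keep the script self-contained):

```sql
DECLARE @orders TABLE (customer INT, order_number INT, order_date DATETIME);
INSERT INTO @orders (customer, order_number, order_date)
VALUES
    (1, 1, '2012-03-01T01:58:00'),
    (1, 2, '2012-03-01T02:01:00'),
    (1, 3, '2012-03-01T02:03:00'),
    (2, 4, '2012-03-01T02:15:00'),
    (3, 5, '2012-03-01T02:18:00'),
    (3, 6, '2012-03-01T04:30:00'),
    (4, 7, '2012-03-01T04:35:00'),
    (5, 8, '2012-03-01T04:38:00'),
    (6, 9, '2012-03-01T04:58:00'),
    (6, 10, '2012-03-01T04:59:00');

-- One row per unordered pair: "<" keeps (1, 2) and drops its mirror (2, 1).
-- INNER JOIN is used because the WHERE clause on o2 already discards the
-- NULL rows a LEFT OUTER JOIN could produce.
DECLARE @pairs TABLE (customer INT, first_order INT, second_order INT, diff_minutes INT);
INSERT INTO @pairs (customer, first_order, second_order, diff_minutes)
SELECT
    o1.customer,
    o1.order_number,
    o2.order_number,
    DATEDIFF(MINUTE, o1.order_date, o2.order_date)
FROM @orders AS o1
INNER JOIN @orders AS o2
    ON  o1.customer = o2.customer
    AND o1.order_number < o2.order_number
WHERE ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60;

SELECT * FROM @pairs ORDER BY customer, first_order, second_order;
```

On this data it returns four rows: (1, 2), (1, 3) and (2, 3) for customer 1, and (9, 10) for customer 6. Three close orders still yield three rows, but these are combinations rather than the six permutations.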
Does anyone have any insight? I'm not necessarily looking for the best-performing answer here, just one that works.
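Using EXISTS and a correlated subquery, you can check whether the same customer has any earlier order within the past hour. A recursive CTE can also be used to identify all possible permutations of orders that are within 60 minutes of each other.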
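You have a kind of cascading dependency. If your orders were placed at minutes 0, 59, 118, 177, 236, etc. (each 59 minutes after the previous one), what would you want the result to be?
@Dems Interesting. I would consider all of those duplicates. Either way, though, I'd be satisfied with the result.
Doesn't this only compare against the first order, though? What about orders placed at minutes 0, 1, 2, 65, 66 and 67? The orders at 1, 2, 66 and 67 are all duplicates, but wouldn't this only find 1 and 2?
+1 I'm trying this against my real data; thanks for the reply.
@Dems: I don't see what you mean. This takes the minimum for the customer group and subtracts it from the current datetime. It also works on the full datetime, converted to minutes.
@Arion: If there are three orders close together, two of them are duplicates. If there is a fourth order more than an hour later, it is not a duplicate. If there is a fifth order 5 minutes after the fourth, it is a duplicate. But your code only compares the fifth order with the first, which means it won't show up as a duplicate. How would you see that the fifth order is a duplicate of the fourth?
@Dems: Right, I hadn't taken that into account. You're correct: it only uses the first order in the customer group.
The EXISTS query assumes that no two orders occur at exactly the same time. If they can, it could be changed to use <= instead of <, with an additional comparison on the order id.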
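The comments disagree on whether an order should be compared with the customer's first order or with the one immediately before it. As a sketch of the second reading (the one under which minutes 1, 2, 66 and 67 are all duplicates), the following assumes SQL Server 2012 or later for LAG() and uses hypothetical data for the 0, 1, 2, 65, 66, 67 scenario:

```sql
-- Assumes SQL Server 2012 or later (LAG). Hypothetical data for the comment's
-- scenario: one customer ordering 0, 1, 2, 65, 66 and 67 minutes after @base.
DECLARE @base DATETIME = '2012-03-01T00:00:00';
DECLARE @orders TABLE (customer INT, order_number INT, order_date DATETIME);
INSERT INTO @orders (customer, order_number, order_date)
VALUES
    (1, 1, DATEADD(MINUTE, 0, @base)),
    (1, 2, DATEADD(MINUTE, 1, @base)),
    (1, 3, DATEADD(MINUTE, 2, @base)),
    (1, 4, DATEADD(MINUTE, 65, @base)),
    (1, 5, DATEADD(MINUTE, 66, @base)),
    (1, 6, DATEADD(MINUTE, 67, @base));

DECLARE @flagged TABLE (customer INT, order_number INT, order_date DATETIME, is_duplicate BIT);

WITH ordered AS (
    SELECT
        customer,
        order_number,
        order_date,
        -- The same customer's immediately preceding order (NULL for the first one).
        LAG(order_date) OVER (PARTITION BY customer
                              ORDER BY order_date, order_number) AS prev_date
    FROM @orders
)
INSERT INTO @flagged (customer, order_number, order_date, is_duplicate)
SELECT
    customer,
    order_number,
    order_date,
    -- Duplicate if the previous order was placed less than 60 minutes earlier.
    CASE WHEN prev_date IS NOT NULL
              AND order_date < DATEADD(HOUR, 1, prev_date)
         THEN 1 ELSE 0 END
FROM ordered;

SELECT * FROM @flagged ORDER BY customer, order_date;
```

Orders 2, 3, 5 and 6 (minutes 1, 2, 66 and 67) come out as duplicates; order 4 at minute 65 does not, because it is 63 minutes after order 3. Under this rule the 0, 59, 118, 177 chain also flags every order after the first, which matches the asker's reply.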
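Another answer counts, per customer, the orders placed within 60 minutes of that customer's first order:
;WITH CTE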
AS
(
SELECT
MIN(datediff(minute,'1990-1-1',order_date)) OVER(PARTITION BY customer) AS minDate,
datediff(minute,'1990-1-1',order_date) AS DateTicks,
tbl.customer
FROM
orders AS tbl
)
SELECT
CTE.customer,
SUM(CASE WHEN (CTE.DateTicks-CTE.minDate)<60 THEN 1 ELSE 0 END) AS orders_in_first_hour -- includes the customer's first order itself
FROM
CTE
GROUP BY
CTE.customer
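To get the per-customer count the question asks for without the first-order-only limitation raised in the comments, each order can instead be compared against every earlier order of the same customer. A sketch, assuming the sample data in a table variable @orders and a hypothetical @counts result table; the flag is computed in a CTE first because SQL Server does not allow a subquery inside an aggregate such as SUM:

```sql
DECLARE @orders TABLE (customer INT, order_number INT, order_date DATETIME);
INSERT INTO @orders (customer, order_number, order_date)
VALUES
    (1, 1, '2012-03-01T01:58:00'),
    (1, 2, '2012-03-01T02:01:00'),
    (1, 3, '2012-03-01T02:03:00'),
    (2, 4, '2012-03-01T02:15:00'),
    (3, 5, '2012-03-01T02:18:00'),
    (3, 6, '2012-03-01T04:30:00'),
    (4, 7, '2012-03-01T04:35:00'),
    (5, 8, '2012-03-01T04:38:00'),
    (6, 9, '2012-03-01T04:58:00'),
    (6, 10, '2012-03-01T04:59:00');

DECLARE @counts TABLE (customer INT, duplicate_count INT);

-- Flag each order first, then aggregate.
WITH flagged AS (
    SELECT
        o.customer,
        -- 1 if the same customer has any earlier order at most 60 minutes
        -- before this one (checked against every earlier order, not just the first).
        CASE WHEN EXISTS (SELECT *
                          FROM @orders AS earlier
                          WHERE earlier.customer = o.customer
                            AND earlier.order_date < o.order_date
                            AND earlier.order_date >= DATEADD(HOUR, -1, o.order_date))
             THEN 1 ELSE 0 END AS is_duplicate
    FROM @orders AS o
)
INSERT INTO @counts (customer, duplicate_count)
SELECT customer, SUM(is_duplicate)
FROM flagged
GROUP BY customer;

SELECT * FROM @counts ORDER BY customer;
```

On the sample data, customer 1 gets 2 (orders 2 and 3), customer 6 gets 1 (order 10), and every other customer gets 0.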
SELECT
*,
CASE WHEN EXISTS (SELECT *
FROM orders AS lookup
WHERE lookup.customer = orders.customer
AND lookup.order_date < orders.order_date
AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
)
THEN 'Duplicate Order'
ELSE 'Principal Order'
END as Order_Status
FROM
orders
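With a strict < comparison, two orders placed at exactly the same time never see each other as earlier. One way to break such ties is sketched below; it assumes order_number may be used to decide which of two simultaneous orders counts as the principal one, and the data (including the @status result table) is hypothetical:

```sql
DECLARE @orders TABLE (customer INT, order_number INT, order_date DATETIME);
INSERT INTO @orders (customer, order_number, order_date)
VALUES
    (1, 1, '2012-03-01T01:58:00'),
    (1, 2, '2012-03-01T01:58:00'),  -- same timestamp as order 1
    (1, 3, '2012-03-01T03:30:00');  -- 92 minutes later

DECLARE @status TABLE (customer INT, order_number INT, order_date DATETIME, order_status VARCHAR(20));

INSERT INTO @status (customer, order_number, order_date, order_status)
SELECT
    o.customer,
    o.order_number,
    o.order_date,
    CASE WHEN EXISTS (SELECT *
                      FROM @orders AS earlier
                      WHERE earlier.customer = o.customer
                        AND earlier.order_date >= DATEADD(HOUR, -1, o.order_date)
                        -- Strictly earlier, or simultaneous with a lower order_number.
                        AND (earlier.order_date < o.order_date
                             OR (earlier.order_date = o.order_date
                                 AND earlier.order_number < o.order_number)))
         THEN 'Duplicate Order'
         ELSE 'Principal Order'
    END
FROM @orders AS o;

SELECT * FROM @status ORDER BY customer, order_date, order_number;
```

Order 1 stays principal, order 2 becomes its duplicate, and order 3 is principal again because it is more than an hour later.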
Test data:
DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME)
INSERT INTO @orders
VALUES
(1, 1, '2012-03-01 01:58:00'),
(1, 2, '2012-03-01 02:01:00'),
(1, 3, '2012-03-01 02:03:00'),
(2, 4, '2012-03-01 02:15:00'),
(3, 5, '2012-03-01 02:18:00'),
(3, 6, '2012-03-01 04:30:00'),
(4, 7, '2012-03-01 04:35:00'),
(5, 8, '2012-03-01 04:38:00'),
(6, 9, '2012-03-01 04:58:00'),
(6, 10, '2012-03-01 04:59:00');
Query:
with ProximityOrderCascade(CustomerId, OrderId, ProximateOrderId, MinutesDifference, OrderDate, ProximateOrderDate)
as
(
select o.customerid, o.orderid, null, null, o.orderdate, o.orderdate
from @orders o
union all
select o.customerid, o.orderid, p.orderid, datediff(minute, p.OrderDate, o.OrderDate), o.OrderDate, p.OrderDate
from ProximityOrderCascade p
inner join @orders o
on p.customerid = o.customerid
and abs(datediff(minute, p.OrderDate, o.OrderDate)) between 0 and 60
and o.orderid <> p.orderid
where proximateorderid is null
)
select * from ProximityOrderCascade
where
not ProximateOrderId is null
Output:
CustomerId  OrderId     ProximateOrderId MinutesDifference OrderDate               ProximateOrderDate
----------- ----------- ---------------- ----------------- ----------------------- -----------------------
6           9           10               -1                2012-03-01 04:58:00.000 2012-03-01 04:59:00.000
6           10          9                1                 2012-03-01 04:59:00.000 2012-03-01 04:58:00.000
1           1           3                -5                2012-03-01 01:58:00.000 2012-03-01 02:03:00.000
1           2           3                -2                2012-03-01 02:01:00.000 2012-03-01 02:03:00.000
1           1           2                -3                2012-03-01 01:58:00.000 2012-03-01 02:01:00.000
1           3           2                2                 2012-03-01 02:03:00.000 2012-03-01 02:01:00.000
1           2           1                3                 2012-03-01 02:01:00.000 2012-03-01 01:58:00.000
1           3           1                5                 2012-03-01 02:03:00.000 2012-03-01 01:58:00.000
(8 row(s) affected)