SQL: Find duplicate orders (by time proximity)


I have an orders table that I know contains duplicates:

    customer   order_number   order_date
   ----------  ------------   -------------------
          1             1     2012-03-01 01:58:00
          1             2     2012-03-01 02:01:00
          1             3     2012-03-01 02:03:00
          2             4     2012-03-01 02:15:00
          3             5     2012-03-01 02:18:00
          3             6     2012-03-01 04:30:00
          4             7     2012-03-01 04:35:00
          5             8     2012-03-01 04:38:00
          6             9     2012-03-01 04:58:00
          6            10     2012-03-01 04:59:00
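
I want to find all duplicate orders from the same customer that fall within 60 minutes of each other. Either a result set made up of the "duplicate" rows, or a list of all customers with a count of their duplicates, would be fine.

Here is what I have tried: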

SELECT
   customer,
   count(*)
FROM
   orders
GROUP BY
   customer,
   DATEPART(HOUR, order_date)
HAVING (count(*) > 1)

This does not work when two duplicates are less than 60 minutes apart but fall in different clock hours (for example 1:58 and 2:02).
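
To make the failure concrete, here is a minimal sketch using two timestamps from the sample data (the variable names `@a` and `@b` are only for illustration). The orders are three minutes apart, yet `DATEPART(HOUR, ...)` puts them in different groups. Note also that `DATEPART(HOUR, ...)` ignores the date, so orders placed at the same hour on different days would end up in the same group.

```sql
-- Two orders from the sample data, three minutes apart.
DECLARE @a DATETIME = '2012-03-01 01:58:00',
        @b DATETIME = '2012-03-01 02:01:00';

SELECT
    DATEPART(HOUR, @a)       AS hour_a,         -- 1
    DATEPART(HOUR, @b)       AS hour_b,         -- 2
    DATEDIFF(MINUTE, @a, @b) AS minutes_apart;  -- 3
```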

I also tried this:

SELECT
  o1.customer,
  o1.order_number,
  o2.order_number,
  DATEDIFF(MINUTE,o1.order_date, o2.order_date) AS [diff]
FROM
  orders o1 LEFT OUTER JOIN
  orders o2 ON o1.customer = o2.customer AND o1.order_number <> o2.order_number
WHERE
  ABS(DATEDIFF(MINUTE,o1.order_date, o2.order_date)) < 60
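
This does give me all the duplicates, but it also gives me multiple rows per duplicate order, i.e. both (o1, o2) and (o2, o1). That would not be so bad if there were never more than two duplicates of an order. When there are, I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2) and so on: every permutation.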
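
One way to drop the mirrored rows, sketched here against the `orders` table from the question, is to require `o1.order_number < o2.order_number` so that each pair is reported in one direction only. The switch to an inner join and the column aliases `first_order` and `second_order` are my own choices. The `WHERE` clause on `o2` already discards unmatched rows, so the original outer join behaved like an inner join anyway.

```sql
-- Each pair of same-customer orders within 60 minutes, listed once.
SELECT
  o1.customer,
  o1.order_number AS first_order,
  o2.order_number AS second_order,
  DATEDIFF(MINUTE, o1.order_date, o2.order_date) AS [diff]
FROM
  orders o1 INNER JOIN
  orders o2 ON  o1.customer = o2.customer
            AND o1.order_number < o2.order_number   -- one direction only
WHERE
  ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60
```

For the sample data this should return four rows: the pairs (1, 2), (1, 3) and (2, 3) for customer 1, and (9, 10) for customer 6. Three close orders still produce three rows, because every combination is listed, but no pair appears twice.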


Does anyone have any insights? I am not necessarily looking for the best-performing answer here, just one that works.

Maybe something like this:

Test data:

Query:


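Using EXISTS with a correlated subquery, you can check whether the same customer placed any earlier order within the preceding hour.

The following query identifies all possible permutations of orders that are within 60 minutes of each other: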

DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME)

INSERT INTO @orders
VALUES
    (1, 1, '2012-03-01 01:58:00'),
    (1, 2, '2012-03-01 02:01:00'),
    (1, 3, '2012-03-01 02:03:00'),
    (2, 4, '2012-03-01 02:15:00'),
    (3, 5, '2012-03-01 02:18:00'),
    (3, 6, '2012-03-01 04:30:00'),
    (4, 7, '2012-03-01 04:35:00'),
    (5, 8, '2012-03-01 04:38:00'),
    (6, 9, '2012-03-01 04:58:00'),
    (6, 10, '2012-03-01 04:59:00');

with ProximityOrderCascade(CustomerId, OrderId, ProximateOrderId, MinutesDifference, OrderDate, ProximateOrderDate)
as 
(
    select o.customerid, o.orderid, null, null, o.orderdate, o.orderdate
    from @orders o
    union all   
    select o.customerid, o.orderid, p.orderid, datediff(minute, p.OrderDate, o.OrderDate), o.OrderDate, p.OrderDate
    from ProximityOrderCascade p
    inner join @orders o 
        on p.customerid = o.customerid 
        and abs(datediff(minute, p.OrderDate, o.OrderDate)) between 0 and 60 
        and o.orderid <> p.orderid
    where proximateorderid is null
)
select * from ProximityOrderCascade
where 
    not ProximateOrderId is null

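You have a kind of cascading dependency. If you have orders occurring at 0, 59, 118, 177, 236, etc. [all 59 minutes apart], what would you want the results to be?

@Dems Interesting. I would consider all of those duplicates. Either way, though, I would be happy with the result.

This only compares against the first order, though? What about orders occurring at minutes 0, 1, 2, 65, 66, 67? The orders at 1, 2, 66 and 67 are all duplicates, but wouldn't this find only 1 and 2?

+1 I am trying this against my real data. Thanks for the reply.

@Dems: I don't understand what you mean. This takes the minimum for the customer group and subtracts it from the current datetime. It also computes the full datetime in minutes.

@Arion: If three orders are close together, two of them are duplicates. If a fourth order arrives more than an hour later, it is not a duplicate. If a fifth order arrives 5 minutes after the fourth, it is a duplicate. But your code only compares the fifth order with the first, so it will not show up as a duplicate. How do you intend to detect that the fifth order is a duplicate of the fourth?

@Dems: Right. I had not considered that. You are correct: it only uses the first order in the customer group.

This assumes no two orders happen at exactly the same time. It could be changed to <= rather than < and >=, and then add the id.

;WITH CTE AS
(
    SELECT
        MIN(datediff(minute,'1990-1-1',order_date)) OVER(PARTITION BY customer) AS minDate,
        datediff(minute,'1990-1-1',order_date) AS DateTicks,
        tbl.customer
    FROM @tbl AS tbl
)
SELECT
    CTE.customer,
    SUM(CASE WHEN (CTE.DateTicks-CTE.minDate)<60 THEN 1 ELSE 0 END)
FROM CTE
GROUP BY CTE.customer
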
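
The cascading case that @Dems raises in the comments can be handled by comparing each order with the customer's immediately preceding order instead of the first one. The following is only a sketch: it assumes SQL Server 2012 or later (for `LAG`) and the `orders` table from the question, and the names `ordered`, `prev_order_date` and `minutes_since_prev` are made up for illustration.

```sql
-- Flag every order placed less than 60 minutes after the same
-- customer's previous order.
WITH ordered AS (
    SELECT
        customer,
        order_number,
        order_date,
        LAG(order_date) OVER (PARTITION BY customer
                              ORDER BY order_date, order_number) AS prev_order_date
    FROM orders
)
SELECT
    customer,
    order_number,
    order_date,
    DATEDIFF(MINUTE, prev_order_date, order_date) AS minutes_since_prev
FROM ordered
WHERE prev_order_date IS NOT NULL   -- the first order per customer has no predecessor
  AND DATEDIFF(MINUTE, prev_order_date, order_date) < 60;
```

With orders at minutes 0, 1, 2, 65, 66, 67 this flags the ones at 1, 2, 66 and 67. In a chain such as 0, 59, 118 it flags everything after the first order, which matches the asker's statement that all of those count as duplicates.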
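SELECT
  *,
  -- An earlier order from the same customer within the preceding hour
  -- makes the current row a duplicate.
  CASE WHEN EXISTS (SELECT *
                      FROM orders AS lookup
                     WHERE lookup.customer   =  orders.customer
                       AND lookup.order_date <  orders.order_date
                       AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                   )
       THEN 'Duplicate Order'
       ELSE 'Principal Order'
  END as Order_Status
FROM
  orders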
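
The question also asked for a per-customer count. Here is a sketch of that, using the same EXISTS idea with every column qualified by its table alias so that the inner and outer `order_date` cannot be confused. The aliases `o` and `prior` and the column name `duplicate_count` are my own.

```sql
-- Count, per customer, the orders that follow another order
-- from the same customer by 60 minutes or less.
SELECT
    o.customer,
    COUNT(*) AS duplicate_count
FROM orders AS o
WHERE EXISTS (SELECT *
                FROM orders AS prior
               WHERE prior.customer   =  o.customer
                 AND prior.order_date <  o.order_date
                 AND prior.order_date >= DATEADD(HOUR, -1, o.order_date))
GROUP BY o.customer;
```

For the sample data this should give a count of 2 for customer 1 and 1 for customer 6. Customers with no duplicates do not appear in the result.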

Result of the ProximityOrderCascade query:

CustomerId  OrderId     ProximateOrderId MinutesDifference OrderDate               ProximateOrderDate
----------- ----------- ---------------- ----------------- ----------------------- -----------------------
6           9           10               -1                2012-03-01 04:58:00.000 2012-03-01 04:59:00.000
6           10          9                1                 2012-03-01 04:59:00.000 2012-03-01 04:58:00.000
1           1           3                -5                2012-03-01 01:58:00.000 2012-03-01 02:03:00.000
1           2           3                -2                2012-03-01 02:01:00.000 2012-03-01 02:03:00.000
1           1           2                -3                2012-03-01 01:58:00.000 2012-03-01 02:01:00.000
1           3           2                2                 2012-03-01 02:03:00.000 2012-03-01 02:01:00.000
1           2           1                3                 2012-03-01 02:01:00.000 2012-03-01 01:58:00.000
1           3           1                5                 2012-03-01 02:03:00.000 2012-03-01 01:58:00.000

(8 row(s) affected)