How do I find duplicate values in a SQL column that holds unique data?


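Problem: I need to find every instance where a user has duplicated data. Each time the user clicks the button, a unique batch containing the same data is created. I need a result set that groups together all instances of batches the end user has duplicated.

Sample data: using Microsoft SQL on Microsoft SQL Server

Data types: batch int, date, reference int, from_state varchar(2), to_state varchar(2), item int, qty int

Expected result: all of the information I need to display in order to troubleshoot the problem. I can find the duplicate records by counting over from_state, to_state, item, and qty, but I don't know how to tie that back to the batch and reference numbers.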

---------------------------------------------------------------------------------
| batch   | date        | reference | from_state | to_state | item        | qty |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 11122334455 | 2   |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 66622334455 | 1   |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 77722334455 | 5   |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 11122334455 | 2   |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 66622334455 | 1   |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 77722334455 | 5   |

Attempted code:


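Based on the temp table references in your attempted query, you appear to be using SQL Server, so I'll go with that.

This handles a single duplicate. I would have to know more about the data to argue that it is reliable. For something that gets verified manually, it is probably good enough. I'll see if I can come up with something that handles more than one copy.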

with T as (
    select
        batch,
        min("date") as dt,
        min(reference) as reference,
        min(from_state) as from_state,
        min(to_state) as to_state,
        min(item) as item_min, max(item) as item_max, sum(item) as item_sum,
        min(qty) as qty_min, max(qty) as qty_max, sum(qty) as qty_sum,
        count(*) as cnt
    from <yourdata>
    group by batch
)
select t1.batch
from T t1 inner join T t2
    on t2.batch > t1.batch and t2.reference <> t1.reference
        and t2.dt = t1.dt
        and t2.from_state = t1.from_state and t2.to_state = t1.to_state
        and t2.item_min = t1.item_min and t2.qty_min = t1.qty_min
        and t2.item_max = t1.item_max and t2.qty_max = t1.qty_max
        and t2.item_sum = t1.item_sum and t2.qty_sum = t1.qty_sum
        and t2.cnt = t1.cnt
group by t1.batch

The original batch is the one with the lowest id.
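
To tie each duplicate back to its batch and reference numbers, a pairwise comparison can return both sides of every match. The following is a minimal sketch rather than a definitive solution. The table variable `@shipments` is a hypothetical stand-in for the real table, `[date]` is assumed to be of type `date`, `item` is declared as `bigint` on the assumption that values such as 11122334455 do not fit in an `int`, and the query assumes that no batch contains two identical rows and that each batch has a single reference.

```sql
-- Hypothetical table variable holding the sample rows from the question.
DECLARE @shipments TABLE (
    batch      int,
    [date]     date,        -- assumed type
    reference  int,
    from_state varchar(2),
    to_state   varchar(2),
    item       bigint,      -- assumption: sample values exceed the int range
    qty        int
);

INSERT INTO @shipments (batch, [date], reference, from_state, to_state, item, qty)
VALUES
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 11122334455, 2),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 66622334455, 1),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 77722334455, 5),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 11122334455, 2),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 66622334455, 1),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 77722334455, 5);

-- One row per batch with its reference and row count.
WITH sizes AS (
    SELECT batch, MIN(reference) AS reference, COUNT(*) AS cnt
    FROM @shipments
    GROUP BY batch
)
SELECT
    s1.batch     AS orig_batch,
    s1.reference AS orig_reference,
    s2.batch     AS dup_batch,
    s2.reference AS dup_reference
FROM sizes s1
-- Candidate pairs: a later batch with the same number of rows.
INNER JOIN sizes s2
    ON s2.batch > s1.batch AND s2.cnt = s1.cnt
-- Match the detail rows of both batches on every data column.
INNER JOIN @shipments d1
    ON d1.batch = s1.batch
INNER JOIN @shipments d2
    ON d2.batch = s2.batch
   AND d2.[date] = d1.[date]
   AND d2.from_state = d1.from_state AND d2.to_state = d1.to_state
   AND d2.item = d1.item AND d2.qty = d1.qty
GROUP BY s1.batch, s1.reference, s2.batch, s2.reference, s1.cnt
-- Keep the pair only if every row of the earlier batch found a partner.
HAVING COUNT(*) = s1.cnt;
```

On the sample rows this pairs batch 1234567 (reference 8213) with batch 1239764 (reference 8597). If a batch can contain repeated rows or NULL values, equal counts no longer prove a one-to-one match, so the output would still need manual checking.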

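Perhaps you would want to continue that idea by matching up the detail rows for the pre-aggregated pairs. Append this to the CTE from above: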

...
, matches as (
    select p.batch, p.batch2
    from
        pairs p inner join
        <yourdata> d1 on d1.batch = p.batch full outer join
        <yourdata> d2 on d2.batch = p.batch2
            and d2."date" = d1."date"
            and d2.from_state = d1.from_state and d2.to_state = d1.to_state
            and d2.item = d1.item and d2.qty = d1.qty
    group by p.batch, p.batch2
    having
            count(d1."date") = count(*) and count(d2."date") = count(*)
        and count(d1.from_state) = count(*) and count(d2.from_state) = count(*)
        and count(d1.to_state) = count(*) and count(d2.to_state) = count(*)
        and count(d1.item) = count(*) and count(d2.item) = count(*)
        and count(d1.qty) = count(*) and count(d2.qty) = count(*)
)
select distinct
    min(batch) over (
        partition by
            dt, from_state, to_state,
            item_min, item_max, item_sum, qty_min, qty_max, qty_sum, cnt
        ) as orig_batch,
    batch2 as dup_batch
from pairs p inner join matches m on m.batch = p.batch and m.batch2 = p.batch2

Thank you to everyone who helped me work through this problem.

-- Put the result set into the temp table #TEMP_baseresults

SELECT batch
,[date]
,reference
,from_state
,to_state
,item
,qty
INTO #TEMP_baseresults
FROM datasource

-- Find all duplicates that share the same from_state, to_state, item, and qty

SELECT from_state
,to_state
,item
,qty
,count(*) as 'count'
INTO #TEMP_batchduplicates
FROM #TEMP_baseresults
GROUP BY from_state
,to_state
,item
,qty
HAVING COUNT(*) > 1
ORDER BY from_state
,to_state
,item
,qty

-- Join the duplicates table back to the base table

SELECT *
FROM #TEMP_baseresults base
JOIN #TEMP_batchduplicates dup
ON dup.from_state = base.from_state
AND dup.to_state = base.to_state
AND dup.item = base.item
AND dup.qty = base.qty
ORDER BY base.from_state
,base.to_state
,base.item

Results displayed:

-----------------------------------------------------------------------------------------
| batch   | date        | reference | from_state | to_state | item        | qty | count |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 11122334455 | 2   | 2     |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 66622334455 | 1   | 2     |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01  | 8213      |  MT        | CA       | 77722334455 | 5   | 2     |
-----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 11122334455 | 2   | 2     |
-----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 66622334455 | 1   | 2     |
-----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01  | 8597      |  MT        | CA       | 77722334455 | 5   | 2     |

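This filtered my data set down to only the records identified as duplicates, and additionally flagged how many times each row may have been duplicated.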
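
The same row-level result can probably be produced in a single statement with a window function instead of two temp tables. This is a sketch under stated assumptions, not the method used above: `@shipments` is a hypothetical table variable standing in for `datasource`, `[date]` is assumed to be of type `date`, and `item` is declared as `bigint` because the sample values exceed the `int` range.

```sql
-- Hypothetical table variable holding the sample rows from the question.
DECLARE @shipments TABLE (
    batch      int,
    [date]     date,        -- assumed type
    reference  int,
    from_state varchar(2),
    to_state   varchar(2),
    item       bigint,      -- assumption: sample values exceed the int range
    qty        int
);

INSERT INTO @shipments (batch, [date], reference, from_state, to_state, item, qty)
VALUES
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 11122334455, 2),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 66622334455, 1),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 77722334455, 5),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 11122334455, 2),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 66622334455, 1),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 77722334455, 5);

SELECT batch, [date], reference, from_state, to_state, item, qty, dup_count
FROM (
    SELECT
        s.*,
        -- Number of rows sharing the same from_state, to_state, item, and qty.
        COUNT(*) OVER (PARTITION BY from_state, to_state, item, qty) AS dup_count
    FROM @shipments AS s
) AS counted
WHERE dup_count > 1          -- keep only rows that occur more than once
ORDER BY batch, item;
```

Like the temp table version, this compares individual rows rather than whole batches, so a batch that only partially overlaps another one would also be listed.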

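What is your RDBMS: SQL Server or Postgres? What result do you expect? You say you need to fix every batch, but you don't say what "fix" means.

I have edited the wording of the question.

Which DBMS are you using? You still need to tell us what your desired result is so that we don't have to guess. Please read up on how to improve the quality of your question and get better answers.

@CoffeeCoder Are you trying to identify items that were entered in multiple batches, or multiple entries of one item within a single batch?

At first glance, your solution appears to consider only individual rows. Your question gave me the impression that you need to compare entire batches to make sure you have a complete and identical copy. In the future, giving us as much information as possible in the question will go a long way toward helping all of us.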