How to find duplicate values in a SQL column with unique data?
Problem: I need to find every instance where a user duplicated data. Each time the user clicks the button, a unique batch containing the same data is created. I need a result set that groups all instances of the batches the end user duplicated.
Sample data: Microsoft SQL (T-SQL) on Microsoft SQL Server. Data types: batch int, date date, reference int, from_state varchar(2), to_state varchar(2), item int, qty int.
Expected result: all of the information I need to display in order to fix the problem. I can find duplicate records by counting on from_state, to_state, item, and qty, but I don't know how to tie that back to the batch and reference numbers.
---------------------------------------------------------------------------------
| batch | date | reference | from_state | to_state | item | qty |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 77722334455 | 5 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 11122334455 | 2 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 66622334455 | 1 |
---------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 77722334455 | 5 |
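Attempted code:
Based on the temp table reference in your attempted query, you appear to be using SQL Server, so I'll run with that. This will handle a single duplicate. I'd have to know more about the data to argue for its reliability. For something that gets verified manually, it's probably good enough. I'll see if I can come up with something for more than one duplicate.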
-- Build one "fingerprint" row per batch from aggregates over its rows
with T as (
    select
        batch,
        min("date") as dt,
        min(reference) as reference,
        min(from_state) as from_state,
        min(to_state) as to_state,
        min(item) as item_min, max(item) as item_max, sum(item) as item_sum,
        min(qty) as qty_min, max(qty) as qty_max, sum(qty) as qty_sum,
        count(*) as cnt
    from <yourdata>
    group by batch
)
-- Pair each batch with any later batch that has the same fingerprint
-- but a different reference, and return the earlier batch of each pair
select t1.batch
from T t1 inner join T t2
    on t2.batch > t1.batch and t2.reference <> t1.reference
    and t2.dt = t1.dt
    and t2.from_state = t1.from_state and t2.to_state = t1.to_state
    and t2.item_min = t1.item_min and t2.qty_min = t1.qty_min
    and t2.item_max = t1.item_max and t2.qty_max = t1.qty_max
    and t2.item_sum = t1.item_sum and t2.qty_sum = t1.qty_sum
    and t2.cnt = t1.cnt
group by t1.batch
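The original batch is the one with the lowest id.
Perhaps you'd want to carry this idea further by matching up the individual rows after the pre-aggregation. Append this to the CTE from above: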
...
, matches as (
    -- Assumption: "pairs" is the self-join above turned into a CTE that returns
    -- t1.batch as batch, t2.batch as batch2, plus t1's aggregate columns.
    select p.batch, p.batch2
    from
        pairs p inner join
        <yourdata> d1 on d1.batch = p.batch full outer join
        <yourdata> d2 on d2.batch = p.batch2
            and d2."date" = d1."date"
            and d2.from_state = d1.from_state and d2.to_state = d1.to_state
            and d2.item = d1.item and d2.qty = d1.qty
    group by p.batch, p.batch2
    having
        -- every row on one side must have found a partner on the other side
        count(d1."date") = count(*) and count(d2."date") = count(*)
        and count(d1.from_state) = count(*) and count(d2.from_state) = count(*)
        and count(d1.to_state) = count(*) and count(d2.to_state) = count(*)
        and count(d1.item) = count(*) and count(d2.item) = count(*)
        and count(d1.qty) = count(*) and count(d2.qty) = count(*)
)
select distinct
    min(p.batch) over (
        partition by
            dt, from_state, to_state,
            item_min, item_max, item_sum, qty_min, qty_max, qty_sum, cnt
    ) as orig_batch,
    p.batch2 as dup_batch
from pairs p inner join matches m on m.batch = p.batch and m.batch2 = p.batch2
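The row-level idea above can be sketched in a self-contained form. This is only a minimal illustration, not the answerer's query: the table variable `@rows`, the CTE names `rows_per_batch` and `matched`, and batch 1250000 are hypothetical, `item` is declared as bigint because the sample item values do not fit in an int, and it assumes a batch never contains two identical rows (otherwise the join would over-count matches).

```sql
-- Hypothetical sample data: the two batches from the question plus an
-- invented partial batch (1250000) that must NOT be reported as a duplicate.
DECLARE @rows TABLE (
    batch      int        NOT NULL,
    [date]     date       NOT NULL,
    reference  int        NOT NULL,
    from_state varchar(2) NOT NULL,
    to_state   varchar(2) NOT NULL,
    item       bigint     NOT NULL,  -- bigint: sample values exceed the int range
    qty        int        NOT NULL
);

INSERT INTO @rows (batch, [date], reference, from_state, to_state, item, qty)
VALUES
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 11122334455, 2),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 66622334455, 1),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 77722334455, 5),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 11122334455, 2),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 66622334455, 1),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 77722334455, 5),
    (1250000, '2016-03-01', 9001, 'MT', 'CA', 11122334455, 2),
    (1250000, '2016-03-01', 9001, 'MT', 'CA', 66622334455, 1);

WITH rows_per_batch AS (
    -- number of rows in each batch
    SELECT batch, COUNT(*) AS cnt
    FROM @rows
    GROUP BY batch
),
matched AS (
    -- number of rows two different batches have in common
    SELECT d1.batch AS batch1, d2.batch AS batch2, COUNT(*) AS matched_rows
    FROM @rows d1
    INNER JOIN @rows d2
        ON  d2.batch > d1.batch
        AND d2.[date] = d1.[date]
        AND d2.from_state = d1.from_state AND d2.to_state = d1.to_state
        AND d2.item = d1.item AND d2.qty = d1.qty
    GROUP BY d1.batch, d2.batch
)
-- A pair is a full duplicate only when every row of both batches matched.
SELECT m.batch1 AS orig_batch, m.batch2 AS dup_batch
FROM matched m
INNER JOIN rows_per_batch r1 ON r1.batch = m.batch1
INNER JOIN rows_per_batch r2 ON r2.batch = m.batch2
WHERE m.matched_rows = r1.cnt
  AND m.matched_rows = r2.cnt;
-- Returns one row: orig_batch = 1234567, dup_batch = 1239764
```

The partial batch shares two rows with each of the others, but its pairs are filtered out because the matched row count (2) differs from the three-row batch it is compared against.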
I appreciate everyone who helped me work through this problem.
-- Put the result set into the temp table #TEMP_baseresults
SELECT batch
,reference
,from_state
,to_state
,item
,qty
INTO #TEMP_baseresults
FROM datasource
-- Find all duplicates that share the same from_state, to_state, item, and qty
SELECT from_state
,to_state
,item
,qty
,count(*) as 'count'
INTO #TEMP_batchduplicates
FROM #TEMP_baseresults
GROUP BY from_state
,to_state
,item
,qty
HAVING COUNT(*) > 1
ORDER BY from_state
,to_state
,item
,qty
-- Join the duplicates table back to the base table
SELECT *
FROM #TEMP_baseresults base
JOIN #TEMP_batchduplicates dup
ON dup.from_state = base.from_state
AND dup.to_state = base.to_state
AND dup.item = base.item
AND dup.qty = base.qty
ORDER BY base.from_state
,base.to_state
,base.item
Results:
-----------------------------------------------------------------------------------------
| batch | date | reference | from_state | to_state | item | qty | count |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 11122334455 | 2 | 2 |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 66622334455 | 1 | 2 |
-----------------------------------------------------------------------------------------
| 1234567 | 2016-03-01 | 8213 | MT | CA | 77722334455 | 5 | 2 |
----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 11122334455 | 2 | 2 |
----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 66622334455 | 1 | 2 |
-----------------------------------------------------------------------------------------
| 1239764 | 2016-03-01 | 8597 | MT | CA | 77722334455 | 5 | 2 |
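This filtered my data set down to only the identified duplicate records, and additionally flagged how many times the data may have been duplicated.
What is your RDBMS: SQL Server, Postgres? What is your expected result? You say "I need to fix each batch" but don't say what "fix" means.
I have edited the question wording.
Which DBMS are you using? You still need to tell us what your desired result is, so that we don't have to guess. Please read: here is a great place to learn how to improve the quality of your question and get better answers.
@CoffeeCoder Are you trying to identify items entered in multiple batches, or multiple entries of an item within a single batch?
At first glance, your solution seems to consider only individual rows. Your question gave me the impression that you needed to compare entire batches to be sure you had a full and identical duplicate. In the future, giving us as much information as possible in the question will go a long way toward helping all of us.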
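The temp-table steps in the self-answer can also be expressed as a single statement with a windowed count, which keeps batch and reference on every row without a separate join. This is only a sketch under assumptions: the table variable `@rows`, the column alias `dup_count`, and the extra non-duplicated row for batch 1250000 are hypothetical, and `item` is declared as bigint because the sample values do not fit in an int. Like the self-answer, it checks rows individually rather than whole batches.

```sql
-- Hypothetical sample data: the rows from the question plus one invented
-- row (batch 1250000) that has no duplicate and should be filtered out.
DECLARE @rows TABLE (
    batch      int        NOT NULL,
    [date]     date       NOT NULL,
    reference  int        NOT NULL,
    from_state varchar(2) NOT NULL,
    to_state   varchar(2) NOT NULL,
    item       bigint     NOT NULL,  -- bigint: sample values exceed the int range
    qty        int        NOT NULL
);

INSERT INTO @rows (batch, [date], reference, from_state, to_state, item, qty)
VALUES
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 11122334455, 2),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 66622334455, 1),
    (1234567, '2016-03-01', 8213, 'MT', 'CA', 77722334455, 5),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 11122334455, 2),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 66622334455, 1),
    (1239764, '2016-03-01', 8597, 'MT', 'CA', 77722334455, 5),
    (1250000, '2016-03-02', 9001, 'MT', 'CA', 88822334455, 4);

SELECT batch, [date], reference, from_state, to_state, item, qty, dup_count
FROM (
    SELECT batch, [date], reference, from_state, to_state, item, qty,
           -- how many rows share this from_state/to_state/item/qty combination
           COUNT(*) OVER (
               PARTITION BY from_state, to_state, item, qty
           ) AS dup_count
    FROM @rows
) AS counted
WHERE dup_count > 1          -- keep only rows that occur more than once
ORDER BY batch, item;
-- Returns the six rows of batches 1234567 and 1239764, each with dup_count = 2
```

A window function cannot be referenced directly in a WHERE clause, which is why the count is computed in a derived table and filtered in the outer query.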