SQL Server: Find duplicate substrings in a column
I have a client table in SQL Server. I am trying to find duplicates in the email address column, but I only need to consider part of the column data, that is, a substring. Specifically, I need to find domain names that are duplicated across records.
I have used the query below to find exact duplicates (on the whole field), but how can I modify it to consider a substring?
SELECT a.email_address, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
SELECT email_address, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY email_address
HAVING COUNT(*) > 1
) b ON a.email_address = b.email_address
Many thanks. gee:
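-- Count rows per leading two-character substring; T-SQL needs the expression itself in GROUP BY
SELECT SUBSTRING(email_address, 1, 2) AS email_prefix, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY SUBSTRING(email_address, 1, 2)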
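The same grouping idea can be applied to the domain part of the address instead of a fixed prefix. The following is a minimal sketch that adapts the query from the question, assuming the table and column names given there (tblClient, client_id, email_address); the alias domain_name is illustrative. It also assumes every address contains an '@': if one does not, CHARINDEX returns 0 and the whole value is treated as the domain.

```sql
-- Sketch only: assumes tblClient(client_id, email_address) as in the question.
SELECT a.email_address, b.domain_name, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
    -- One row per domain that appears more than once
    SELECT SUBSTRING(email_address,
                     CHARINDEX('@', email_address) + 1,
                     LEN(email_address)) AS domain_name,
           COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY SUBSTRING(email_address,
                       CHARINDEX('@', email_address) + 1,
                       LEN(email_address))
    HAVING COUNT(*) > 1
) b ON SUBSTRING(a.email_address,
                 CHARINDEX('@', a.email_address) + 1,
                 LEN(a.email_address)) = b.domain_name;
```

LEN(email_address) is used only as an upper bound on the substring length; SUBSTRING stops at the end of the string.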
Try this:
declare @contact table (
[client_id] [int] identity(1, 1)
, [email] [sysname]
);
insert into @contact
([email])
values (N'joe@billy_bobs.com'),
(N'sally@beauty.com'),
(N'george@billy_bobs.com');
with [stripper]
as (select [client_id]
, [email]
, substring([email]
, charindex(N'@', [email], 0) + 1
, len([email])) as [domain_name]
from @contact),
[duplicate_finder]
as (select [client_id]
, [domain_name]
, row_number()
over (
partition by [domain_name]
order by [domain_name]) as [sequence]
from [stripper])
select * from [duplicate_finder]
where [sequence] > 1;
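If you already suspect that you need a substring, you might as well try it.
For the data you are trying to get, a pivot might work better.
Try joining on the matching substring of the email address.
Thanks for the reply. I don't actually want to delete the duplicate records; how can I make this work without the delete part?
Adam, I have updated it to just a select statement to reflect your question.
How would you modify this query to get all the rows associated with those unique substrings?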
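One possible way to address the last question, returning every row whose domain occurs more than once rather than only the extra copies, is to count rows per domain with a windowed COUNT(*) instead of numbering them. This is a minimal sketch, not the answerer's own code; it assumes the table and column names from the question (tblClient, client_id, email_address), and the CTE names domains and counted and the column names domain_name and dupeCount are illustrative.

```sql
-- Sketch only: assumes tblClient(client_id, email_address) as in the question.
WITH domains AS (
    SELECT client_id,
           email_address,
           -- Everything after the first '@'; LEN(...) is just an upper bound on the length
           SUBSTRING(email_address,
                     CHARINDEX('@', email_address) + 1,
                     LEN(email_address)) AS domain_name
    FROM tblClient
    WHERE CHARINDEX('@', email_address) > 0   -- skip values that contain no '@'
),
counted AS (
    SELECT client_id,
           email_address,
           domain_name,
           -- Number of rows sharing this domain, repeated on every row of the group
           COUNT(*) OVER (PARTITION BY domain_name) AS dupeCount
    FROM domains
)
SELECT client_id, email_address, domain_name, dupeCount
FROM counted
WHERE dupeCount > 1
ORDER BY domain_name, client_id;
```

Because the count is attached to every row of its group, filtering on dupeCount > 1 keeps all rows for a repeated domain, whereas filtering on a row number greater than 1 drops the first row of each group.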