SQL: group by part of a string
I have the following data in my table:
URL          TIME   DATE
--------------------------------------
/x           11     2013-08-01
/x           11     2013-08-01
/pl/         11     2013-08-01
/pl/         11     2013-08-03
/pl/XXX/     11     2013-08-01
/pl/XXX/     11     2013-08-04
/pl/XXX/1    11     2013-08-01
/pl/XXX/2    11     2013-08-01
/pl/YYY/     11     2013-08-01
/pl/YYY/1    11     2013-08-01
/pl/YYY/2    11     2013-08-04
/pl/YYY/3    11     2013-08-04
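Is there a way in SQL Server to group by the URL up to the third slash (/)? Unfortunately, some rows contain fewer than three slashes.

One trick for counting the slashes in a string is: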
len(url) - len(replace(url,'/',''))
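As a quick check of that trick, here is a minimal sketch that applies it to three URLs taken from the question. The inline VALUES list and the column alias slash_count are only there for illustration and are not part of the original answer.

```sql
-- Count the slashes in a few sample URLs:
-- the length of the string minus its length with every '/' removed.
SELECT url,
       LEN(url) - LEN(REPLACE(url, '/', '')) AS slash_count
FROM (VALUES ('/x'), ('/pl/'), ('/pl/XXX/1')) AS v(url);
-- '/x'        -> 1
-- '/pl/'      -> 2
-- '/pl/XXX/1' -> 3
```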
You can then call charindex three times to find the position of the third slash:
select  BeforeThirdSlash
,       max([date])
from    (
        select  case
                when len(url) - len(replace(url,'/','')) < 3 then url
                else substring(url, 1, charindex('/', url, charindex('/',
                         url, charindex('/', url)+1)+1)-1)
                end as BeforeThirdSlash
        ,       *
        from    @t
        ) as SubQueryAlias
group by
        BeforeThirdSlash
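To make the nested charindex calls easier to follow, the sketch below evaluates each level separately for one URL from the question. The variable @url and the column aliases are hypothetical names used only for this illustration.

```sql
-- Each CHARINDEX call starts searching one character after the previous hit.
DECLARE @url VARCHAR(255) = '/pl/XXX/1';

SELECT CHARINDEX('/', @url) AS first_slash,                                -- 1
       CHARINDEX('/', @url, CHARINDEX('/', @url) + 1) AS second_slash,     -- 4
       CHARINDEX('/', @url,
                 CHARINDEX('/', @url, CHARINDEX('/', @url) + 1) + 1) AS third_slash; -- 8
```

With the third slash at position 8, substring(url, 1, 8 - 1) returns '/pl/XXX', which is the text before the third slash.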
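Alternatively, you can find the position of every / in each URL and trim to the largest position. Given the sample data, this assumes that the third / is also the last / in the URL.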
DECLARE @tbl TABLE ( u VARCHAR(255), t INT, d DATE)
INSERT INTO @tbl (u, t, d) VALUES
('/x', 11, '2013-08-01'),
('/x', 11, '2013-08-01'),
('/pl/', 11, '2013-08-01'),
('/pl/', 11, '2013-08-03'),
('/pl/XXX/', 11, '2013-08-01'),
('/pl/XXX/', 11, '2013-08-04'),
('/pl/XXX/1', 11, '2013-08-01'),
('/pl/XXX/2', 11, '2013-08-01'),
('/pl/YYY/', 11, '2013-08-01'),
('/pl/YYY/1', 11, '2013-08-01'),
('/pl/YYY/2', 11, '2013-08-04'),
('/pl/YYY/3', 11, '2013-08-04')
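;WITH split AS (
    -- anchor: position of the first slash in each URL
    SELECT u, 1 s, CHARINDEX('/', u) p
    FROM @tbl
    UNION ALL
    -- recursive step: position of the next slash after p
    SELECT u, p + 1, CHARINDEX('/', u, p + 1)
    FROM split
    WHERE CHARINDEX('/', u, p + 1) > 0 -- stop when no slash is left; without this the recursion never ends
)
SELECT LEFT(t.u, split.i), MAX(t.t), MAX(t.d)
FROM @tbl t
JOIN (
    SELECT u, MAX(p) i
    FROM split
    GROUP BY u
) split ON split.u = t.u
GROUP BY LEFT(t.u, split.i)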
With a small change to the CTE, you can control which occurrence of / is used as the cut-off:
DECLARE @n INT = 3 -- nth occurrence of '/' to cut at
;WITH split AS (
    SELECT u, CHARINDEX('/', u) i, 1 r
    FROM (
        SELECT DISTINCT u
        FROM @tbl
    ) t
    WHERE CHARINDEX('/', u) > 0
    UNION ALL
    SELECT u, CHARINDEX('/', u, i + 1), r + 1
    FROM split
    WHERE r < @n
    AND CHARINDEX('/', u, i + 1) > 0
)
SELECT LEFT(t.u, s.i) u, MAX(t.t) t, MAX(t.d) d
FROM @tbl t
JOIN (
    -- keep only the furthest slash reached per URL; joining to every
    -- CTE row would also emit the shorter prefixes as extra groups
    SELECT u, MAX(i) i
    FROM split
    GROUP BY u
) s ON s.u = t.u
GROUP BY LEFT(t.u, s.i)
Here is a simple expression that returns the substring up to and including the third '/' character:
case
    when patindex('%/%/%/%', url) = 0 then url
    else left(url,charindex('/',url,charindex('/',url,charindex('/',url)+1)+1))
end
patindex checks whether the string contains at least three slashes; left extracts the substring up to and including the third one.
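As an illustration, the sketch below runs that expression over a handful of URLs from the question. The inline VALUES list is an assumption made for the example; the alias url3 matches the one used in the grouped query.

```sql
-- URLs with fewer than three slashes are returned unchanged;
-- the rest are cut just after the third slash.
SELECT url,
       CASE
           WHEN PATINDEX('%/%/%/%', url) = 0 THEN url
           ELSE LEFT(url, CHARINDEX('/', url, CHARINDEX('/', url, CHARINDEX('/', url) + 1) + 1))
       END AS url3
FROM (VALUES ('/x'), ('/pl/'), ('/pl/XXX/'), ('/pl/XXX/1'), ('/pl/YYY/3')) AS v(url);
-- '/x'        -> '/x'
-- '/pl/'      -> '/pl/'
-- '/pl/XXX/'  -> '/pl/XXX/'
-- '/pl/XXX/1' -> '/pl/XXX/'
-- '/pl/YYY/3' -> '/pl/YYY/'
```

Because the third slash is kept, '/pl/XXX/' and '/pl/XXX/1' fall into the same group.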
With this expression, writing the group by is straightforward:
SELECT
    url3, max(tm), max(dt)
FROM (
    SELECT
        CASE
            WHEN patindex('%/%/%/%', url) = 0 THEN url
            ELSE left(url,charindex('/',url,charindex('/',url,charindex('/',url)+1)+1))
        END AS url3
        , tm
        , dt
    FROM test
) x
GROUP BY url3
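You should read this question:
Is this one, two, or three slashes: pl/XXX? I ask because it could be the same path as /pl/XXX/.
@TimSchmelter All the data starts with a slash, so that is not a problem.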