Grouping file paths in a Microsoft SQL database
I have a table containing a list of folders, like this:
Path Size
C:\ParentFolder\A 123
C:\ParentFolder\A\B 442434
C:\ParentFolder\A\B\C 13413412
C:\ParentFolder\D 2422341234
C:\ParentFolder\D\E 3342
C:\ParentFolder\D\E\F 2
C:\ParentFolder\D\E\G 2
...
I'm looking for some combination of SUM, GROUP BY and PATINDEX/LTRIM/SUBSTRING/etc. that will return the following:
Path SumSize
C:\ParentFolder\A 13855969
C:\ParentFolder\D 2422344580
...
C:\ParentFolder is a known prefix, but A, D, etc. are variable folder names. Do I need to write a function to do this, or can I use some combination of string functions?
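select r.Path, sum(m.Size) as SumSize
from MyTable m
inner join (
    -- top-level folders: no further '\' after the known prefix
    select Path
    from MyTable
    where charindex('\', Path, len('C:\ParentFolder\') + 1) = 0
) r on charindex(r.Path + '\', m.Path + '\') = 1  -- trailing '\' keeps A from also matching AB
group by r.Path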
Starting with a test set:
CREATE TABLE #MyTable (Folder varchar(100) not null, Size bigint not null)
INSERT #MyTable values
('C:\ParentFolder\A' , 123)
,('C:\ParentFolder\A\B' , 442434)
,('C:\ParentFolder\A\B\C' , 13413412)
,('C:\ParentFolder\D' , 2422341234)
,('C:\ParentFolder\D\E' , 3342)
,('C:\ParentFolder\D\E\F' , 2)
,('C:\ParentFolder\D\E\G' , 2)
First, identify the folders to total up. Here I do that by loading them into a table variable:
DECLARE @Targets table (Folder varchar(100) not null)
INSERT @Targets values
('C:\ParentFolder\A')
,('C:\ParentFolder\D')
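If the target folders aren't known until runtime, the table variable could be filled from the data instead of being hard-coded. A minimal sketch, assuming the known prefix is C:\ParentFolder\ and that every top-level folder has its own row in #MyTable (the sample rows are a subset of the test set above):

```sql
-- Run in a fresh session, or skip the CREATE/INSERT if #MyTable already exists.
CREATE TABLE #MyTable (Folder varchar(100) not null, Size bigint not null)
INSERT #MyTable values
 ('C:\ParentFolder\A'   , 123)
,('C:\ParentFolder\A\B' , 442434)
,('C:\ParentFolder\D'   , 2422341234)
,('C:\ParentFolder\D\E' , 3342)

DECLARE @Targets table (Folder varchar(100) not null)

-- Keep the rows directly under the prefix: they match 'prefix%' but
-- contain no further '\' after it. Backslash is an ordinary character
-- in LIKE unless an ESCAPE clause says otherwise.
INSERT @Targets (Folder)
SELECT DISTINCT Folder
 from #MyTable
 where Folder like 'C:\ParentFolder\%'
  and Folder not like 'C:\ParentFolder\%\%'

SELECT Folder from @Targets   -- C:\ParentFolder\A and C:\ParentFolder\D
```

This only finds folders that appear as rows themselves; a top-level folder that exists only as part of a deeper path would be missed.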
From here it's easy, using a like clause:
SELECT ta.Folder, sum(Size) TotalSize
from @Targets ta
left outer join #MyTable mt
on mt.Folder like ta.Folder + '%'
group by ta.Folder
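A stricter variant of this join, as a sketch rather than a drop-in replacement. It matches either the folder itself or anything below it (folder + '\'), so a sibling such as C:\ParentFolder\AB (a hypothetical row added here for illustration) is not counted under A, and it brackets the LIKE wildcards %, _ and [ so they are treated literally:

```sql
-- Run in a fresh session, or skip the CREATE/INSERT if #MyTable already exists.
CREATE TABLE #MyTable (Folder varchar(100) not null, Size bigint not null)
INSERT #MyTable values
 ('C:\ParentFolder\A'     , 123)
,('C:\ParentFolder\A\B'   , 442434)
,('C:\ParentFolder\A\B\C' , 13413412)
,('C:\ParentFolder\D'     , 2422341234)
,('C:\ParentFolder\D\E'   , 3342)
,('C:\ParentFolder\D\E\F' , 2)
,('C:\ParentFolder\D\E\G' , 2)
,('C:\ParentFolder\AB'    , 1000)   -- hypothetical sibling, must not count under A

DECLARE @Targets table (Folder varchar(100) not null)
INSERT @Targets values
 ('C:\ParentFolder\A')
,('C:\ParentFolder\D')

SELECT ta.Folder, sum(mt.Size) TotalSize
 from @Targets ta
  left outer join #MyTable mt
   on mt.Folder = ta.Folder
   -- '[' is escaped first so the brackets added for % and _ stay intact
   or mt.Folder like
      replace(replace(replace(ta.Folder, '[', '[[]'), '%', '[%]'), '_', '[_]') + '\%'
 group by ta.Folder
-- A: 13855969, D: 2422344580 (the AB row is ignored)
```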
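Problems may arise if your folder names contain characters that the like clause treats as special, such as %, _ and [.

Assuming there is always an entry for the top-level directory (i.e. if c:\xxx\yyy\zzz exists, then c:\xxx\yyy always exists), how about: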
;with roots (root) as (
select distinct
path + '\'
from
thetable
where
--only include paths with 2 x \
len(path) - 2 = len(replace(path, '\', ''))
)
select
roots.root,
sum(thetable.size)
from
roots
inner join
thetable on left(thetable.path + '\', len(roots.root)) = roots.root
group by
roots.root
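The roots CTE works by counting backslashes: a path directly under C:\ParentFolder contains exactly two. A small sketch of that count on its own, using made-up names FolderPath and SlashCount:

```sql
-- Removing every '\' and comparing lengths gives the number of backslashes.
SELECT t.FolderPath,
       len(t.FolderPath) - len(replace(t.FolderPath, '\', '')) AS SlashCount
FROM (VALUES ('C:\ParentFolder\A'),
             ('C:\ParentFolder\A\B'),
             ('C:\ParentFolder\A\B\C')) AS t (FolderPath)
-- SlashCount is 2, 3 and 4 respectively; only the first row qualifies as a root
```

The depth is baked into the query, so a deeper known prefix would need a different constant than 2.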
--if folder names are always one character long
select
LEFT(folder,CHARINDEX('r\',folder)+2) as folder_group
,SUM(size) as sumsize
from #mytable
GROUP BY
LEFT(folder,CHARINDEX('r\',folder)+2)
--if folder names have variable length
select
CASE WHEN CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) = 0 THEN folder
ELSE LEFT(folder,CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) -1) END AS folder_group
,SUM(size) as sumsize
from #mytable
GROUP BY
CASE WHEN CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) = 0 THEN folder
ELSE LEFT(folder,CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) -1) END
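Because the prefix is known, the nested CHARINDEX calls could be replaced by a single search that starts just past the prefix. A sketch, assuming every row sits under C:\ParentFolder\ (@Prefix and FolderGroup are names made up for this example):

```sql
-- Run in a fresh session, or skip the CREATE/INSERT if #MyTable already exists.
CREATE TABLE #MyTable (Folder varchar(100) not null, Size bigint not null)
INSERT #MyTable values
 ('C:\ParentFolder\A'     , 123)
,('C:\ParentFolder\A\B'   , 442434)
,('C:\ParentFolder\A\B\C' , 13413412)
,('C:\ParentFolder\D'     , 2422341234)
,('C:\ParentFolder\D\E'   , 3342)
,('C:\ParentFolder\D\E\F' , 2)
,('C:\ParentFolder\D\E\G' , 2)

DECLARE @Prefix varchar(100) = 'C:\ParentFolder\'

SELECT g.FolderGroup, sum(mt.Size) as SumSize
 from #MyTable mt
  cross apply (
    -- Find the first '\' after the prefix and cut just before it.
    -- Appending '\' guarantees a match for top-level folders too.
    -- Assumes mt.Folder starts with @Prefix; other rows would need a guard.
    select left(mt.Folder,
                charindex('\', mt.Folder + '\', len(@Prefix) + 1) - 1) as FolderGroup
  ) g
 group by g.FolderGroup
-- C:\ParentFolder\A: 13855969, C:\ParentFolder\D: 2422344580
```

Unlike the join-based answers, this does not require a row for the top-level folder itself, and it adapts to a deeper prefix by changing only @Prefix.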
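What version of SQL are you using?
Only one level down?
Yes, I want to sum everything under A, and everything under D, and nothing else. The trick is that I don't know what A and D are until runtime.
Version? SQL 2008 R2.
I think it will give working results, because duplicates appear when using the like operator.
This solution is the cleanest. The only adjustment I made was to replace INSERT @Targets values ('C:\ParentFolder\A'), ('C:\ParentFolder\D') with a dynamic query that discovers the folders at runtime.
You assume the immediate child names are only one character long: "but A, D, etc. are variable folder names."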