Grouping file paths in a Microsoft SQL database


I have a table containing a list of folders, like this:

Path                    Size
C:\ParentFolder\A       123
C:\ParentFolder\A\B     442434
C:\ParentFolder\A\B\C   13413412
C:\ParentFolder\D       2422341234
C:\ParentFolder\D\E     3342
C:\ParentFolder\D\E\F   2
C:\ParentFolder\D\E\G   2
...
I'm looking for some combination of SUM, GROUP BY, and PATINDEX/LTRIM/SUBSTRING/etc. that returns the following:

Path                    SumSize
C:\ParentFolder\A       13855969
C:\ParentFolder\D       2422344580
...
C:\ParentFolder is a known prefix, but A, D, etc. are variable folder names. Do I need to write a function to do this, or can I use some combination of string functions?

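select r.Path, sum(m.Size) as SumSize
from MyTable m
inner join (
    -- top-level folders: no further '\' after the known prefix
    select Path
    from MyTable 
    where charindex('\', Path, len('C:\ParentFolder\') + 1) = 0 
) r on charindex(r.Path + '\', m.Path + '\') = 1  -- the appended '\' keeps A from also matching AB
group by r.Path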

Start with a test set:

CREATE TABLE #MyTable (Folder varchar(100) not null, Size bigint not null)

INSERT #MyTable values
  ('C:\ParentFolder\A'     ,  123)
 ,('C:\ParentFolder\A\B'   ,  442434)
 ,('C:\ParentFolder\A\B\C' ,  13413412)
 ,('C:\ParentFolder\D'     ,  2422341234)
 ,('C:\ParentFolder\D\E'   ,  3342)
 ,('C:\ParentFolder\D\E\F' ,  2)
 ,('C:\ParentFolder\D\E\G' ,  2)
First, determine the folders you want to total. Here I do that by loading them into a table variable:

DECLARE @Targets table (Folder varchar(100) not null)
INSERT @Targets values
  ('C:\ParentFolder\A')
 ,('C:\ParentFolder\D')
From here it's easy, using a LIKE clause:

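SELECT ta.Folder, sum(mt.Size) TotalSize
 from @Targets ta
  left outer join #MyTable mt
   -- the folder itself, or anything beneath it ('\%' keeps A from also matching AB)
   on mt.Folder = ta.Folder
   or mt.Folder like ta.Folder + '\%'
 group by ta.Folder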

Problems may arise if your folder names contain characters that are reserved by the LIKE clause: %, _, [ and a few others.
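One way to sidestep the wildcard problem is to neutralize the reserved characters in the target folder before building the pattern. The following is only a sketch under assumptions: the table variable @Folders and its rows are hypothetical sample data, and wrapping each of [, % and _ in square brackets is one possible escaping approach, not something the original answer prescribes.

```sql
-- Hypothetical sample data; one folder name deliberately contains the LIKE wildcard _.
DECLARE @Folders table (Folder varchar(100) not null, Size bigint not null);
INSERT @Folders VALUES
  ('C:\ParentFolder\A_1'   , 10)
 ,('C:\ParentFolder\A_1\B' , 20)
 ,('C:\ParentFolder\AX1'   , 40)   -- an unescaped pattern 'A_1%' would match this row too
 ,('C:\ParentFolder\AX1\C' , 80);

DECLARE @Targets table (Folder varchar(100) not null);
INSERT @Targets VALUES ('C:\ParentFolder\A_1');

SELECT ta.Folder, SUM(mt.Size) AS TotalSize
 FROM @Targets ta
  LEFT OUTER JOIN @Folders mt
   ON mt.Folder = ta.Folder
   -- Replace [ first, so the brackets added for % and _ are not escaped again.
   OR mt.Folder LIKE
        REPLACE(REPLACE(REPLACE(ta.Folder, '[', '[[]'), '%', '[%]'), '_', '[_]') + '\%'
 GROUP BY ta.Folder;
-- With the sample rows above, only A_1 and A_1\B are counted (TotalSize = 30).
```

The multi-row VALUES syntax assumes SQL Server 2008 or later.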

Assuming there is always an entry for the top-level directory (i.e., if c:\xxx\yyy\zzz exists, then c:\xxx\yyy always exists too), how about:

;with roots (root) as (
     select distinct
        path + '\' 
     from 
        thetable
     where 
        --only include paths with 2 x \
        len(path) - 2 = len(replace(path, '\', '')) 
)
select
    roots.root,
    sum(thetable.size)
from 
    roots
inner join 
    thetable on left(thetable.path + '\', len(roots.root)) = roots.root 
group by
    roots.root
-- If the folder name is always a single character

select 
LEFT(folder,CHARINDEX('r\',folder)+2) as folder_group
,SUM(size) as sumsize
from #mytable
GROUP BY
LEFT(folder,CHARINDEX('r\',folder)+2)
-- If the folder name has a variable length

select 
CASE WHEN CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) = 0 THEN folder
    ELSE LEFT(folder,CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) -1) END AS folder_group
,SUM(size) as sumsize
from #mytable
GROUP BY
CASE WHEN CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) = 0 THEN folder
    ELSE LEFT(folder,CHARINDEX('\',folder,CHARINDEX('\',folder,CHARINDEX('\',folder)+1)+1) -1) END 
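The variable-length query above repeats the nested CHARINDEX expression four times. As a possible alternative (a sketch only, not part of the original answer), the position of the third backslash can be computed once with CROSS APPLY and then reused. The table variable @Folders and the aliases pos3, p and g are illustrative names; the rows are the sample data from the question.

```sql
-- Sample data copied from the question.
DECLARE @Folders table (folder varchar(100) not null, size bigint not null);
INSERT @Folders VALUES
  ('C:\ParentFolder\A'     ,  123)
 ,('C:\ParentFolder\A\B'   ,  442434)
 ,('C:\ParentFolder\A\B\C' ,  13413412)
 ,('C:\ParentFolder\D'     ,  2422341234)
 ,('C:\ParentFolder\D\E'   ,  3342)
 ,('C:\ParentFolder\D\E\F' ,  2)
 ,('C:\ParentFolder\D\E\G' ,  2);

SELECT g.folder_group, SUM(f.size) AS sumsize
FROM @Folders f
-- Position of the third '\' in the path (0 if the path has only two).
CROSS APPLY (SELECT CHARINDEX('\', f.folder,
                    CHARINDEX('\', f.folder,
                    CHARINDEX('\', f.folder) + 1) + 1) AS pos3) p
-- Everything before the third '\', or the whole path if there is none.
CROSS APPLY (SELECT CASE WHEN p.pos3 = 0 THEN f.folder
                         ELSE LEFT(f.folder, p.pos3 - 1) END AS folder_group) g
GROUP BY g.folder_group
ORDER BY g.folder_group;
-- C:\ParentFolder\A   13855969
-- C:\ParentFolder\D   2422344580
```

Like the original, this assumes the known prefix contains exactly two backslashes (drive plus one parent folder).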

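What version of SQL are you using?
One level down?
Yes, I want to total everything under A and everything under D, and nothing else. The trick is that I don't know what A and D are until runtime.
Version? SQL 2008 R2.
I think it will give working results, because duplicates occur when using the LIKE operator.
This solution is the cleanest. The only adjustment I made was to replace the INSERT of the @Targets values ('C:\ParentFolder\A'), ('C:\ParentFolder\D') with a dynamic query that discovers the folders at runtime.
You're assuming the direct child names are only one character long: "but A, D, etc. are variable folder names."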
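The commenter does not show the query used to discover the target folders at runtime, so the following is only a guess at what such a step might look like. It assumes the known prefix is held in a hypothetical variable @Prefix (trailing backslash included) and uses a table variable @Folders filled with the question's sample data.

```sql
-- Sample data copied from the question; @Folders and @Prefix are illustrative names.
DECLARE @Folders table (Folder varchar(100) not null, Size bigint not null);
INSERT @Folders VALUES
  ('C:\ParentFolder\A'     ,  123)
 ,('C:\ParentFolder\A\B'   ,  442434)
 ,('C:\ParentFolder\A\B\C' ,  13413412)
 ,('C:\ParentFolder\D'     ,  2422341234)
 ,('C:\ParentFolder\D\E'   ,  3342)
 ,('C:\ParentFolder\D\E\F' ,  2)
 ,('C:\ParentFolder\D\E\G' ,  2);

DECLARE @Prefix varchar(100);
SET @Prefix = 'C:\ParentFolder\';   -- the known prefix, trailing '\' included

-- Discover the distinct first-level folders under the prefix.
DECLARE @Targets table (Folder varchar(100) not null);
INSERT @Targets (Folder)
SELECT DISTINCT
       CASE WHEN CHARINDEX('\', Folder, LEN(@Prefix) + 1) = 0 THEN Folder
            ELSE LEFT(Folder, CHARINDEX('\', Folder, LEN(@Prefix) + 1) - 1) END
  FROM @Folders
 WHERE LEFT(Folder, LEN(@Prefix)) = @Prefix;

-- Total each target together with everything beneath it.
SELECT ta.Folder, SUM(f.Size) AS TotalSize
  FROM @Targets ta
  JOIN @Folders f
    ON f.Folder = ta.Folder
    OR LEFT(f.Folder, LEN(ta.Folder) + 1) = ta.Folder + '\'
 GROUP BY ta.Folder
 ORDER BY ta.Folder;
-- C:\ParentFolder\A   13855969
-- C:\ParentFolder\D   2422344580
```

Comparing with LEFT rather than LIKE avoids the wildcard issue, and because the targets are derived from the paths themselves, this variant does not require a row for each top-level folder to exist.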