Split dot-delimited nodes into a JSON object


I have many string entries (a namespace/class tree) that look like this:

appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager

However, I need the final output to look like this:

{
    "name": "appsystem",
    "children": [
        {
        "name": "applications",
        "children": [
            {"name": "APPactivities"},
            {"name": "APPmanager"},
            {"name": "APPmodels"},
            {"name": "MAPmanager",
                "children": [
                    {"name": "maphub"},
                    {"name": "mapmanager"}

                ]},
            {"name": "pagealertsmanager"}   
            ]
        },
        {
        "name": "authentication",
        "children": [
            {"name": "manager",
                "children": [
                    {"name": "encryptionmanager"},
                    {"name": "sso"},
                    {"name": "tokenmanager"}
                ]}
            ]
        }
    ]
}

The total number of nodes can be arbitrary.

I assume I need recursion, but I don't even know where to start.

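This builds nested lists; piping them to PowerShell's ConvertTo-Json enumerates (flattens) the outer list, so a single root comes out as one object.

You can change $Line in $s to $Line in (Get-Content input.txt) to read the entries from a file.

But I think this does it: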

$s = @'
appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager
'@ -split '\r?\n'    # split on CRLF or LF, whichever the script file uses

$TreeRoot = New-Object System.Collections.ArrayList

foreach ($Line in $s) {

    $CurrentDepth = $TreeRoot

    $RemainingChunks = $Line.Split('.')
    while ($RemainingChunks)
    {

        # If there is a dictionary at this depth then use it, otherwise create one.
        $Item = $CurrentDepth | Where-Object {$_.name -eq $RemainingChunks[0]}
        if (-not $Item)
        {
            $Item = @{name=$RemainingChunks[0]}
            $null = $CurrentDepth.Add($Item)
        }

        # If there will be child nodes, look for a 'children' node, or create one.
        if ($RemainingChunks.Count -gt 1)
        {
            if (-not $Item.ContainsKey('children'))
            {
                $Item['children'] = New-Object System.Collections.ArrayList
            }

            $CurrentDepth = $Item['children']
        }

        $RemainingChunks = $RemainingChunks[1..$RemainingChunks.Count]
    }
}

# -Depth above 100 is rejected by newer PowerShell versions, so stay within that limit.
$TreeRoot | ConvertTo-Json -Depth 100
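
The remark about the outer list being flattened can be checked in isolation. This is a minimal sketch, assuming a hypothetical one-element list $roots that stands in for $TreeRoot: piping enumerates the list, while -InputObject passes the list as a whole.

```powershell
# Hypothetical stand-in for $TreeRoot with a single root node.
$roots = New-Object System.Collections.ArrayList
$null = $roots.Add(@{ name = 'appsystem' })

# Piping enumerates the list, so the single root is serialized as a bare object.
$piped = $roots | ConvertTo-Json -Depth 10

# -InputObject passes the list itself, so the outer JSON array is kept.
$asArray = ConvertTo-Json -InputObject $roots -Depth 10

$piped
$asArray
```

With several root nodes the pipeline form yields a JSON array as well, so the two forms only differ when there is exactly one root.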

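Edit: too slow? I tried some profiling and found (not too surprisingly) that the inner nested loop, which searches the children arrays for a matching child node, is being hit far too many times.

This is a redesigned version. It still builds the tree, but this time it also builds a TreeMap hashtable of shortcuts into the tree, pointing to every previously built node, so it can jump straight to a node instead of searching the children lists.

I made a test file of about 20k random lines. The original code processed it in 108 seconds, this code processes it in 1.5 seconds, and the output matches.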

$TreeRoot = New-Object System.Collections.ArrayList
$TreeMap = @{}

foreach ($line in (Get-Content d:\out.txt)) {

    $_ = ".$line"    # easier if the lines start with a dot

    if ($TreeMap.ContainsKey($_))    # Skip duplicate lines
    { 
        continue
    }

    # build a subtree from the right. a.b.c.d.e -> e  then  d->e  then  c->d->e
    # keep going until base 'a.b' reduces to something already in the tree, connect new bit to that.
    $LineSubTree = $null
    $TreeConnectionPoint = $null

    do {
        $lastDotPos = $_.LastIndexOf('.')
        $leaf = $_.Substring($lastDotPos + 1)
        $_ = $_.Substring(0, $lastDotPos)

        # push the leaf on top of the growing subtree
        $LineSubTree = if ($LineSubTree) {
                           @{"name"=$leaf; "children"=([System.Collections.ArrayList]@($LineSubTree))}
                       } else { 
                           @{"name"=$leaf}
                       }

        $TreeMap["$_.$leaf"] = $LineSubTree

    } while (!($TreeConnectionPoint = $TreeMap[$_]) -and $_)


    # Now we have a branch built to connect in to the existing tree
    # but is there somewhere to put it?
    if ($TreeConnectionPoint)
    {
        if ($TreeConnectionPoint.ContainsKey('children'))
        {
            $null = $TreeConnectionPoint['children'].Add($LineSubTree)
        } else {
            $TreeConnectionPoint['children'] = [System.Collections.ArrayList]@($LineSubTree)
        }
    } else
    {           # nowhere to put it, this is a new root level connection
        $null = $TreeRoot.Add($LineSubTree)
    }
}

$TreeRoot | ConvertTo-Json -Depth 100
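
(@mklement0's code takes 103 seconds and produces a wildly different output: 5.4M characters of JSON instead of 10.1M characters of JSON. [Edit: because my code allows multiple root nodes in the list, which my test file has, and their code doesn't allow that.])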
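
The shortcut-map idea can also be sketched in a simpler left-to-right form. This is a variant for illustration, not the right-to-left code above: it assumes a hypothetical sample list $lines and a hashtable $index keyed by the full dotted path of each node, and it does one hashtable lookup per path segment instead of scanning a children list.

```powershell
$lines = 'a', 'a.b', 'a.b.c', 'a.d', 'x.y'   # hypothetical sample input

$roots = New-Object System.Collections.ArrayList
$index = @{}    # full dotted path -> node hashtable

foreach ($line in $lines) {
    $parent = $null
    $prefix = ''
    foreach ($part in $line.Split('.')) {
        # Extend the path prefix by one segment and look the node up directly.
        $prefix = if ($prefix) { "$prefix.$part" } else { $part }
        $node = $index[$prefix]
        if (-not $node) {
            $node = @{ name = $part }
            $index[$prefix] = $node
            if ($parent) {
                if (-not $parent.ContainsKey('children')) {
                    $parent['children'] = New-Object System.Collections.ArrayList
                }
                $null = $parent['children'].Add($node)
            } else {
                # No parent: this is a new root-level node.
                $null = $roots.Add($node)
            }
        }
        $parent = $node
    }
}

ConvertTo-Json -InputObject $roots -Depth 100
```

Like the code above, this allows several root nodes; here 'a' and 'x' both end up in $roots.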



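To complement the answer above with an alternative implementation that uses a recursive function.

The emphasis is on modularity and terseness, not performance. [1]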
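
A recursive approach of this kind might look like the following. This is only a minimal sketch under assumptions, not the answer's own code: the function name Add-TreePath, its parameters, and the synthetic $root container are all hypothetical.

```powershell
# Hypothetical helper: inserts one path (an array of name segments) below $Node.
function Add-TreePath {
    param(
        [Parameter(Mandatory)] [hashtable] $Node,
        [string[]] $Path = @()
    )
    if ($Path.Count -eq 0) { return }    # nothing left to insert

    $childName = $Path[0]
    if (-not $Node.ContainsKey('children')) {
        $Node['children'] = New-Object System.Collections.ArrayList
    }

    # Reuse an existing child with this name, or create it.
    $child = $Node['children'].Where({ $_.name -eq $childName }, 'First')[0]
    if (-not $child) {
        $child = @{ name = $childName }
        $null = $Node['children'].Add($child)
    }

    # Recurse with the remaining segments, if any.
    if ($Path.Count -gt 1) {
        Add-TreePath -Node $child -Path $Path[1..($Path.Count - 1)]
    }
}

$root = @{}    # synthetic container; its children are the real root nodes
foreach ($line in 'a', 'a.b', 'a.b.c', 'a.d') {
    Add-TreePath -Node $root -Path $line.Split('.')
}

ConvertTo-Json -InputObject $root['children'][0] -Depth 100
```

Serializing only the first child of $root assumes a single root node, as in the question's data.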


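[1] One deliberate kind of optimization was applied, while still keeping the code terse:

PSv4+ offers the (little-known) array methods .ForEach() and .Where(), which are not only noticeably faster than their cmdlet counterparts ForEach-Object and Where-Object, but also offer additional features.

Specifically:

  • $path.ForEach({ … })
    is used instead of
    $path | ForEach-Object { … }

  • $ht.children.Where({ $_.name -eq $childName }, 'First')[0]
    is used instead of
    $ht.children | Where-Object { $_.name -eq $childName } | Select-Object -First 1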
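
The two forms in the footnote can be compared side by side. This is a small sketch on hypothetical data; the $children list and its id values exist only to show which element each form returns.

```powershell
# Hypothetical child list with a deliberate duplicate name.
$children = @(
    @{ name = 'APPmanager'; id = 1 }
    @{ name = 'MAPmanager'; id = 2 }
    @{ name = 'MAPmanager'; id = 3 }
)
$childName = 'MAPmanager'

# Cmdlet pipeline: filter, then keep the first match.
$viaPipeline = $children | Where-Object { $_.name -eq $childName } | Select-Object -First 1

# Array method: 'First' stops at the first match; [0] unwraps the returned collection.
$viaMethod = $children.Where({ $_.name -eq $childName }, 'First')[0]

# .ForEach() runs the script block once per element and collects the results.
$names = $children.ForEach({ $_.name })
```

Both lookups select the same element (the one with id 2); the method form avoids the pipeline overhead.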



I answered, but I also downvoted because you've just said "I have a problem, I need this".

@TessellatingHeckler, yes, I know. I couldn't find similar examples - most JSON object examples are about reading JSON items into PS. I'm at a loss.

I've seen a very similar question here, one I tried to answer and couldn't solve at the time - someone else answered it, and it might be a useful reference - but I can't find it now; there are too many results for "powershell" and "json".

OK! That's cool. I have a lot to read up on.