PowerShell: how to merge everything in two CSV files where records match on one column


I have two CSV files. Both have SamAccountName in common. For each record, a user may or may not have a match in the other file (this is very important to note).

I'm trying to merge all columns (and their values) into one file, based on the SamAccountNames in the first file.

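If a SamAccountName is not found in the second file, the user record should still be added to the merged file (because it was found in the first file), with empty values for all of the second file's columns.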

If a SamAccountName is found in the second file but not in the first, that record should be ignored.

The number of columns in each file can vary (5, 10, 2, etc.).

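The catch is that the records come out of the files as hashtable-like objects rather than as plain string lines (as they would if you treated the file as .txt), so I'm really not sure how to go about this.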
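Those objects are easier to work with than they look: Import-Csv (and ConvertFrom-Csv) return one object per row with one property per column, and the column names can be read from the object itself, so nothing has to be hardcoded. A minimal sketch, assuming the first two rows of the sample data below, inlined with ConvertFrom-Csv so it runs without the files:

```powershell
# Sample rows from this question; Import-Csv on a file returns the same kind of objects
$first = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
'@ | ConvertFrom-Csv

# Each row is a PSCustomObject with one property per column
$row = $first[0]
$row.SamAccountName          # the key column, accessed by name

# Column names in file order, read at run time instead of hardcoded
$columns = $row.PSObject.Properties.Name
$columns -join ', '
```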

Adding sample CSV files, including the expected result.

First CSV file

"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
Second CSV file

"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
Desired merged CSV file

"SamAccountName","sn","GivenName","employeeNumber","userAccountControl","mail"
"PBrain","Pinky","Brain","","",""
"JSteward","John","Steward","43518790","512","JSteward@mystuff.com"
"JDoe","John","Doe","902187u4","1114624","JDoe@mystuff.com"
"SDoo","Scooby","Doo","","",""
Note: this will be part of a loop that merges multiple files, so I want to avoid hardcoding header names (the exception being $_.SamAccountName).
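One way to satisfy all of these rules without hardcoding any header except the key is a left join on SamAccountName: index the second file's rows in a hashtable, then walk the first file and copy in every extra column, using an empty string when there is no match. This is a sketch of that general technique, not the asker's or an answerer's method; the function name Merge-CsvLeft and its parameters are hypothetical, and it assumes SamAccountName values are unique within the second file.

```powershell
function Merge-CsvLeft {
    # Hypothetical helper: left-join the $Right rows onto the $Left rows by the $Key column
    param($Left, $Right, [string]$Key = 'SamAccountName')

    # Index the right-hand rows by key (@{} keys are case-insensitive)
    $index = @{}
    foreach ($r in $Right) { $index[$r.$Key] = $r }

    # Extra columns of the right file, discovered at run time, key column excluded
    $firstRight = $Right | Select-Object -First 1
    $extra = @($firstRight.PSObject.Properties.Name | Where-Object { $_ -ne $Key })

    foreach ($l in $Left) {
        # Copy the left row first so its column order is kept
        $out = [ordered]@{}
        foreach ($p in $l.PSObject.Properties) { $out[$p.Name] = $p.Value }

        # $null when the user only exists in the left file
        $match = $index[$l.$Key]
        foreach ($name in $extra) {
            $out[$name] = if ($match) { $match.$name } else { '' }
        }
        [pscustomobject]$out
    }
}

$first = @'
"SamAccountName","sn","GivenName"
"PBrain","Pinky","Brain"
"JSteward","John","Steward"
"JDoe","John","Doe"
"SDoo","Scooby","Doo"
'@ | ConvertFrom-Csv

$second = @'
"SamAccountName","employeeNumber","userAccountControl","mail"
"KYasunori","678213","546","KYasunori@mystuff.com"
"JSteward","43518790","512","JSteward@mystuff.com"
"JKibogabi","24356","546","JKibogabi@mystuff.com"
"JDoe","902187u4","1114624","JDoe@mystuff.com"
"CStrife","54627","512","CStrife@mystuff.com"
'@ | ConvertFrom-Csv

$merged = Merge-CsvLeft -Left $first -Right $second
$merged | ConvertTo-Csv -NoTypeInformation
```

Rows that exist only in the second file (KYasunori, JKibogabi, CStrife) never appear, because only the left rows are enumerated. For the multi-file loop, the output of one call could be passed back in as -Left together with the next file's rows, and the final objects written with Export-Csv -NoTypeInformation.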

Tried "不安1987"'s suggestion (it didn't work):
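$baseFileCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMergedCsvContents = Import-Csv 'D:\Scripts\Powershell\Tests\lookup.csv'
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
$resultsFileContents = @()
$baseFileContents = Get-Content 'D:\Scripts\Powershell\Tests\base.csv'
# Without -IncludeEqual, Compare-Object emits only differences, so no '==' rows come back
$recordsMatched = Compare-Object $baseFileCsvContents $fileToBeMergedCsvContents -Property SamAccountName
# switch tests each whole result object here, not its SideIndicator property
switch ($recordsMatched)
{
    '<=' {}
    '=>' {}
    '==' {$resultsFileContents += $_}
}
# ConvertTo-Csv already yields text lines; piping strings to Export-Csv exports their Length property
$resultsFileCsv = $resultsFileContents | ConvertTo-Csv
$resultsFileCsv | Export-Csv $resultsFile -NoTypeInformation -Force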

The output is a blank file :(
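For reference, a minimal sketch of getting the matched names out of Compare-Object: -IncludeEqual adds the '==' rows, and it is the SideIndicator property of each result that has to be tested. The two small tables are made-up subsets of the sample data, inlined with ConvertFrom-Csv so the snippet runs without files.

```powershell
$base = @'
"SamAccountName","sn"
"PBrain","Pinky"
"JDoe","John"
'@ | ConvertFrom-Csv

$lookup = @'
"SamAccountName","mail"
"JDoe","JDoe@mystuff.com"
"CStrife","CStrife@mystuff.com"
'@ | ConvertFrom-Csv

# -IncludeEqual adds '==' rows; with -Property, each result carries only
# the compared property (SamAccountName) plus SideIndicator
$diff = Compare-Object $base $lookup -Property SamAccountName -IncludeEqual

# Filter on SideIndicator rather than on the whole result object
$inBoth   = $diff | Where-Object { $_.SideIndicator -eq '==' } | ForEach-Object { $_.SamAccountName }
$onlyBase = $diff | Where-Object { $_.SideIndicator -eq '<=' } | ForEach-Object { $_.SamAccountName }
```

This only identifies which keys match; the columns of the two files still have to be combined afterwards, which is why a keyed lookup is often simpler for this kind of merge.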

You can use Compare-Object with -Property SamAccountName. For example:

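$a = 1,2,3,4,5
$b = 4,5,6,7
# -IncludeEqual is needed, otherwise Compare-Object reports only the differences
$side = Compare-Object $a $b -IncludeEqual
foreach ($item in $side) {
    switch ($item.SideIndicator) {
        '<=' { "$($item.InputObject) is only in `$a (not in `$b)" }
        '=>' { "$($item.InputObject) is only in `$b (not in `$a)" }
        '==' { "$($item.InputObject) is on both sides" }
    }
}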

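Once you have all the data in your output variable, run it through ConvertTo-Csv and write it to a file.

After a whole day, I finally came up with something that works: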

...
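Edit. Reason: when merging files with thousands of records, it is much faster to break out of the inner loop and remove the found element from the list.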

Function GetTitlesFromFileToBeMerged
{
    Param ($csvFile)

    # Read only the header line of the file passed in (the parameter, not the global $fileToBeMerged)
    [String]$fileToBeMergedTitles = Get-Content $csvFile -TotalCount 1

    # "a","b","c" -> a|b|c, then drop the quotes and the SamAccountName key column
    [String]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`",`"", "|").Trim()
    [String]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "`"", "").Trim()
    [String]$fileToBeMergedTitles = ($fileToBeMergedTitles -replace "SamAccountName", "").Trim()

    # Split into column names, skipping the empty slot left where SamAccountName was
    [String[]]$listOfColumnTitles = $fileToBeMergedTitles.Split('|',[System.StringSplitOptions]::RemoveEmptyEntries)

    Write-Output $listOfColumnTitles
}

$baseFile = 'D:\Scripts\Powershell\Tests\base.csv'
$fileToBeMerged = 'D:\Scripts\Powershell\Tests\lookup.csv'
$baseFileCsvContents = Import-Csv $baseFile
$baseFileContents = Get-Content $baseFile
$fileToBeMergedCsvContents = Import-Csv $fileToBeMerged
# A List (not an array) so matched lines can be removed with RemoveAt()
[System.Collections.Generic.List[System.Object]]$fileToBeMergedContents = Get-Content $fileToBeMerged
$resultsFile = 'D:\Scripts\Powershell\Tests\MergedResults.csv'
# List.Add() avoids copying the whole array on every += for large files
$resultsFileContents = New-Object System.Collections.Generic.List[String]

# Header: all base columns followed by the second file's columns minus SamAccountName
[String]$baseFileTitles = $baseFileContents[0]
[String]$fileToBeMergedTitles = (Get-Content $fileToBeMerged -TotalCount 1) -replace "`"SamAccountName`",", ""
$resultsFileContents.Add($baseFileTitles + "," + $fileToBeMergedTitles)

# One ,"" per column of the second file, used when a user has no match there
[String]$lineMatchNotFound = ""
$arrayFileToBeMergedTitles = @(GetTitlesFromFileToBeMerged $fileToBeMerged)
For ($valueNum = 0; $valueNum -lt $arrayFileToBeMergedTitles.Count; $valueNum++)
{
    $lineMatchNotFound += ",`"`""
}

$baseLineCounter = 1
$baseFileCsvContents | ForEach-Object {
    $baseSamAccountName = $_.SamAccountName
    [String]$baseLineInFile = $baseFileContents[$baseLineCounter]

    # Find the line whose first field is exactly "<SamAccountName>" (index 0 is the header).
    # The loop index is the real list index, so RemoveAt() removes the line that matched.
    $matchIndex = -1
    For ($i = 1; $i -lt $fileToBeMergedContents.Count; $i++) {
        If ($fileToBeMergedContents[$i] -like "`"$baseSamAccountName`",*") { $matchIndex = $i; break }
    }

    If ($matchIndex -ge 0)
    {
        # Keep everything after the first field, then drop the line so later searches are shorter
        [String]$lineMatchFound = "," + ($fileToBeMergedContents[$matchIndex] -replace '^"[^"]*",', "")
        $fileToBeMergedContents.RemoveAt($matchIndex)
    }
    Else
    {
        [String]$lineMatchFound = $lineMatchNotFound
    }

    $resultsFileContents.Add($baseLineInFile + $lineMatchFound)
    $baseLineCounter++
}

ForEach ($line in $resultsFileContents)
{
    Write-Host $line
}

$resultsFileContents | Set-Content $resultsFile -Force

I'm pretty sure this isn't the best way, and that there are better, faster ways to do it. If anyone has ideas, I'm open to them. Thanks.
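The edit above speeds things up by shrinking the list that each inner search scans. A different technique skips the per-user scan entirely: index the second file once in a hashtable keyed on its first field, so each lookup is a constant-time (on average) hashtable access instead of a walk through the list. A sketch that works on raw lines like the code above; it assumes SamAccountName is the first quoted field on every line and that values contain no embedded commas, and the sample lines are inlined instead of read with Get-Content.

```powershell
$baseLines = @(
    '"SamAccountName","sn","GivenName"'
    '"PBrain","Pinky","Brain"'
    '"JDoe","John","Doe"'
)
$lookupLines = @(
    '"SamAccountName","employeeNumber","mail"'
    '"CStrife","54627","CStrife@mystuff.com"'
    '"JDoe","902187u4","JDoe@mystuff.com"'
)

# Build the index once: quoted first field -> rest of the line
$index = @{}
foreach ($line in ($lookupLines | Select-Object -Skip 1)) {
    $key, $rest = $line -split ',', 2
    $index[$key] = $rest
}

# Filler for users missing from the lookup file: one "" per extra column
$extraCount = ($lookupLines[0] -split ',').Count - 1
$filler = ((,'""') * $extraCount) -join ','

$merged = New-Object System.Collections.Generic.List[string]
$merged.Add($baseLines[0] + ',' + ($lookupLines[0] -split ',', 2)[1])
foreach ($line in ($baseLines | Select-Object -Skip 1)) {
    $key = ($line -split ',', 2)[0]
    $rest = if ($index.ContainsKey($key)) { $index[$key] } else { $filler }
    $merged.Add($line + ',' + $rest)
}
$merged
```

Each file is read once, so the total work grows roughly with the sum of the two file sizes rather than their product.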

The code below produces the desired result for the input you provided:

function CombineSkip1($s1, $s2){
    # Append every field of $s2 except its first (SamAccountName) to the fields of $s1
    $s3 = $s1 -split ','
    $s2 -split ',' | select -Skip 1 | % {$s3 += $_}
    # Join with a bare comma (no space) so the output matches the input CSV format
    $s4 = $s3 -join ','

    $s4
}

Write-Output "------Combine files------"

# content
$c1 = Get-Content D:\junk\test1.csv
$c2 = Get-Content D:\junk\test2.csv

# users in both files, could be a better way to do this
$t1 = $c1 | ConvertFrom-Csv
$t2 = $c2 | ConvertFrom-Csv
$users = $t1 | Select SamAccountName

# generate final, combined output
$combined = @()
$combined += CombineSkip1 $c1[0] $c2[0]

# One empty "" field per column of the second file, minus SamAccountName
$c2PropCount = ($c2[0] -split ',').Count - 1
$filler = (',""' * $c2PropCount)

for ($i = 1; $i -lt $c1.Count; $i++){
    # First field including its quotes (e.g. "JDoe"), so StartsWith only matches that exact user
    $user = $c1[$i].Split(',')[0]
    $u2 = $c2 | where {([string]$_).StartsWith($user)} | select -First 1
    if ($u2)
    {
        $combined += CombineSkip1 $c1[$i] $u2
    }
    else
    {
        $combined += ($c1[$i] + $filler)
    }
}

# write to output and file
Write-Output $combined
$combined | Set-Content -Path D:\junk\test3.csv -Force
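Both this answer and the asker's code treat a CSV line as text split on commas. That works for the sample data but breaks when a quoted value itself contains a comma. A small demonstration, using a made-up displayName value, of how -split and ConvertFrom-Csv differ on such a line:

```powershell
$header = '"SamAccountName","displayName"'
$line   = '"JDoe","Doe, John"'

# Naive split: the comma inside the quotes is treated as a separator, giving 3 pieces
$pieces = $line -split ','
$pieces.Count

# A CSV parser respects the quotes, giving 2 fields
$parsed = @($header, $line) | ConvertFrom-Csv
$parsed.displayName
```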

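The output is a blank file (updated my code for review).

The only thing missing is the concatenation of the column headers (the first line is $null), but I can do that with my code snippet (as shown in my answer). Performance seems good. I'll test both algorithms tomorrow at work on larger files (over 300,000 records) and let you know the timings. This basically runs once a week.

Looks good. One more thing (my fault, I just updated the expected output in the question): if a record exists in the first file it needs to be added, and if it does not exist in the second file, those values need to be null. Another thing I noticed in the results file: for some reason, when I open it in Excel, the whole row ends up in a single cell. In Notepad it looks almost the same as mine (apart from the records found in file 1 that need null values for the file 2 columns), yet mine opens in Excel with each value in its own cell.

Updated your code in my answer; if you edit your code to produce the same result, I'll edit my answer. Then we only need to solve the last issue, records filling a single cell when opened in Excel. Thanks again for spending time on this.

Solved the issue of records filling a single cell. Just update your code to match the snippet in my answer and I'll give you +1 and accept the answer :)

Updated the code to use all lines from the first file and match them against lines in the second file. Enjoy.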