PowerShell/Perl: Merge multiple CSV files into one?

perl powershell

I have the following CSV files and want to merge them into a single CSV file.

01.csv

apples,48,12,7
pear,17,16,2
orange,22,6,1

02.csv

apples,51,8,6
grape,87,42,12
pear,22,3,7

03.csv

apples,11,12,13
grape,81,5,8
pear,11,5,6

04.csv

apples,14,12,8
orange,5,7,9

Desired output:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9

Can anyone offer guidance on how to achieve this? Preferably with PowerShell, but I'm open to alternatives such as Perl if that is easier.

Thanks Pantik, the output of your code is close to what I want:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6

Unfortunately, I need "placeholder" commas wherever an entry does not exist in a CSV file, e.g. orange,22,6,1,,,,,,5,7,9 rather than orange,22,6,1,5,7,9

Update: I would like the files to be parsed in order of their file names, e.g.:

$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){

Regards,

ted

You will have to parse the files; I don't see an easier way.

A solution in PowerShell:

# Item names to merge on; each output row starts with its own name.
$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}

$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
    # Default to three empty columns for every item missing from this file.
    $file_hash = @{}
    $produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
    get-content $file | foreach-object{
        $line = $_.split(",")
        $file_hash[$line[0]] = $line[1..3]
    }
    $produce | foreach-object {
        $produce_hash[$_] += $file_hash[$_]
    }
}

# $ofs is the separator used when an array is cast to [string].
$ofs = ","
$out = @()
$produce | foreach-object {
    $out += [string]$produce_hash[$_]
}

$out | out-file "outputfile.csv"

gc outputfile.csv

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
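
The $produce list in the solution above is hard-coded. One possible way around that is a first pass over the files that collects the unique first-column names before the merge loop runs. The following is only a sketch under that assumption; the function name Get-ProduceNames is hypothetical and not part of the original answer.

```powershell
# Hypothetical helper (not from the original answer): returns the unique
# first-column names found in the given files, in order of first appearance.
function Get-ProduceNames {
    param([string[]]$Path)

    $names = @()
    foreach ($p in $Path) {
        foreach ($line in Get-Content -LiteralPath $p) {
            # Skip blank lines so they do not become an empty name.
            if ([string]::IsNullOrWhiteSpace($line)) { continue }
            $name = $line.Split(",")[0]
            if ($names -notcontains $name) { $names += $name }
        }
    }
    return $names
}

# First pass: build $produce from the data. The merge loop from the answer
# would then run unchanged as the second pass.
$myFiles = @(Get-ChildItem *.csv | Sort-Object Name)
$produce = @(Get-ProduceNames -Path @($myFiles | ForEach-Object { $_.FullName }))
```

Reading every file twice costs extra I/O, so this is probably only attractive when the files are small.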

Update: OK, I adjusted it a little; hopefully it is understandable.

$items = @{}
$colCount = 0 # total amount of columns
# loop through all files
foreach ($file in (gci *.csv | sort Name))
{
    $content = Get-Content $file
    $itemsToAdd = 0; # columns added by this file
    foreach ($line in $content)
    {
        if ($line -match "^(?<group>\w+),(?<value>.*)") 
        { 
            $group = $matches["group"]
            if (-not $items.ContainsKey($group)) 
            {   # if the row doesn't exist yet, add it and fill it with empty columns
                $items.Add($group, @()) 
                for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
            }

            # add new values to correct row
            $matches["value"].Split(",") | foreach { $items[$group] += $_ }
            $itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
        } 
    }

    # in case that file didn't contain some row, add empty cols for those rows
    $colCount += $itemsToAdd
    $toAddEmpty = @()
    $items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
    foreach ($key in $toAddEmpty) 
    {   
        for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
    }
}

# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys)
{
    "$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}
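
Note that `sort Name` compares file names as strings, so 11.csv would sort before 2.csv. A minimal sketch of a numeric sort, assuming every file name starts with digits; the function name Sort-ByLeadingNumber is hypothetical:

```powershell
# Hypothetical helper: orders file names by the number they start with.
# Assumes every name begins with digits (01.csv, 2.csv, 11.csv, ...).
function Sort-ByLeadingNumber {
    param([string[]]$Name)

    # Extract the leading digits and compare them as integers, not as text.
    $Name | Sort-Object { [int]($_ -replace '^(\d+).*$', '$1') }
}

Sort-ByLeadingNumber -Name '11.csv', '2.csv', '01.csv'
# 01.csv
# 2.csv
# 11.csv
```

With real files the same sort key could be applied directly, e.g. `gci *.csv | Sort-Object { [int]($_.Name -replace '^(\d+).*$', '$1') }`.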

Here is my Perl version:

use strict;
use warnings;

my $filenum = 0;

my ( %fruits, %data );
foreach my $file ( sort glob("*.csv") ) {

    $filenum++;
    open my $fh, "<", $file or die $!;

    while ( my $line = <$fh> ) {

        chomp $line;

        my ( $fruit, @values ) = split /,/, $line;

        $fruits{$fruit} = 1;

        $data{$filenum}{$fruit} = \@values;
    }

    close $fh;
}
foreach my $fruit ( sort keys %fruits ) {

    print $fruit, ",", join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ), "\n";
}

So do you have a typo for grape, or did I misunderstand something?
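OK, gangabass's solution works and is cooler than mine, but I'll add mine anyway. It is a little stricter, and it keeps a data structure that can also be used afterwards. So, enjoy.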

use strict;
use warnings;

opendir my $dir, '.' or die $!;
my @csv = grep( /^\d+\.csv$/i, readdir $dir );
closedir $dir;

# sort numerically by the leading digits in the file name
@csv = sort { ( $a =~ /^(\d+)/ )[0] <=> ( $b =~ /^(\d+)/ )[0] } @csv;

my %data;

# To print empty records we first need to know all the names
for my $file (@csv) {
    open my $fh, "
[...]

The first PowerShell solution above (the one with the hard-coded $produce array) should be easy to modify for other items. Just add them to the $produce array.

Here is an alternative approach. However, it still does not add the commas when an item is missing:

Get-ChildItem D:\temp\a\ *.csv |
    Get-Content |
    ForEach-Object -begin { $result=@{} } -process {
        $name, $otherCols = $_ -split '(?<=\w+),'
        if (!$result[$name]) { $result[$name] = @() }
        $result[$name] += $otherCols
    } -end {
        $result.GetEnumerator() | % {
            "{0},{1}" -f $_.Key, ($_.Value -join ",")
        }
    } | Sort

It looks like you want the data ordered by file name. For example, you have empty records for orange in 2.csv and 3.csv. If that is a requirement, you should add it to the question.

Thanks Pantik for the effort, much appreciated. Please see the update to my question for feedback, as this doesn't quite generate the output I'm looking for.

Your default sort won't order file names correctly beyond 9.csv, because 11.csv will come before 2.csv.

Thanks gangabass... yes, I had a typo in my desired output; there should not be a space immediately after grape. Updated.

Thanks mjolinor. Can this be modified so that you don't have to enter the items in the $produce array manually, since the items may not be known in advance?

Possibly. I can think of two ways to do it:
1 - Read the data twice, using the first pass to collect the unique values of the first element to build the $produce array.
2 - Set a counter and increment it as each file is processed, so you know how many $null arrays you may need to add before the first set of values for that item.
Which one is most efficient probably depends on the number and size of the data files.

Posted a second solution that auto-populates $produce.

Second PowerShell solution (as requested):

$produce = @()
$produce_hash = @{}
$file_count = -1
$myFiles = @(gci 0*.csv) | sort Name
foreach ($file in $myFiles){
    $file_count ++
    $file_hash = @{}
    get-content $file | foreach-object{
        $line = $_.split(",")
        if ($produce -contains $line[0]){
            $file_hash[$line[0]] += $line[1..3]
        }
        else {
            $produce += $line[0]
            $file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 * $file_count) + $line[1..3]
        }
    }
    $produce | foreach-object {
        if ($file_hash[$_]){$produce_hash[$_] += $file_hash[$_]}
        else {$produce_hash[$_] += @(,$null) * 3}
    }
}
$ofs = ","
$out = @()
$produce_hash.keys | foreach-object {
    $out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"

gc outputfile.csv

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
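
Since the whole question hinges on the placeholder commas, it may help to check that every merged row ends up with the same number of fields. The following is a rough sketch of such a check, not part of any of the answers; the function name Test-UniformColumnCount is hypothetical.

```powershell
# Hypothetical check: returns $true when every non-empty line has the same
# number of comma-separated fields, i.e. no placeholder commas are missing.
function Test-UniformColumnCount {
    param([string[]]$Line)

    $counts = @($Line |
        Where-Object { $_ } |                       # drop empty lines
        ForEach-Object { $_.Split(',').Count } |    # fields per line
        Sort-Object -Unique)
    return ($counts.Count -le 1)
}

$padded   = 'apples,48,12,7,51,8,6', 'grape,,,,87,42,12'
$unpadded = 'apples,48,12,7,51,8,6', 'grape,87,42,12'

Test-UniformColumnCount -Line $padded     # True
Test-UniformColumnCount -Line $unpadded   # False
```

Against a real result it could be called as `Test-UniformColumnCount -Line (Get-Content outputfile.csv)`.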