Powershell/Perl: Merge multiple CSV files into one?
I have the following CSV files and would like to merge them into a single CSV file:
01.csv
apples,48,12,7
pear,17,16,2
orange,22,6,1
02.csv
apples,51,8,6
grape,87,42,12
pear,22,3,7
03.csv
apples,11,12,13
grape,81,5,8
pear,11,5,6
04.csv
apples,14,12,8
orange,5,7,9
Desired output:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9
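Can anyone provide guidance on how to achieve this? Preferably with Powershell, but I am open to alternatives such as Perl if that is easier.
Thanks Pantik, your code's output is close to what I want: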
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6
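Unfortunately, I need "placeholder" commas wherever an entry does not exist in a CSV file, e.g. orange,22,6,1,,,,,,,5,7,9 rather than orange,22,6,1,5,7,9.
Update: I would like the files to be parsed in order of their file names, e.g.: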
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
Regards,
ted
You have to parse the files; I don't see an easier way. A solution in Powershell:
Update: OK, I tweaked it a bit; hopefully it is understandable.
$items = @{}
$colCount = 0 # total number of columns so far
# loop through all files
foreach ($file in (gci *.csv | sort Name))
{
    $content = Get-Content $file
    $itemsToAdd = 0; # columns added by this file
    foreach ($line in $content)
    {
        if ($line -match "^(?<group>\w+),(?<value>.*)")
        {
            $group = $matches["group"]
            if (-not $items.ContainsKey($group))
            {   # if the row doesn't exist yet, add it and fill it with empty columns
                $items.Add($group, @())
                for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
            }
            # add the new values to the correct row
            $matches["value"].Split(",") | foreach { $items[$group] += $_ }
            $itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
        }
    }
    # if the file didn't contain some rows, add empty cols for those rows
    $colCount += $itemsToAdd
    $toAddEmpty = @()
    $items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
    foreach ($key in $toAddEmpty)
    {
        for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
    }
}
# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys)
{
    "$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}
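The loop above orders the files with sort Name, which compares names as strings, so a name such as 11.csv would sort before 2.csv. A minimal sketch of a numeric ordering follows, assuming the file names begin with digits. Sort-ByLeadingNumber is a hypothetical helper name used only for illustration; it is not part of any answer in this thread.

```powershell
# Hypothetical helper: orders file names by their leading number, then by name.
function Sort-ByLeadingNumber {
    param([string[]]$Names)
    $Names | Sort-Object -Property @{
        Expression = {
            # Leading digits become the numeric sort key;
            # names without leading digits are placed last.
            if ($_ -match '^(\d+)') { [int]$Matches[1] } else { [int]::MaxValue }
        }
    }, @{ Expression = { $_ } }   # tie-breaker: plain name comparison
}

Sort-ByLeadingNumber -Names '11.csv', '2.csv', '01.csv'
# 01.csv
# 2.csv
# 11.csv
```

Against real files the same idea could be applied to the file objects directly, for example Get-ChildItem *.csv | Sort-Object { [int]($_.BaseName -replace '\D') }, again assuming that every base name contains a number.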
Here is my Perl version:
use strict;
use warnings;
my $filenum = 0;
my ( %fruits, %data );
foreach my $file ( sort glob("*.csv") ) {
    $filenum++;
    open my $fh, "<", $file or die $!;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $fruit, @values ) = split /,/, $line;
        $fruits{$fruit} = 1;
        $data{$filenum}{$fruit} = \@values;
    }
    close $fh;
}
foreach my $fruit ( sort keys %fruits ) {
    print $fruit, ",",
        join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ),
        "\n";
}
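So do you have a typo for grape, or am I misunderstanding something?
OK, gangabass's solution works and is cooler than mine, but I'll add mine anyway. It is slightly stricter, and it keeps a data structure that can be used afterwards. So, enjoy.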
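use strict;
use warnings;
opendir my $dir, '.' or die $!;
my @csv = grep( /^\d+\.csv$/i, readdir $dir );
closedir $dir;
# sort numerically based on the leading digits in the file name
@csv = sort { ( $a =~ /^(\d+)/ )[0] <=> ( $b =~ /^(\d+)/ )[0] } @csv;
my %data;
# to print empty records, we first need to know all the names
for my $file (@csv) {
    open my $fh, "<", $file or die $!;
    [the remainder of this script is missing from the source]
Powershell: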
$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
    $file_hash = @{}
    $produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
    get-content $file | foreach-object{
        $line = $_.split(",")
        $file_hash[$line[0]] = $line[1..3]
    }
    $produce | foreach-object {
        $produce_hash[$_] += $file_hash[$_]
    }
}
$ofs = ","
$out = @()
$produce | foreach-object {
    $out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
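It should be easy to modify for other items. Just add them to the $produce array.
Here is a more careful approach. However, it still does not add commas when an item is missing: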
Get-ChildItem D:\temp\a\ *.csv |
    Get-Content |
    ForEach-Object -begin { $result=@{} } -process {
        $name, $otherCols = $_ -split '(?<=\w+),'
        if (!$result[$name]) { $result[$name] = @() }
        $result[$name] += $otherCols
    } -end {
        $result.GetEnumerator() | % {
            "{0},{1}" -f $_.Key, ($_.Value -join ",")
        }
    } | Sort
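It looks like you want the data ordered by file name. For example, you have empty records for orange in 2.csv and 3.csv. If that is a requirement, you should add it to the question.
Thanks Pantik for your efforts, much appreciated. Please see the update to my question for feedback, as this does not quite generate the output I am looking for.
Your default sort will not order file names correctly beyond 9.csv, because 11.csv will come before 2.csv.
Thanks gangabass... yes, I had a typo in my desired output: there should not have been a blank immediately after grape. It has been updated.
Thanks mjolinor. Can this be modified so that you do not have to enter the items in the $produce array manually? The items may not be known in advance.
Probably. I can see two ways to do it:
1 - Read the data twice, using the first pass to collect the unique values of the first element and build the $produce array.
2 - Set a counter and increment it as each file is processed, so that you know how many $null arrays you may need to add before the first set of values for that item.
Which one is more efficient probably depends on the number and size of the data files.
Posted a second solution that populates $produce automatically.
Second Powershell solution (as requested):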
$produce = @()
$produce_hash = @{}
$file_count = -1
$myFiles = @(gci 0*.csv) | sort Name
foreach ($file in $myFiles){
    $file_count ++
    $file_hash = @{}
    get-content $file | foreach-object{
        $line = $_.split(",")
        if ($produce -contains $line[0]){
            $file_hash[$line[0]] += $line[1..3]
        }
        else {
            $produce += $line[0]
            $file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 * $file_count) + $line[1..3]
        }
    }
    $produce | foreach-object {
        if ($file_hash[$_]){$produce_hash[$_] += $file_hash[$_]}
        else {$produce_hash[$_] += @(,$null) * 3}
    }
}
$ofs = ","
$out = @()
$produce_hash.keys | foreach-object {
    $out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
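For comparison, the first option mentioned in the comments (reading the data twice) could be sketched roughly as follows. This is only an illustration, not the code posted in the thread: Merge-CsvRows and its ValueCount parameter are hypothetical names, and the sketch assumes that every row holds a name followed by exactly ValueCount values and that the file paths are passed in the desired order.

```powershell
# Hypothetical function; assumes each row is "name,v1,...,vN" with N = $ValueCount.
function Merge-CsvRows {
    param(
        [string[]]$Path,       # files, already in the order they should be merged
        [int]$ValueCount = 3   # assumed number of values per row
    )
    # Pass 1: collect the unique names in first-seen order.
    $names = @()
    foreach ($file in $Path) {
        foreach ($line in Get-Content -Path $file) {
            $name = $line.Split(',')[0]
            if ($name -and $names -notcontains $name) { $names += $name }
        }
    }
    # Pass 2: per file, append either the row's values or empty placeholders.
    $rows = @{}
    foreach ($name in $names) { $rows[$name] = @($name) }
    $blank = @('') * $ValueCount
    foreach ($file in $Path) {
        $seen = @{}
        foreach ($line in Get-Content -Path $file) {
            $fields = $line.Split(',')
            if ($fields[0]) { $seen[$fields[0]] = $fields[1..$ValueCount] }
        }
        foreach ($name in $names) {
            if ($seen.ContainsKey($name)) { $rows[$name] += $seen[$name] }
            else { $rows[$name] += $blank }
        }
    }
    # Emit one comma-joined line per name, in first-seen order.
    foreach ($name in $names) { $rows[$name] -join ',' }
}
```

A possible call would be Merge-CsvRows -Path (gci *.csv | sort Name | % { $_.FullName }) | out-file "outputfile.csv". Unlike the output shown above, the rows come out in the order in which each name first appears rather than alphabetically.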