Tags: powershell, csv, powershell-2.0

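Key takeaways:

  • When used with Export-Csv, output buffering (1 → 2 and 1 → 3) provides a massive performance improvement.
  • When used with StreamWriters, output buffering (4a → 4b) doesn't help and actually costs a little performance.
  • Eliminating ConvertTo-Csv (4a → 4c) cut the execution time by nearly a third (122.5 seconds measured from 4a; 153.6 seconds measured from 4b).
  • Method 4a is much slower than the buffered Export-Csv methods because it introduced the use of Get-Location and Join-Path. Either these cmdlets do far more processing behind the scenes than it appears, or invoking cmdlets is simply slow in general (when done a million times, of course).
    • Moving Get-Location outside of the per-object loop (4c → 4d) cut the execution time by a quarter (77.4 seconds).
    • Using [System.IO.Path]::Combine() instead of Join-Path (4d → 4e) cut the execution time by two-thirds (147.1 seconds).
  • Script optimization is fun and educational.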
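
The Join-Path versus [System.IO.Path]::Combine() observation is easy to check in isolation. The following is a minimal sketch, not the benchmark used for the table: the iteration count and the file name are arbitrary assumptions, and the absolute numbers will vary by machine and PowerShell version.

```powershell
# Hypothetical micro-benchmark: build the same path many times with each approach.
$iterations = 100000                 # arbitrary; chosen only to make the gap visible
$directory = (Get-Location).Path     # resolved once, outside the loops

$joinPathTime = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        # Cmdlet invocation on every iteration
        $null = Join-Path -Path $directory -ChildPath 'AAPL-2017-08-14.csv'
    }
}

$combineTime = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        # Direct .NET static method call on every iteration
        $null = [System.IO.Path]::Combine($directory, 'AAPL-2017-08-14.csv')
    }
}

'Join-Path:            {0:N1} ms' -f $joinPathTime.TotalMilliseconds
'[IO.Path]::Combine(): {0:N1} ms' -f $combineTime.TotalMilliseconds
```

Both loops produce the same path string, so any difference in the two timings comes from the call overhead rather than from the result.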
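Comments:

Just to confirm: you need to split each date-specific CSV file into one CSV file per ticker per date? Are you looking for a general solution? Do you need a script that supports an arbitrary number of files, spanning an arbitrary number of dates, with an arbitrary set of tickers? Or do you just need to do this once to split 2 files into 6? If it's the latter, I could write something very simple in about 8 lines of code. If it's the former, it gets a lot more complicated.

A general solution, because I will have hundreds of CSV files, each to be split into one CSV file per ticker per date.

Group-Object needs to read the entire file into memory. You should avoid that for large files.

Thank you, sir. This works like a charm on my toy files. The real files do take some time, though, so I'm leaving this open for now.

Yes, it's slow. I'm working on an improved version and testing it now.

Tyler, if performance is a concern, there is an alternative you may want to consider. Note, however, that with that approach you will have to handle a lot of things yourself that Import-Csv and Export-Csv would normally take care of for you.

@AnsgarWiechers Ha, that's the solution I'm working on right now. It's surprising how much time is saved by opening each output file only once. I think the difference would be significant even if you stuck with appending to the files.

Input files:

Ticker  Price  Date
AAPL    1      2017-08-14
AAPL    2      2017-08-14
AAPL    3      2017-08-14
AAPL    4      2017-08-14
MSFT    5      2017-08-14
MSFT    6      2017-08-14
MSFT    7      2017-08-14
GOOG    8      2017-08-14
GOOG    9      2017-08-14
...

Ticker  Price  Date
AAPL    1      2017-08-13
AAPL    2      2017-08-13
AAPL    3      2017-08-13
AAPL    4      2017-08-13
MSFT    5      2017-08-13
MSFT    6      2017-08-13
MSFT    7      2017-08-13
GOOG    8      2017-08-13
GOOG    9      2017-08-13
...

Desired output files:

Ticker  Price  Date
AAPL    1      2017-08-14
AAPL    2      2017-08-14
AAPL    3      2017-08-14
AAPL    4      2017-08-14

Ticker  Price  Date
MSFT    5      2017-08-14
MSFT    6      2017-08-14
MSFT    7      2017-08-14

Ticker  Price  Date
GOOG    8      2017-08-14
GOOG    9      2017-08-14

Ticker  Price  Date
AAPL    1      2017-08-13
AAPL    2      2017-08-13
AAPL    3      2017-08-13
AAPL    4      2017-08-13

Ticker  Price  Date
MSFT    5      2017-08-13
MSFT    6      2017-08-13
MSFT    7      2017-08-13

Ticker  Price  Date
GOOG    8      2017-08-13
GOOG    9      2017-08-13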
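
# Simple approach for a single input file: group by Ticker and write one file per group
# (Group-Object holds the whole input file in memory)
Import-Csv file-2017-08-14.csv | Group-Object -Property "Ticker" | Foreach-Object {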
    $path = $_.Name + ".csv";
    $_.Group | Export-Csv -Path $path -NoTypeInformation
}
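
The snippet above splits a single file by ticker only. One way it might be extended to the "one file per ticker per date" requirement is to group on both properties at once. This is a sketch under stated assumptions: the in-memory sample records, the comma delimiter, and the "<Ticker>-<Date>.csv" naming are illustrative choices, and the memory caveat about Group-Object still applies.

```powershell
# Illustrative sample records; a real script would pipe in Import-Csv output instead.
$records = @'
Ticker,Price,Date
AAPL,1,2017-08-14
MSFT,5,2017-08-14
AAPL,1,2017-08-13
'@ | ConvertFrom-Csv

# Group on both properties so each group maps to exactly one output file.
$groups = $records | Group-Object -Property 'Ticker', 'Date'

foreach ($group in $groups) {
    # Every record in a group shares the same Ticker and Date,
    # so the first record is enough to build the file name.
    $first = $group.Group[0]
    $path = '{0}-{1}.csv' -f $first.Ticker, $first.Date

    $group.Group | Export-Csv -Path $path -NoTypeInformation
}
```

Because each output file is written in a single Export-Csv call, no -Append is needed here, but the whole input still has to fit in memory.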

# Method 1: relative paths, Export-Csv -Append once per record, no output buffering
Get-ChildItem -Filter '*.csv' -File -Force `
    | Select-Object -ExpandProperty 'FullName' `
    | Import-Csv -Delimiter "`t" `
    | ForEach-Object -Process {
        $outputFilePath = "out\{0}-{1}.csv" -f $_.Ticker, $_.Date;

        $_ | Export-Csv -Path $outputFilePath -Append -NoTypeInformation;
    };

# Method 2: as Method 1, but buffer up to 1,000 records per output file in an array
$pendingRecordsByFilePath = @{};
$maxPendingRecordsPerFilePath = 1000;

Get-ChildItem -Filter '*.csv' -File -Force `
    | Select-Object -ExpandProperty 'FullName' `
    | Import-Csv -Delimiter "`t" `
    | ForEach-Object -Process {
        $outputFilePath = "out\{0}-{1}.csv" -f $_.Ticker, $_.Date;
        $pendingRecords = $pendingRecordsByFilePath[$outputFilePath];

        if ($null -eq $pendingRecords)
        {
            # This is the first time we're encountering this output file; create a new array
            $pendingRecords = @();
        }
        elseif ($pendingRecords.Length -ge $maxPendingRecordsPerFilePath)
        {
            # Flush all pending records for this output file
            $pendingRecords `
                | Export-Csv -Path $outputFilePath -Append -NoTypeInformation;
            $pendingRecords = @();
        }

        $pendingRecords += $_;
        $pendingRecordsByFilePath[$outputFilePath] = $pendingRecords;
    };

# No more input records to be read; flush all pending records for each output file
foreach ($outputFilePath in $pendingRecordsByFilePath.Keys)
{
    $pendingRecordsByFilePath[$outputFilePath] `
        | Export-Csv -Path $outputFilePath -Append -NoTypeInformation;
}

# Method 3: as Method 2, but buffer in a System.Collections.Generic.List[Object]
$pendingRecordsByFilePath = @{};
$maxPendingRecordsPerFilePath = 1000;

Get-ChildItem -Filter '*.csv' -File -Force `
    | Select-Object -ExpandProperty 'FullName' `
    | Import-Csv -Delimiter "`t" `
    | ForEach-Object -Process {
        $outputFilePath = "out\{0}-{1}.csv" -f $_.Ticker, $_.Date;
        $pendingRecords = $pendingRecordsByFilePath[$outputFilePath];

        if ($null -eq $pendingRecords)
        {
            # This is the first time we're encountering this output file; create a new list
            $pendingRecords = New-Object `
                -TypeName 'System.Collections.Generic.List[Object]' `
                -ArgumentList (,$maxPendingRecordsPerFilePath);
            $pendingRecordsByFilePath[$outputFilePath] = $pendingRecords;
        }
        elseif ($pendingRecords.Count -ge $maxPendingRecordsPerFilePath)
        {
            # Flush all pending records for this output file
            $pendingRecords `
                | Export-Csv -Path $outputFilePath -Append -NoTypeInformation;
            $pendingRecords.Clear();
        }
        $pendingRecords.Add($_);
    };

# No more input records to be read; flush all pending records for each output file
foreach ($outputFilePath in $pendingRecordsByFilePath.Keys)
{
    $pendingRecordsByFilePath[$outputFilePath] `
        | Export-Csv -Path $outputFilePath -Append -NoTypeInformation;
}

# Method 4a: Join-Path + ConvertTo-Csv, written through one StreamWriter per output file
$truncateExistingOutputFiles = $true;
$outputFileWritersByPath = @{};

try
{
    Get-ChildItem -Filter '*.csv' -File -Force `
        | Select-Object -ExpandProperty 'FullName' `
        | Import-Csv -Delimiter "`t" `
        | ForEach-Object -Process {
            $outputFilePath = Join-Path -Path (Get-Location) -ChildPath ('out\{0}-{1}.csv' -f $_.Ticker, $_.Date);
            $outputFileWriter = $outputFileWritersByPath[$outputFilePath];
            $outputLines = $_ | ConvertTo-Csv -NoTypeInformation;

            if ($outputFileWriter -eq $null)
            {
                # This is the first time we're encountering this output file; create a new StreamWriter
                $outputFileWriter = New-Object `
                    -TypeName 'System.IO.StreamWriter' `
                    -ArgumentList ($outputFilePath, -not $truncateExistingOutputFiles, [System.Text.Encoding]::ASCII);

                $outputFileWritersByPath[$outputFilePath] = $outputFileWriter;

                # Write the header line
                $outputFileWriter.WriteLine($outputLines[0]);
            }

            # Write the data line
            $outputFileWriter.WriteLine($outputLines[1]);
        };
}
finally
{
    foreach ($writer in $outputFileWritersByPath.Values)
    {
        $writer.Close();
    }
}

# Method 4e: [System.IO.Path]::Combine() + string interpolation, one StreamWriter per output file
$truncateExistingOutputFiles = $true;
$outputDirectoryPath = Join-Path -Path (Get-Location) -ChildPath 'out';
$outputFileWritersByPath = @{};

try
{
    Get-ChildItem -Filter '*.csv' -File -Force `
        | Select-Object -ExpandProperty 'FullName' `
        | Import-Csv -Delimiter "`t" `
        | ForEach-Object -Process {
            $outputFileName = '{0}-{1}.csv' -f $_.Ticker, $_.Date;
            $outputFilePath = [System.IO.Path]::Combine($outputDirectoryPath, $outputFileName);
            $outputFileWriter = $outputFileWritersByPath[$outputFilePath];

            if ($outputFileWriter -eq $null)
            {
                # This is the first time we're encountering this output file; create a new StreamWriter
                $outputFileWriter = New-Object `
                        -TypeName 'System.IO.StreamWriter' `
                        -ArgumentList ($outputFilePath, -not $truncateExistingOutputFiles, [System.Text.Encoding]::ASCII);

                $outputFileWritersByPath[$outputFilePath] = $outputFileWriter;

                # Write the header line
                $outputFileWriter.WriteLine('"Ticker","Price","Date"');
            }

            # Write the data line
            $outputFileWriter.WriteLine("""$($_.Ticker)"",""$($_.Price)"",""$($_.Date)""");
        };
}
finally
{
    foreach ($writer in $outputFileWritersByPath.Values)
    {
        $writer.Close();
    }
}
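
One of the things ConvertTo-Csv and Export-Csv handle that the string-interpolated data line above does not is a field value that itself contains a double quote. If that can occur in the data, a small helper along these lines might be used when building each line. The function name ConvertTo-QuotedCsvField is hypothetical, and the sketch assumes the usual CSV convention of doubling embedded quotes.

```powershell
# Hypothetical helper (not part of the original scripts): quote one field value,
# doubling any embedded double quotes.
function ConvertTo-QuotedCsvField
{
    param ([AllowNull()] [Object] $Value)

    # [String] $null becomes an empty string, so a missing value yields ""
    return '"' + ([String] $Value).Replace('"', '""') + '"';
}

# Illustrative record; in the scripts above this would be $_ inside ForEach-Object
$record = New-Object -TypeName 'PSObject' -Property @{
    Ticker = 'AAPL';
    Price  = '1';
    Date   = '2017-08-14';
};

# Quote each field, then join with commas to form the data line
$line = ($record.Ticker, $record.Price, $record.Date `
    | ForEach-Object -Process { ConvertTo-QuotedCsvField -Value $_ }) -join ',';

$line;
```

The extra function call per field adds overhead, so for data known to contain no quotes the plain interpolated string remains the faster choice.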
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| Method | Path handling        | Line building        | File writing | Output buffering    | Execution time  |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 1      | Relative             | Export-Csv           | Export-Csv   | No                  | 2,178.5 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 2      | Relative             | Export-Csv           | Export-Csv   | 1,000-element array |   222.9 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 3      | Relative             | Export-Csv           | Export-Csv   | 1,000-element List  |   154.2 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4a     | Join-Path            | ConvertTo-Csv        | StreamWriter | No                  |   425.0 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4b     | Join-Path            | ConvertTo-Csv        | StreamWriter | 1,000-element List  |   456.1 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4c     | Join-Path            | String interpolation | StreamWriter | No                  |   302.5 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4d     | Join-Path            | String interpolation | StreamWriter | No                  |   225.1 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4e     | [IO.Path]::Combine() | String interpolation | StreamWriter | No                  |    78.0 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
| 4f     | String interpolation | String interpolation | StreamWriter | No                  |    77.7 seconds |
+--------+----------------------+----------------------+--------------+---------------------+-----------------+
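
The step-to-step savings can be recomputed directly from the execution times in the table. The sketch below only does that arithmetic; the variable names are illustrative, and the percentage formatting depends on the current culture.

```powershell
# Execution times in seconds, copied from the table
$seconds = @{ '4a' = 425.0; '4b' = 456.1; '4c' = 302.5; '4d' = 225.1; '4e' = 78.0 };

# Each pair is (before, after) for one optimization step
$steps = @(
    @('4a', '4c'),
    @('4c', '4d'),
    @('4d', '4e')
);

foreach ($step in $steps)
{
    $before = $seconds[$step[0]];
    $after = $seconds[$step[1]];
    $saved = $before - $after;

    # Absolute saving in seconds and saving as a fraction of the "before" time
    '{0} -> {1}: saved {2:N1} s ({3:P0})' -f $step[0], $step[1], $saved, ($saved / $before);
}
```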