
PowerShell - Convert log files to CSV


I have log files like this:

2009-12-18T08:25:22.983Z     1         174 dns:0-apr-credit-cards-uk.pedez.co.uk P http://0-apr-credit-cards-uk.pedez.co.uk/ text/dns #170 20091218082522021+89 sha1:AIDBQOKOYI7OPLVSWEBTIAFVV7SRMLMF - -
2009-12-18T08:25:22.984Z     1           5 dns:0-60racing.co.uk P http://0-60racing.co.uk/ text/dns #116 20091218082522037+52 sha1:WMII7OOKYQ42G6XPITMHJSMLQFLGCGMG - -
2009-12-18T08:25:23.066Z     1          79 dns:0-addiction.metapress.com.wam.leeds.ac.uk P http://0-addiction.metapress.com.wam.leeds.ac.uk/ text/dns #042 20091218082522076+20 sha1:NSUQN6TBIECAP5VG6TZJ5AVY34ANIC7R - -
...plus millions of other records
I need to convert them to a CSV file like this:

"2009-12-18T08:25:22.983Z","1","174","dns:0-apr-credit-cards-uk.pedez.co.uk","P","http://0-apr-credit-cards-uk.pedez.co.uk/","text/dns","#170","20091218082522021+89","sha1:AIDBQOKOYI7OPLVSWEBTIAFVV7SRMLMF","-","-"
"2009-12-18T08:25:22.984Z","1","5","dns:0-60racing.co.uk","P","http://0-60racing.co.uk/","text/dns","#116","20091218082522037+52","sha1:WMII7OOKYQ42G6XPITMHJSMLQFLGCGMG","-","-"
"2009-12-18T08:25:23.066Z","1","79","dns:0-addiction.metapress.com.wam.leeds.ac.uk","P","http://0-addiction.metapress.com.wam.leeds.ac.uk/","text/dns","#042","20091218082522076+20","sha1:NSUQN6TBIECAP5VG6TZJ5AVY34ANIC7R","-","-"
The field separator can be a single space or a run of spaces, and the fields are a mix of fixed-width and variable-width. This tends to confuse most of the CSV parsers I've found.

Ultimately I want to bcp these files into SQL Server, but bcp only lets you specify a single character as the field terminator (e.g. ","), which doesn't cope with the fixed-length, space-padded fields.
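For reference, once the log has been reduced to a single-character delimiter, the import could look something like this (the database, table, and server names below are made-up placeholders):

```powershell
# Hypothetical bcp invocation: -c for character mode, -t, to set a comma
# as the field terminator, -T for a trusted connection. All names here
# are placeholders for illustration only.
bcp MyDatabase.dbo.CrawlLog in .\crawl.csv -c -t, -S localhost -T
```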

So far, I'm using PowerShell:

gc -ReadCount 10 -TotalCount 200 .\crawl_sample.log | foreach { ([regex]'([\S]*)\s+').matches($_) } | foreach {$_.Groups[1].Value}
which returns a stream of fields:

2009-12-18T08:25:22.983Z
1
74
dns:0-apr-credit-cards-uk.pedez.co.uk
P
http://0-apr-credit-cards-uk.pedez.co.uk/
text/dns
#170
20091218082522021+89
sha1:AIDBQOKOYI7OPLVSWEBTIAFVV7SRMLMF
-
-
2009-12-18T08:25:22.984Z
1
55
dns:0-60racing.co.uk
P
http://0-60racing.co.uk/
text/dns
#116
20091218082522037+52
sha1:WMII7OOKYQ42G6XPITMHJSMLQFLGCGMG
-

But how do I convert that output into CSV format?

Answering my own question again:

measure-command {
    $q = [regex]" +"
    $q.Replace( ([string]::join([environment]::newline, (Get-Content -ReadCount 1 \crawl_sample2.log))), "," ) > crawl_sample2.csv
}
And it's fast!

Observations:

  • I used " +" rather than \s+ as the regex separator, since \s+ was also matching the newline characters
  • Get-Content -ReadCount 1 streams the file to the regex as single-line arrays
  • The output string is then redirected to the new file
Update

This script works, but it uses a huge amount of RAM on large files. So how can I do the same thing without using 8 GB of RAM and swap?

I think it's caused by the join buffering all of the data again... any ideas?
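One way to avoid the buffering entirely is to drop down to .NET stream readers and writers, so only a single line is ever held in memory. This is a sketch, not measured against the full data set, and the file names are placeholders:

```powershell
# Sketch: stream the file line by line instead of joining it into one
# big string, replacing runs of spaces with commas as each line passes.
$reader = New-Object System.IO.StreamReader 'crawl_sample2.log'
$writer = New-Object System.IO.StreamWriter 'crawl_sample2.csv'
while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine(($line -replace ' +', ','))
}
$reader.Close()
$writer.Close()
```

Note that .NET resolves relative paths against its own current directory, which can differ from the PowerShell location, so full paths are safer here.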

Update 2

OK - here's a better solution:

Get-Content -ReadCount 100 -TotalCount 100000 .\crawl.log |
    ForEach-Object { $_ } |
    ForEach-Object { $_ -replace " +", "," } > .\crawl.csv
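The pipeline above emits unquoted fields; if the quoted form shown in the question is required, one variant (a sketch, assuming no field ever contains a space, which holds for this log format) splits each line on runs of spaces and re-joins the fields with quotes:

```powershell
# Sketch: same streaming pipeline, but split each line into fields and
# re-join them wrapped in double quotes to match the quoted CSV sample.
Get-Content -ReadCount 100 -TotalCount 100000 .\crawl.log |
    ForEach-Object { $_ } |
    ForEach-Object { '"' + (($_ -split ' +') -join '","') + '"' } > .\crawl.csv
```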

A very handy PowerShell guide -

You might want to look at my FOSS CSV munging tool - I think it can do what you want, though only as a multi-stage process... Any better solutions or improvements to the script are welcome!

You can simplify this by dropping the middle ForEach-Object, since -replace operates on an array of strings, e.g. 'ab','cd','ef' -replace ' +',','. Try this:

gc crawl.log -read 100 -total 100000 | % { $_ -replace ' +', ',' } > crawl.csv

Given how -replace handles arrays, it can be even simpler:

(gc crawl.log ...) -replace ' +', ',' > crawl.csv