Linux: How to format the date field of a .CSV file with multiple commas in string fields

I have a .CSV file (file.csv) in which every data value is enclosed in double quotes. A sample of the file looks like this:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is a date field in the format "DD-MMM-YY". I have to convert it to the format YYYY/MM/DD. I tried the code below, but it did not work:

awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

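As far as I can tell, the problem is the commas inside the double quotes, because my code counts them as field separators too. Please advise on the following:

1) Do the double quotes around every value in every field make any difference? If they do, how can I remove them from all values except the strings that contain commas?

2) How can I modify my code so that it converts the 9th field from "DD-MMM-YY" to YYYY/MM/DD?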
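One way to keep the date in $9 without leaving awk is to split on the three-character sequence "," instead of on a bare comma. The following is only a sketch under assumptions that go beyond the question: every data field is double-quoted, no field contains the sequence "," itself, and two-digit years mean 20YY. The function name convert_dates is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV on stdin and rewrites column 9
# from DD-MMM-YY to YYYY/MM/DD.
# Assumptions: all data fields are double-quoted, no field contains
# the sequence "," and the year belongs to 2000-2099.
convert_dates() {
    awk '
    BEGIN {
        FS = OFS = "\",\""    # split on "," so commas inside a value are ignored
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = i
    }
    # Skip the unquoted header; only touch rows whose 9th field looks like a date.
    NR > 1 && split($9, d, "-") == 3 && (toupper(d[2]) in month) {
        $9 = sprintf("20%02d/%02d/%02d", d[3], month[toupper(d[2])], d[1])
    }
    { print }
    '
}
```

It could be used as `convert_dates < file.csv > temp_file.csv`. Because the first field keeps its opening quote and the last field keeps its closing quote, the rebuilt line is still fully quoted.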

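You can try this awk command. It splits each line on the double quote, so the 9th quoted value ends up in $18, and it relies on the -d option of GNU date to parse the date:

awk -F'"' 'BEGIN { OFS = "\"" } NR > 1 { cmd = "date -d " $18 " +%Y/%m/%d"; cmd | getline $18; close(cmd) } { print }' yourfile.txt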

Output:

"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,name","890","88","2011/10/11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2,name","12","455","2011/10/12","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,name","333","22","2011/10/13","232"

You can try the following awk one-liner:

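awk '
BEGIN {
    FS = OFS = ","
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, / /)
    for (i=1; i<=12; i++) {
        mm[month[i]]=i
    }
}
NR>1 {
    # The last two fields contain no commas, so count from the end.
    gsub(/"/, "", $(NF-1))
    split($(NF-1), d, /-/)
    # Zero-pad the month so that JAN becomes 01 rather than 1.
    $(NF-1) = q sprintf("20%02d/%02d/%02d", d[3], mm[d[2]], d[1]) q
}1' q='"' file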
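Counting from the end works here because only the string fields in the middle of each row contain extra commas; the last two columns never do. A quick way to check that on your own file is to print NF and $(NF-1) for each line. This is only an illustrative sketch, and the function name peek_from_end is made up.

```shell
#!/bin/sh
# Hypothetical helper: for each line read from stdin, print the number of
# comma-separated fields and the next-to-last field.
peek_from_end() {
    awk -F, '{ print NF ": " $(NF-1) }'
}
```

Running `peek_from_end < file` on the sample data should show a field count that varies from row to row while the next-to-last field is always the quoted date.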

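I strongly recommend that you use a proper CSV parser. For example, using Text::CSV_XS in Perl will do the job correctly and sanely. For example, this one-liner: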

perl -MText::CSV_XS -E'$csv=Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1});@m=qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);@m{@m}=(1 .. @m);while(my $row=$csv->getline(ARGV)){($d,$m,$y)=split("-",$row->[8]);$row->[8]=sprintf"%04d/%02d/%02d",2000+$y,$m{$m},$d if $m{$m};$csv->print(STDOUT, $row)}' file.csv > temp_file.csv

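You can count from the end: NF-1

I would consider using a program designed to handle CSV files, perhaps csvfix. It has built-in date manipulation.

@kev How do I use NF-1 in the code given above? I am new to linux and awk.

@JonathanLeffler I tried man csvfix, but it did not give me any manual page for it... Please elaborate on how I can get this to work.

Can you explain the code? What does "$18" stand for here? When I use your code, I get the error sh: +%Y/%m/%d: No such file or directory, and it prints the same input file contents that I mentioned in the question.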
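Regarding the "$18" asked about in the last comment: with -F'"' awk splits each line on double quotes, so the text between the Nth pair of quotes lands in field 2N, and the commas between values land in the odd-numbered fields. The 9th column is therefore $18. The header row in the sample contains no quotes at all, so $18 is empty on that line. The sketch below prints the even-numbered fields so the numbering can be checked; it assumes every value is quoted and contains no embedded quote, and the function name list_quote_fields is made up.

```shell
#!/bin/sh
# Hypothetical helper: split each stdin line on double quotes and print
# the even-numbered fields, which hold the quoted values.
list_quote_fields() {
    awk -F'"' '{ for (i = 2; i <= NF; i += 2) printf "$%d=%s\n", i, $i }'
}
```

For example, `sed -n 2p file.csv | list_quote_fields` lists the values of the first data row together with the awk field number each one occupies.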