Negative matching with sed on a CSV file


I have a CSV file in the following format:

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g'
XXXXXXX/XXXXXXXX XXXXXXXXXXXX), XXXXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXX (X),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXX XXXXXXXXX XXXXXX XXXX XXX XXXXXXXX XX XXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX): XXXXXXXX X XXXXXXXXXX XXXX X XXXXXXXXXX.,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXX (XXXXXX XXXXXXX XXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXXX XXXXXXXXX XXXXXXXX XXX XXXXXX XXXXXXX XXXXXXX (XXXXXXXXX).,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXXXX (XXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXX (XXXXXXXXX) (XXXXXXXXXXX XX XXXXX XXX XXXXXXXX-XXXX XXXXXXXXXXX): XXXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXX (XXXX), XXXXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXX XXXXX (XXXXXXXXX) (XXXXXXX XXXX): XXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ 
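Besides the comma delimiters, the generated CSV file also contains commas inside values. So I need sed(1) to replace the delimiters with a different separator, such as |.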

Unfortunately, regenerating the file with a different delimiter is not an option.

My failed attempt:

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended '/,/!s/,%s/|/g' | tail -1 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ 
How can I fix this?

Use:

sed -re 's/([^ ]),([^ ])/\1|\2/g'
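A quick check on tiny made-up inputs shows one caveat with this pattern. Each match captures and consumes the characters on both sides of the comma. When a field is only one character wide, the next comma therefore has no unconsumed left neighbour, and it is skipped. The sketch uses -E, which GNU sed also accepts, in place of -re.

```shell
#!/bin/sh
# Made-up inputs, not lines from the real X.csv.
# Each match consumes the character before AND after the comma.
printf 'ab,cd,ef\n' | sed -E 's/([^ ]),([^ ])/\1|\2/g'
# prints: ab|cd|ef

# The middle field "b" was consumed by the first match,
# so the second comma has no left neighbour left to match.
printf 'a,b,c\n' | sed -E 's/([^ ]),([^ ])/\1|\2/g'
# prints: a|b,c
```

The question's sample contains one-character fields such as ",X,", so this case is not hypothetical for that file.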

I don't like sed, so here is a version using perl:

cat X.csv | perl -p -e 's/,(\S)/|$1/g'

This basically means: replace every comma that is followed by a non-whitespace character with "|", and keep that character.

Or here is a version using sed (it should be POSIX-compatible):


cat X.csv | sed -E 's/,([^[:space:]])/|\1/g'
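For illustration, here is the POSIX-style command applied to a made-up line. In it, [^[:space:]] is the bracket-expression spelling of \S. The [:space:] class has to sit inside an outer bracket expression, which is why the pattern ends in "]]".

```shell
#!/bin/sh
# Made-up sample line; the ", " inside "Note (a, b)" must survive.
printf 'id,2020-01-01 10:00:00,Note (a, b),OK\n' |
  sed -E 's/,([^[:space:]])/|\1/g'
# prints: id|2020-01-01 10:00:00|Note (a, b)|OK
```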
With @nochkin's help, I came up with a sed solution:

$ tail -1 X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended 's/,(\S)/|\1/g' 
XXXXXXXXX|XXXX-XX-XX XX:XX:XX.XXXXXXXXX|XX|XXXXX|X|XXXXXX|X|XXXXXX|XXXXXXX (XXXXXXXX XXXXX)|XXXXX|XX.XXX.XXX.XX|XXXXX|XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.)|XXXXX|,X|XXX|XXXXXXX|,|{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
$ 
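In the output of s/,(\S)/|\1/g above, runs of empty fields are only partly converted: ",,X" became "|,X" and ",,,{" became "|,|{". sed has no lookahead, so one possible workaround is a loop. Each pass replaces the leftmost remaining delimiter, and the t command branches back to the label while a substitution succeeded. The helper name commas_to_pipes is hypothetical. The rule is carried over from the thread: a comma followed by whitespace belongs to a value, and any other comma is a delimiter. As before, a trailing comma at the very end of a line is left alone. The per-line loop does extra work on long lines, so it may be slower than a single /g pass.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV on stdin, writes "|"-separated text.
commas_to_pipes() {
  # :a  label to jump back to
  # s   replace ONE comma followed by a non-space character
  # ta  branch to :a if that substitution succeeded
  # Separate -e arguments keep the label portable to BSD sed.
  sed -E -e ':a' -e 's/,([^[:space:]])/|\1/' -e 'ta'
}

# Made-up sample with the same empty-field pattern as the real file.
printf 'X,,X,XXX,,,{X}\n' | commas_to_pipes
# prints: X||X|XXX|||{X}
```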

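1. You might want to generate the CSV file with a different field separator, or with the values enclosed in quotes. 2. If that is not an option, please provide more information: is the second comma in each line inside a field value? If not, how can we find the lines that need fixing?

1) Unfortunately, that is not an option. 2) The file is too big to check. I don't believe every line has this, but it is common in this file.

@alexus, show a few more lines from your file; two lines are not enough.

To whoever downvoted: please explain why in a comment, otherwise I can't improve my question.

@peter mortensen, thanks for the edit!

That's fine. But I would use -r instead of --regexp-extended, so that it also works with BusyBox and *BSD sed. On Mac OS X and older versions of Unix, it would be -E. GNU sed still supports that option too.

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, as this reduces the readability of both the code and the explanations!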
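The first comment asks how to find the lines that need fixing, and the thread leaves this open. One possible check is to convert the file and then count the fields on each line. If most lines share one count, the outliers are the lines where the comma rule probably misfired. The sketch below uses awk. The helper names are hypothetical, and it assumes that no value itself contains "|".

```shell
#!/bin/sh
# Hypothetical helper: histogram of "|"-separated field counts,
# printed as "<number of lines> <number of fields>" (uniq -c format).
field_count_histogram() {
  awk -F'|' '{ print NF }' | sort -n | uniq -c
}

# Hypothetical helper: print "line-number: line" for every line whose
# field count differs from the expected count passed as $1.
lines_with_wrong_field_count() {
  awk -F'|' -v want="$1" 'NF != want { print NR ": " $0 }'
}

# Made-up sample: two 3-field lines and one 2-field line.
printf 'a|b|c\nd|e|f\ng|h\n' | field_count_histogram
printf 'a|b|c\nd|e|f\ng|h\n' | lines_with_wrong_field_count 3
# prints: 3: g|h

# With the real file (name taken from the question), something like:
#   sed -E 's/,([^[:space:]])/|\1/g' X.csv | field_count_histogram
```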