Regex: how do I remove leading and trailing spaces from every column of a CSV except one?


I have a CSV that looks like this:

things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

How can I remove leading and trailing spaces from every column except the second one (ID)? The final output should look like this:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

I tried the regex below, but it removes spaces from all fields, including those in the ID column:

s/( +,|, +)/,/gi;
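
To see the problem concretely, here is a small sketch that applies the attempted substitution to one sample row. The wrapper name `naive_trim` is made up for illustration; only the substitution itself comes from the question.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the attempted substitution from the question.
sub naive_trim {
    my ($line) = @_;
    $line =~ s/( +,|, +)/,/gi;    # removes spaces on either side of any comma
    return $line;
}

my $before = "stuff ,123  ,hello ,more stuff";
my $after  = naive_trim($before);
print "before: [$before]\n";
print "after:  [$after]\n";       # the two spaces after 123 are gone as well
```

The pattern has no notion of which column it is in, so it cannot spare the ID field; the answers below work field by field instead.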

Split, trim selectively, and rejoin:

perl -F, -lane 's/^\s+|\s+$//g for @F[0,2..$#F]; print join ",", @F' file.csv
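
Explanation:

Switches:

  • -F/pattern/: the split() pattern for the -a switch (the slashes are optional)
  • -l: enables line-ending processing
  • -a: autosplit mode; splits each line into the array @F (on whitespace by default, on the -F pattern here)
  • -n: wraps the code in a while (<>) { ... } loop over every line of the input file
  • -e: tells perl to execute the code given on the command line
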
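Code:

  • EXPR for @F[0,2..$#F]: iterates over an array slice (skipping the second field)
  • s/^\s+|\s+$//g: removes leading and trailing whitespace from the field
  • print join ",", @F: prints the result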
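
For readers who prefer a script to a one-liner, the same split / trim-a-slice / rejoin idea can be sketched as a small function. The name `trim_all_but_second` is hypothetical, and the sketch assumes plain comma-separated fields with no quoting or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liner: split, trim a slice, rejoin.
sub trim_all_but_second {
    my ($line) = @_;
    chomp $line;                            # roughly what -l does on input
    my @F = split /,/, $line, -1;           # like -a -F, (-1 also keeps trailing empty fields)
    s/^\s+|\s+$//g for @F[0, 2 .. $#F];     # trim every field except index 1
    return join ",", @F;
}

print trim_all_but_second("stuff ,123  ,hello ,more stuff\n"), "\n";
```

The slice `@F[0, 2 .. $#F]` aliases the original array elements, so the substitution modifies `@F` in place; index 1 is simply never visited.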
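
Using awk. What it does:

  • Sets the field separator and the output field separator to a comma
  • Loops over the field values; if the field number is not 2, trims leading and trailing spaces
  • Prints the line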
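
The steps listed above can also be sketched in Perl rather than awk, with an explicit loop over 1-based field numbers. This is not the awk command from the answer; the function name `trim_fields_like_awk` and its `$skip` parameter are assumptions made for illustration, and quoted CSV fields are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: $skip is the 1-based field number to leave untouched,
# matching awk's field numbering. Assumes plain comma-separated fields.
sub trim_fields_like_awk {
    my ($line, $skip) = @_;
    my @fields = split /,/, $line, -1;          # field separator: comma
    for my $n (1 .. scalar @fields) {           # loop over field numbers
        next if $n == $skip;                    # the skipped field stays as it is
        $fields[$n - 1] =~ s/^\s+|\s+$//g;      # trim both ends
    }
    return join ",", @fields;                   # output field separator: comma
}

print trim_fields_like_awk(" stuff ,123 ,hello , more stuff ", 2), "\n";
```

Passing the field number as a parameter makes it easy to protect a different column without touching the loop.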

I have stored the content in a variable here, but you can adapt it as needed. Try this:

#!/usr/bin/perl
use strict;
use Data::Dumper;

my $str="things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff";

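$str = join "\n", map {
    my ($first, $id, $rest) = /^(.*?),(.*?),(.*)$/s;   # first field, ID, remainder
    $first =~ s/^\s+|\s+$//g;                          # trim the first field
    $rest  =~ s/\s*,\s*/,/g;                           # trim around the inner commas
    $rest  =~ s/^\s+|\s+$//g;                          # trim both ends of the remainder
    join ",", $first, $id, $rest;
} split /\n/, $str;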

print $str;

Output:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

You can specify each field in the substitution:

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*([^,]*?)\s*,   # Things: trim off the spaces
        ([^,]*),                  # ID: leave alone
        \s*([^,]*?)\s*,           # Hello field: trim off spaces
        \s*(.*?)\s*$              # More things: trim off spaces
        /$1,$2,$3,$4/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
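
I considered using named capture groups. They are handy when you are moving things around and have a lot of capture groups. In this case, though, I don't think they make the code any easier to read: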

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(?<things>[^,]*?)\s*,    # Things: trim off the spaces
        (?<id>[^,]*),                       # ID: leave alone
        \s*(?<hello_field>[^,]*?)\s*,       # Hello field: trim off spaces
        \s*(?<more_things>.*?)\s*$          # More things: trim off spaces
        /$+{things},$+{id},$+{hello_field},$+{more_things}/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

I prefer @Miller's answer, which uses a regex as the OP requested, but there is also Text::Trim for when you need it:

perl -MText::Trim -F, -anE 'trim for @F[0,2..$#F]; say join ",", @F' test.csv

or:


(together with wantarray and so on), whereas the trim sub in String::Util behaves differently... perhaps that is useful here ;-)

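So @miller, the syntax you provided, "@F[0,2..$#F]", will first perform the substitution on field 0, skip "1", go to "2", and then "$#F" will perform the substitution on all fields through the end of the file?

I think you have roughly got it - let's see what @miller says :-) Also check the POD, especially the autosplit section. Cheers

@miller, one more thing... it works well, but instead of returning the delimiter it returns one of these... SCALAR(0x1dc19b4). Is there any way to fix this?

Fun one-liner fact: as of perl 5.20, using -F automatically turns on -an, so you can just use -F, -le - another reason to upgrade perls!

I'm glad you posted this @Miller. For a long time I had assumed that s/\A\s+|\s+\z//g for @a was slower than do { s/\A\s+//; s/\s+\z// } for @a. I just benchmarked it, and the alternation is actually 50% faster than the double operation (i.e., the first strips three strings in the time the second strips two), so either the regex engine has received a lot of love and attention in this area, or I dreamed my original assumption. Both are possible.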
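
The comparison described in the last comment could be reproduced with something like the sketch below, which uses the core Benchmark module. The sample data and the function names `trim_alternation` and `trim_two_step` are made up for illustration, and the relative speed will vary with the perl version and the input, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample strings with leading and trailing whitespace (made up for illustration).
my @sample = ("  stuff  ", "hello ", " more stuff", "123  ") x 250;

# One substitution with an alternation.
sub trim_alternation {
    my @a = @_;
    s/\A\s+|\s+\z//g for @a;
    return @a;
}

# Two separate substitutions per string.
sub trim_two_step {
    my @a = @_;
    do { s/\A\s+//; s/\s+\z// } for @a;
    return @a;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { my @r = trim_alternation(@sample) },
    two_step    => sub { my @r = trim_two_step(@sample) },
});
```

Both functions work on copies of their arguments, so each benchmark iteration starts from untrimmed strings.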
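
The Text::Trim variant as a script:

use Text::Trim;
for (<>){
  my @line = split(/,/);
  trim for @line[0,2..$#line];      # trim every field except the second, in place
  print join(",", @line), "\n";     # join the fields first, then append the newline
}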
s/\A\s+//; s/\s+\z// ;