In Perl, why does splitting a string on a pattern give null (empty) strings at some split points?

regex perl

Suppose I have a string like this:

my $line="(l_extendedprice*(1-l_discount)*(1+l_tax))";

I want to split it at every non-word character, and I also want to keep that character. Here is my code:

my @split_on_non_word=split /(\W)/,$line;
print scalar @split_on_non_word, "\n";
print "split:$_\n" for @split_on_non_word;

This is my output:

20
split:
split:(
split:l_extendedprice
split:*
split:
split:(
split:1
split:-
split:l_discount
split:)
split:
split:*
split:
split:(
split:1
split:+
split:l_tax
split:)
split:
split:)

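The problem is that I get null (empty) elements in the array next to some of the matched characters, for example (, * and ). I suspect it has something to do with metacharacters. However, splitting at + does not insert any empty element, and + is a metacharacter too. Any help with this would be much appreciated.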

Of course, I could post-process the array and strip out the empty elements, which is what I do at the moment, but I am looking for a better solution.

Expected output:

15
split:(
split:l_extendedprice
split:*
split:(
split:1
split:-
split:l_discount
split:)
split:*
split:(
split:1
split:+
split:l_tax
split:)
split:)

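You can combine splitting on the word boundary \b with a check for \W: any chunk that contains a non-word character is then split on the empty string, which simply turns that chunk into a list of characters.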

my $line="(l_extendedprice*(1-l_discount)*(1+l_tax))";

my @split_on_non_word = map { /\W/ ? split '', $_ : $_ } split /\b/,$line;
print scalar @split_on_non_word, "\n";
print "split:$_\n" for @split_on_non_word;

Output:

15
split:(
split:l_extendedprice
split:*
split:(
split:1
split:-
split:l_discount
split:)
split:*
split:(
split:1
split:+
split:l_tax
split:)
split:)
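
To make the two stages of this answer easier to follow, here is a minimal sketch that runs them separately on the same $line. The variable names @chunks and @tokens are only illustrative, and an explicit loop stands in for the map; the result should be the same.

```perl
use strict;
use warnings;

my $line = "(l_extendedprice*(1-l_discount)*(1+l_tax))";

# Stage 1: split at word boundaries only. Runs of non-word
# characters such as ")*(" stay glued together as one chunk.
my @chunks = split /\b/, $line;
print scalar(@chunks), " chunks\n";    # 11 chunks

# Stage 2: break every chunk that contains a non-word character
# into single characters; word chunks are kept whole.
my @tokens;
for my $chunk (@chunks) {
    push @tokens, $chunk =~ /\W/ ? split(//, $chunk) : $chunk;
}
print scalar(@tokens), " tokens\n";    # 15 tokens
```

The first stage alone is not enough, because adjacent operators and parentheses come back as a single element; the second stage is what separates them.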

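Instead of split(), you can use a global match that captures either a run of word characters or a single non-word character: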
use Data::Dumper;

my $line="(l_extendedprice*(1-l_discount)*(1+l_tax))";
my @split_on_non_word = $line =~ /(\w+|\W)/g;

print Dumper \@split_on_non_word;

Output:
$VAR1 = [
      '(',
      'l_extendedprice',
      '*',
      '(',
      '1',
      '-',
      'l_discount',
      ')',
      '*',
      '(',
      '1',
      '+',
      'l_tax',
      ')',
      ')'
    ];

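You have declared every \W character to be a field separator. The first character of the string is (, which means it has to separate an empty string from whatever follows it.
Then comes *(: a sequence of two separators, which means there must be an empty field between them.
In 1+l_tax, there are clearly non-empty strings on both sides of the separator +.
To me, filtering out the empty fields seems simplest: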
#!/usr/bin/env perl

use strict;
use warnings;

use YAML::XS;

my $line = "(l_extendedprice*(1-l_discount)*(1+l_tax))";

my $tokens = [ grep length, (split /(\W)/, $line) ];

print scalar @$tokens, "\n";

print Dump $tokens;

Output:
15
---
- (
- l_extendedprice
- '*'
- (
- '1'
- '-'
- l_discount
- )
- '*'
- (
- '1'
- +
- l_tax
- )
- )
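
The empty fields described in this answer are easier to see on a shorter string. The following is a small sketch; the input "(a*(b" is a made-up example that has a leading separator followed later by two adjacent ones.

```perl
use strict;
use warnings;

# Hypothetical short input: starts with a separator and
# contains two separators in a row ("*" then "(").
my $s = "(a*(b";

my @fields = split /(\W)/, $s;
# Fields: '', '(', 'a', '*', '', '(', 'b'
print join(',', map { "'$_'" } @fields), "\n";

# Dropping the zero-length fields leaves only real tokens.
my @nonempty = grep { length } @fields;
print scalar(@nonempty), "\n";    # 5
```

The first empty field sits before the leading (, and the second sits between * and (. A separator with non-empty text on both sides, such as the * after a, produces no empty field.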
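Here is another approach:
Capturing in a split pattern rarely works exactly the way a given task needs. When it doesn't, you have to post-process the result, use a match instead of split, or come up with a non-capturing split pattern that does what you need. The other answers take one of the first two approaches. For the third, you want to split, without capturing, at every position that has a non-word character on either side, which is easy: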
split /(?<=\W)|(?=\W)/

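The answer gives only the pattern, so here is a sketch of it applied to the string from the question. The printing code is copied from the question, and @tokens is just an illustrative name.

```perl
use strict;
use warnings;

my $line = "(l_extendedprice*(1-l_discount)*(1+l_tax))";

# Split at every zero-width position that directly follows or
# directly precedes a non-word character. The separators are
# zero-length, so no characters are consumed and nothing needs
# to be captured.
my @tokens = split /(?<=\W)|(?=\W)/, $line;

print scalar(@tokens), "\n";    # 15
print "split:$_\n" for @tokens;
```

This relies on two documented behaviours of split: a zero-length match at the start of the string does not produce an empty leading field, and empty trailing fields are dropped when no LIMIT is given.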
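There is no null in Perl. The rough equivalent is undef, which is an undefined value. The empty lines in your output are simply empty strings. You could use the \b escape sequence to split on word boundaries, but that would produce 11 results, because runs of non-word characters such as )) and )*( are each treated as a single chunk. Those could then be split further.

It looks like someone just came by and downvoted both of our answers. I don't expect any explanation.

Why load YAML?

Because I like its dump better than the others, and I don't like the way the OP prints the results.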
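
The comment about undef versus empty strings can be checked directly. This is a minimal sketch using the split from the question; the counter names $undefined and $empty are only illustrative.

```perl
use strict;
use warnings;

my $line = "(l_extendedprice*(1-l_discount)*(1+l_tax))";
my @fields = split /(\W)/, $line;

# grep in scalar context returns the number of matching elements.
my $undefined = grep { !defined($_) } @fields;
my $empty     = grep { defined($_) && length($_) == 0 } @fields;

print "fields: ", scalar(@fields), "\n";    # 20
print "undefined: $undefined\n";            # 0
print "empty strings: $empty\n";            # 5
```

Every element is defined, so the blank lines in the question's output are zero-length strings rather than undef, and a plain length test is enough to filter them out.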