Regex Perl: creating a file name from column names


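I'm new to Perl, and I want to build the name of an output file from the column names in an input file. Say the header of my input file looks like this:

#identifier    (%)composition

I want my output file name to be identifier_composition. The identifier and the composition can be any sequence of alphanumeric characters, for example #E2FAR4 for the identifier or (%)MhDE4 for the composition. For that example, the output file name should be E2FAR4_MhDE4. So far I can get the identifier, but not the composition. Here's the code I tried: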

if ($line =~ /^#\s*(\S+)\t\(%)s*(\S+)/){
    my $ID = $1;
    my $comp = $2;
    my $out_file = "${ID}_${comp}"
}

But I also get the identifier as the second argument. Any help would be appreciated.
Use the following regex:
^#\s*(\S+)\t\(%\)(\S+)


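Sample code:
#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    # '#', optional whitespace, the identifier, a tab, a literal '(%)', then the composition
    if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/) {
        my $ID   = $1;
        my $comp = $2;
        my $out_file = "${ID}_${comp}";
        print "Filename: $out_file\n";
    }
}

# The two columns in the __DATA__ line must be separated by a real tab, because the regex matches \t
__DATA__
#identifier	(%)composition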
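The regex above is tied to the literal `(%)` prefix. Since the identifier and composition may be any alphanumeric sequence preceded by special characters, with the columns separated by a tab, a more general approach might help. Below is a minimal sketch under exactly those assumptions (the sub name `filename_from_header` is hypothetical): split the header on tabs, strip any leading non-word characters from each column, and join the remaining names with an underscore.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. Assumes the header is tab-separated and that each
# column name is a run of word characters (letters, digits, underscore),
# optionally preceded by punctuation such as '#' or '(%)'.
sub filename_from_header {
    my ($line) = @_;
    chomp $line;
    my @names;
    for my $col (split /\t/, $line) {
        # Skip leading non-word characters, then capture the first word run.
        my ($name) = $col =~ /^\W*(\w+)/;
        push @names, $name if defined $name;
    }
    # undef when no column name could be extracted
    return @names ? join('_', @names) : undef;
}

for my $header ("#identifier\t(%)composition", "#E2FAR4\t(%)MhDE4") {
    my $out_file = filename_from_header($header);
    print "Filename: $out_file\n" if defined $out_file;
}
```

Because each column is handled separately, this version does not care which special characters precede a name or how many columns the header has.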

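It looks like you're overthinking your regex. You're looking for two sequences of word characters separated by some non-word characters:
use feature 'say';    # 'say' requires Perl 5.10 or later

if ($line =~ /(\w+)\W+(\w+)/) {
  say "$1 / $2";
}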

A simpler approach is to match every sequence of word characters:
if (my @words = $line =~ /(\w+)/g) {
  say join ' / ', @words;
}
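To go from those matches to the file name the question asks for, the words can simply be joined with an underscore. This is a minimal sketch of that step (the sub name `words_to_filename` is hypothetical), assuming every run of word characters in the header line belongs in the name:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every run of word characters in the
# header line and join them with '_' to form the output file name.
sub words_to_filename {
    my ($line) = @_;
    my @words = $line =~ /(\w+)/g;
    # undef when the line contains no word characters at all
    return @words ? join('_', @words) : undef;
}

for my $header ("#identifier\t(%)composition", "#E2FAR4\t(%)MhDE4") {
    my $out_file = words_to_filename($header);
    print "Filename: $out_file\n" if defined $out_file;
}
```

One trade-off: `\w` excludes characters such as `-` and `.`, so a column name like `E2-FAR4` would be split into two words and yield `E2_FAR4_MhDE4`.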

Update: I ran your regex through a regex explainer. Here's the result:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  #                        '#'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \^                       '^'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    %                        '%'
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3

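I think your biggest problem is the literal ^ you're trying to match in the middle of your regex, but the unescaped parentheses around % are also a problem. And the s* (which matches zero or more literal s characters) is pointless and confusing :-)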
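Fixing those three points gives back essentially the pattern from the first answer. As an illustration only (the variable name `$header_re` is made up), here it is written with the `/x` flag so each piece carries a comment; under `/x` the `#` must be escaped, because it otherwise starts a comment.

```perl
use strict;
use warnings;

# Illustrative: the question's pattern with the stray '^' and 's*' removed
# and the parentheses around '%' escaped.
my $header_re = qr/
    ^ \#        # start of line, then a literal hash (escaped under the x flag)
    \s*         # optional whitespace
    (\S+)       # capture 1: the identifier
    \t          # a single tab between the columns
    \( % \)     # literal (%) with both parentheses escaped
    (\S+)       # capture 2: the composition
/x;

for my $line ("#identifier\t(%)composition", "#E2FAR4\t(%)MhDE4") {
    # In list context the match returns the two captures, or nothing on failure.
    if (my ($id, $comp) = $line =~ $header_re) {
        print "Filename: ${id}_${comp}\n";
    }
}
```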
Parentheses are special characters in regular expressions; escape them.
Hi, thanks for the detailed answer. Yes, the ^ in my question was actually a copy/paste error. It isn't part of my real code, but I left it there because your answer refers to it. Your regex does work for the example I gave, but the identifier and composition are not necessarily literally identifier and composition: each can be a sequence of alphanumeric characters preceded by some special characters, and the two are separated by a tab. Thanks again for your time.
@Marius: If you want a meaningful answer, your question should include sample data that demonstrates the various possibilities :-)
Yes, you're right. I've edited the question text accordingly. Thanks again.
Thanks, it works, and it can easily be adapted to whatever values identifier and composition may take, and to any special characters that may precede them.