Regex: create matching parentheses - awk: sed


I have a dataset with three patterns.

First (input lines, followed by the expected output):

abrasion abrade:stem<>ion:suffix
abstainer abstain:stem<>er:suffix
abstention abstain:stem<>ion:suffix
abrasion ((abrade:stem) ion:suffix)
abstainer ((abstain:stem)er:suffix)
abstention ((abstain:stem)ion:suffix)
Second:

inaccurate (in:prefix(accurate:stem))
inactive (in:prefix(active:stem))
Third:

incommunicable (in:prefix ((communicate:stem)able:suffix))
incompatibility (in:prefix ((compatible:stem)ity:suffix))
The code I am using is awk:

{
    n = gsub(/<>/,")",$2)
    s = sprintf("%*s",n,"")
    gsub(/ /,"(",s)
    print "(" $1, s "((" $2 "))"
}

It does not produce the expected output shown in the examples.

The expected output for pattern 1 looks wrong: the parentheses are not paired. I assume that is a typo and it should be:

abrasion ((abrade:stem)ion:suffix)
abstainer ((abstain:stem)er:suffix)
abstention ((abstain:stem)ion:suffix)
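A quick way to confirm whether lines like these are balanced is to track the nesting depth character by character. The following is only a minimal sketch; the function name `check_parens` is a hypothetical helper, not something from the question or the answers.

```shell
# Hypothetical helper: print "line-number: line" for every input line
# whose parentheses do not balance. Balanced lines produce no output.
check_parens() {
  awk '
  {
      depth = 0; bad = 0
      for (i = 1; i <= length($0); i++) {
          c = substr($0, i, 1)
          if (c == "(") depth++
          else if (c == ")") {
              depth--
              if (depth < 0) bad = 1      # a ")" with no matching "("
          }
      }
      if (bad || depth != 0) print NR ": " $0
  }'
}
```

Piping the expected output, or the output of any of the scripts in this thread, through `check_parens` should print nothing when every line is balanced.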
I came up with this awk script:

awk -v d="<>" '{$2="("$2")"}
$1~/^ab/{sub(d,")",$2);$2="(" $2}
$1~/^ina/{sub(d,"(",$2);$2=$2")"}
$1~/^inc/{sub(d,"((",$2);sub(d,")",$2);$2=$2")"}7' file

This should be general enough, since it matches on the :stem, :prefix and :suffix markers:

awk 'BEGIN{FS=OFS="\n"}{
  a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
  b=gensub(/(\([a-zA-Z]*:stem\))<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
  c=gensub(/([a-zA-Z]*:prefix)<>(.*)/,"(\\1\\2)", "g", b);
  print c;}' testfile
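Note that `gensub()` is a GNU awk extension. For the first step alone (wrapping the stem), a POSIX-only sketch is possible with `gsub()` and `&`, which stands for the matched text in the replacement. The function name `wrap_stem` is hypothetical, and the pattern uses `+` rather than `*` on the assumption that a stem always has at least one letter.

```shell
# Hypothetical helper (not from the answer): wrap every "<letters>:stem"
# token in parentheses using only POSIX awk features.
wrap_stem() {
  awk '{ gsub(/[a-zA-Z]+:stem/, "(&)"); print }'
}
```

For example, `echo 'abrasion abrade:stem<>ion:suffix' | wrap_stem` prints `abrasion (abrade:stem)<>ion:suffix`. The later steps rely on backreferences, which plain `sub()`/`gsub()` do not offer.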

Edit

This should handle multiple suffixes and prefixes:

awk 'BEGIN{FS=OFS="\n"}{
   a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
   while ( a ~ /stem\)<>.*:suffix/) {
     a=gensub(/(\([a-zA-Z]*:stem\).*?)<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
   }
   while ( a ~ /<>/) {
     a=gensub(/([a-zA-Z]*?:prefix)<>(.*)/,"(\\1\\2)", "g", a);
   }
   print a;}' test
(Sorry if antinationalism is not a word, but for the sake of testing...)

awk to the rescue!

$ awk 'function wrap(v) {return "("v")"; }
      {n=split($2,a,"<>"); 
       if(n==3) w=wrap(a[1] wrap(wrap(a[2]) a[3])); 
       else if(a[1]~/:prefix/) w=wrap(a[1] wrap(a[2])); 
       else w=wrap(wrap(a[1]) a[2]);
       print $1, w}' stems

abrasion ((abrade:stem)ion:suffix)
abstainer ((abstain:stem)er:suffix)
abstention ((abstain:stem)ion:suffix)
inaccurate (in:prefix(accurate:stem))
inactive (in:prefix(active:stem))
incommunicable (in:prefix((communicate:stem)able:suffix))
incompatibility (in:prefix((compatible:stem)ity:suffix))
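The same wrapping idea can be extended to any number of prefixes and suffixes. The sketch below is an illustration under stated assumptions, not one of the posted answers: it assumes the second field is a `<>`-separated list with exactly one `:stem` part, that suffixes attach to the stem first (left to right), and that prefixes then wrap the result (right to left). The function name `bracket` is hypothetical.

```shell
# Hypothetical helper: bracket "word part<>part<>..." lines read from stdin.
# Assumed nesting order: stem, then suffixes left to right, then prefixes right to left.
bracket() {
  awk '
  {
      n = split($2, part, "<>")
      stem = 0
      for (i = 1; i <= n; i++)
          if (part[i] ~ /:stem$/) stem = i
      if (stem == 0) { print; next }       # no stem found: pass the line through unchanged
      out = "(" part[stem] ")"
      for (i = stem + 1; i <= n; i++)      # each suffix wraps what has been built so far
          out = "(" out part[i] ")"
      for (i = stem - 1; i >= 1; i--)      # each prefix wraps the result from the outside
          out = "(" part[i] out ")"
      print $1, out
  }'
}
```

With the sample data, `bracket < stems` should give the same seven lines as above. Whether words with several suffixes should nest in this order depends on the intended morphological analysis, so treat the ordering as a guess.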
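Comments:

[...] unhappiness (un:prefix<>hap:stem<>y:suffix<>ness:suffix). This code does not produce the desired output. It only checks words that start with ^ab, inc and so on. I need a general one.

@Karun It is not a "general" solution; it only works for the examples in your question. I have not read the PDF file and I am not going to. You should explain the requirements in the question itself rather than just attaching or linking a 318-page document.

@Kent I am sorry. My intention was to give an overview of the topic.

Yes. This should be general.

@Guido What if a word has 3 prefixes and 3 suffixes?

@Karun That is not part of the question. Anyway, can a word have several prefixes or suffixes at the same time? (Sorry if it can; English is not my native language.) Wouldn't such a word appear on different lines of your input, since the same stem takes different suffixes?

@ᴳᵁᴵᴰᴼ Yes. I can give some examples.

Please check the newly added code.

This is good. Antinationalism is a word.

Can you add an explanation or justification?

@C_B Well, the OP said the accepted answer is slow on large datasets. This version does not have that problem.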
awk -F'<>| ' -v OFS= '{ 
    $1 = $1 " " 
    for (i=2; i<=NF; i++) { 
        if ($i ~ /prefix$/)    { $i = "(" $i; $NF = $NF ")" } 
        if ($i ~ /stem\)?$/)   { stem = i; $i = "(" $i ")" } 
        if ($i ~ /suffix\)?$/) { $i = $i ")"; $stem = "(" $stem } } 
    } { print }'