Eclipse: How to use ANTLR to display all the pronouns in a sentence and their person

eclipse parsing antlr

Edited based on WayneH's grammar.

This is what I have in my grammar file:

grammar pfinder;

options {
  language = Java;
}
sentence
    : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
    ;

words 
    :   WORDS {System.out.println($text);};

pronoun returns [String value] 
    : sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
    | ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
    | sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
    | pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
    | psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
    | pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};

sfirst returns [String value] :  ('i'   | 'me'  | 'my'   | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] :  ('he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] :  ('we'  | 'us'  | 'our'  | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] :  ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};

WORDS : LETTER*;// {$channel=HIDDEN;}; 
SPACE : (' ')?;
fragment LETTER :  ('a'..'z' | 'A'..'Z');

And this is my Java test class:

import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.List;

public class test2 {
    public static void main(String[] args) throws RecognitionException {
        String s;
        Scanner input = new Scanner(System.in);
        System.out.println("Enter a sentence: ");
        s=input.nextLine().toLowerCase();
        ANTLRStringStream in = new ANTLRStringStream(s);
        pfinderLexer lexer = new pfinderLexer(in);
        TokenStream tokenStream = new CommonTokenStream(lexer);
        pfinderParser parser = new pfinderParser(tokenStream); 
        parser.pronoun(); 
    }
}

What do I need to put in the test file so that it displays all the pronouns in a sentence together with their respective values (s1, s2, ...)?

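Fragments do not create tokens, and placing them in parser rules will not give the desired result.

On my test box, this produced (I think!) the desired result: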

program :
        PRONOUN+
    ;

PRONOUN :
        'i'   | 'me'  | 'my'   | 'mine'
    |   'you' | 'your'| 'yours'| 'yourself'
    |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
    |   'we'  | 'us'  | 'our'  | 'ours'
    |   'yourselves'
    |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
    ;

WS  :   ' ' { $channel = HIDDEN; };

WORD    :   ('A'..'Z'|'a'..'z')+ { $channel = HIDDEN; };

In ANTLRWorks, the sample input "i kicked you" returned the tree:

program -> [i, you]

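I feel I should point out that ANTLR is overkill for pulling pronouns out of a sentence. Consider using a regular expression instead. This grammar is case-sensitive: the pronoun literals are lowercase only, so the input has to be lowercased first. Expanding WORD to consume everything that is not in the pronoun dictionary (punctuation and so on) may get a bit tedious, so the input will need to be sanitized.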
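As a rough illustration of the regular-expression alternative mentioned above, here is a minimal Java sketch. The class name `PronounRegex` and the method `findPronouns` are made up for this example; the pronoun list is copied from the PRONOUN rule. It matches case-insensitively and relies on word boundaries, so it has not been checked against contractions or other edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class PronounRegex {
    // Same pronoun dictionary as the PRONOUN lexer rule.
    // \b on both sides keeps "us" from matching inside "use",
    // and lets the engine fall through from "you" to "yourself".
    private static final Pattern PRONOUN = Pattern.compile(
        "\\b(i|me|my|mine|you|your|yours|yourself"
        + "|he|she|it|his|hers|its|him|her|himself|herself"
        + "|we|us|our|ours|yourselves"
        + "|they|them|their|theirs|themselves)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns every pronoun in the sentence, lowercased, in order of appearance.
    static List<String> findPronouns(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = PRONOUN.matcher(sentence);
        while (m.find()) {
            found.add(m.group(1).toLowerCase());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPronouns("I kicked you and them."));
    }
}
```

Because the pattern is case-insensitive and skips anything that is not a pronoun, no separate lowercasing or punctuation handling is needed for simple input.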

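--- Edit: in response to the OP's second question:

I modified the original grammar to make the result easier to analyze. The new grammar is: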

grammar pfinder;

options {
    backtrack=true;
    output = AST;
}

tokens {
    PROGRAM;
}

program :
        (WORD* p+=PRONOUN+ WORD*)*
        -> ^(PROGRAM $p*)
    ;


PRONOUN :
        'i'   | 'me'  | 'my'   | 'mine'
    |   'you' | 'your'| 'yours'| 'yourself'
    |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
    |   'we'  | 'us'  | 'our'  | 'ours' | 'yourselves'
    |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
;

WS  :   ' ' { $channel = HIDDEN; };

WORD    :   ('A'..'Z'|'a'..'z')+;

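I'll explain the changes:

Backtracking is now required to resolve the parser rule program. There may be a better way to write it that does not need backtracking, but this was the first thing that came to mind.
An imaginary token PROGRAM has been defined to group the pronouns.
Each matched pronoun is added to the ANTLR variable $p and rewritten in the AST under the imaginary token.
The interpreter code can now use CommonTree to collect the matched pronouns.

The following is written in C# (I don't know Java), but I wrote it so that you should be able to read and understand it: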

static object[] ReadTokens( string text )
{
    ArrayList results = new ArrayList();
    pfinderLexer Lexer = new pfinderLexer(new Antlr.Runtime.ANTLRStringStream(text));
    pfinderParser Parser = new pfinderParser(new CommonTokenStream(Lexer));
    // syntaxTree is imaginary token {PROGRAM},
    // its children are the pronouns collected by $p in grammar.
    CommonTree syntaxTree = Parser.program().Tree as CommonTree;
    if ( syntaxTree == null ) return null;
    foreach ( object pronoun in syntaxTree.Children )
    {
        results.Add(pronoun.ToString());
    }
    return results.ToArray();
}

Calling ReadTokens("i kicked you and them") returns the array ["i", "you", "them"].
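The question also asks for each pronoun's value (s1, s2, ...). One possible way to get there from a list of pronoun strings like the one above is a plain lookup table. The Java sketch below assumes the person codes from the question's grammar; `PronounPerson`, `register` and `label` are hypothetical names introduced only for illustration, and "?" is an arbitrary placeholder for words that are not in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class PronounPerson {
    // pronoun -> person code, taken from the sfirst..pthird rules in the question
    private static final Map<String, String> PERSON = new HashMap<>();

    private static void register(String code, String... pronouns) {
        for (String p : pronouns) {
            PERSON.put(p, code);
        }
    }

    static {
        register("s1", "i", "me", "my", "mine");
        register("s2", "you", "your", "yours", "yourself");
        register("s3", "he", "she", "it", "his", "hers", "its",
                 "him", "her", "himself", "herself");
        register("p1", "we", "us", "our", "ours");
        register("p2", "yourselves");
        register("p3", "they", "them", "their", "theirs", "themselves");
    }

    // Formats each pronoun as text(code), the same shape the question's
    // grammar actions print; unknown words get "?" as the code.
    static List<String> label(List<String> pronouns) {
        List<String> out = new ArrayList<>();
        for (String p : pronouns) {
            String code = PERSON.get(p.toLowerCase());
            out.add(p + "(" + (code == null ? "?" : code) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(label(Arrays.asList("i", "you", "them")));
    }
}
```

Keeping the classification outside the grammar means the single PRONOUN token rule can stay as it is, at the cost of maintaining the word list in two places.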
