ArnoldC lexer in Ruby


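I am trying to write a simple lexer for ArnoldC in Ruby.

I would like to be able to define a method like this:

LISTEN TO ME VERY CAREFULLY mymethod
TALK TO THE HAND "Hello, World!"
HASTA LA VISTA, BABY

I have the following code: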

class ArnoldLexer

  KEYWORDS = ["LISTEN TO ME VERY CAREFULLY", "TALK TO THE HAND", "HASTA LA VISTA, BABY"]

  def tokenize(code)
    # Clean up the code by removing the trailing line break.
    # A copy is used so the caller's string is not mutated.
    code = code.chomp

    # Current character position
    i = 0

    # Collection of all parsed tokens in the form [:TOKEN_TYPE, value]
    tokens = []

    # Implement a very simple scanner.
    # Look at the rest of the input and consume one token per iteration.
    # Every branch must advance i, otherwise the loop never terminates.
    while i < code.size
      chunk = code[i..-1]

      # Keywords are multi-word phrases tagged with their own name,
      # "TALK TO THE HAND" will result in a
      # [:"TALK TO THE HAND", "TALK TO THE HAND"] token.
      if keyword = KEYWORDS.find { |k| chunk.start_with?(k) }
        tokens << [keyword.to_sym, keyword]
        # Skip what was just parsed.
        i += keyword.size

      # Identifiers: + instead of * so that an empty match is never accepted.
      elsif identifier = chunk[/\A([a-z]+)/, 1]
        tokens << [:IDENTIFIER, identifier]
        i += identifier.size

      # Matching class names and constants starting with a capital letter.
      elsif constant = chunk[/\A([A-Z]\w*)/, 1]
        tokens << [:CONSTANT, constant]
        i += constant.size

      # The pattern has no capture group, so do not ask for group 1.
      elsif chunk.start_with?("\n")
        tokens << [:NEWLINE, "\n"]
        i += 1

      elsif number = chunk[/\A([0-9]+)/, 1]
        tokens << [:NUMBER, number.to_i]
        i += number.size

      elsif string = chunk[/\A"(.*?)"/, 1]
        tokens << [:STRING, string]
        # + 2 accounts for the two surrounding quotes.
        i += string.size + 2

      # Spaces and tabs separate tokens but produce none.
      elsif space = chunk[/\A[ \t]+/]
        i += space.size

      else
        raise "Unexpected character #{chunk[0].inspect} at position #{i}"
      end
    end
    tokens
  end
end
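
The scanner loop above tracks the position `i` by hand and slices the remaining input on every pass. As a point of comparison, here is a minimal sketch of the same idea built on `StringScanner` from Ruby's standard library, which keeps its own position and only advances when a pattern matches. The class name `ArnoldScannerLexer`, the `KEYWORD_PATTERN` constant, and the choice to skip spaces silently are assumptions made for illustration; they are not part of the original code.

```ruby
require "strscan"

# Hypothetical alternative lexer; the names and token shapes are assumptions.
class ArnoldScannerLexer
  KEYWORDS = ["LISTEN TO ME VERY CAREFULLY", "TALK TO THE HAND", "HASTA LA VISTA, BABY"].freeze

  # Regexp.union escapes each phrase. Longest phrases go first so that a
  # longer keyword is never shadowed by a shorter one sharing its prefix.
  KEYWORD_PATTERN = Regexp.union(KEYWORDS.sort_by { |k| -k.size })

  def tokenize(code)
    scanner = StringScanner.new(code.chomp)
    tokens = []

    until scanner.eos?
      if scanner.skip(/[ \t]+/)
        next # spaces and tabs separate tokens but produce none
      elsif (keyword = scanner.scan(KEYWORD_PATTERN))
        tokens << [keyword.to_sym, keyword]
      elsif (identifier = scanner.scan(/[a-z]+/))
        tokens << [:IDENTIFIER, identifier]
      elsif (constant = scanner.scan(/[A-Z]\w*/))
        tokens << [:CONSTANT, constant]
      elsif (number = scanner.scan(/[0-9]+/))
        tokens << [:NUMBER, number.to_i]
      elsif scanner.scan(/"(.*?)"/)
        tokens << [:STRING, scanner[1]] # capture group 1 is the text between the quotes
      elsif scanner.scan(/\n/)
        tokens << [:NEWLINE, "\n"]
      else
        # Fail loudly instead of looping forever on input no rule accepts.
        raise ArgumentError, "Unexpected input at byte #{scanner.pos}: #{scanner.peek(10).inspect}"
      end
    end

    tokens
  end
end

source = <<~ARNOLD
  LISTEN TO ME VERY CAREFULLY mymethod
  TALK TO THE HAND "Hello, World!"
  HASTA LA VISTA, BABY
ARNOLD

tokens = ArnoldScannerLexer.new.tokenize(source)
tokens.each { |token| p token }
```

Because `scan` and `skip` only move the position on a successful match, every pass through the loop either consumes input or raises, so the loop cannot stall on an empty match.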
