Writing an interpreter in Ruby, unable to consume whitespace correctly

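I'm following the "Writing an Interpreter" book and implementing it in Ruby instead of Go. I can scan tokens such as ;, =, + and so on, but the lexer seems to behave differently when my input string contains identifiers or numbers such as let or 10. I've spent a whole week trying to find this bug, in vain, so I thought a fresh pair of eyes might catch it.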

Here is an overview.

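The codebase is small, and most of the logic lives in lib/lexer/lexer.rb
The Lexer class keeps the following state: a cursor to the current character in the input string, a cursor to the next character in the input string, and the current character itself
Lexer has the following methods
- read_char sets the data members to the appropriate values
- read_identifier extracts all the characters of a string that is an identifier rather than a reserved keyword, and calls read_char before returning
- read_number is the same as read_identifier, but for numbers
- consume_whitespace skips spaces, line feeds and so on
- next_token matches the current character against the appropriate case and returns a Token object for it (Token is defined in lib/token/token.rb), calling read_char to advance the cursors before returning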

require_relative '../token/token'

def is_letter(ch) #basically decides syntax acceptable for variable names
  #puts ch.class
  'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
end

def is_digit(ch) #checks if digit
  '0' <= ch && ch <= '9'
end

class Lexer

  def initialize(input)
    @input = input
    @position = 0
    @readPosition = 0
    @ch =''
    read_char
  end

  def read_char
    #puts caller[0]
    @ch = @readPosition >= @input.length ? '' : @input[@readPosition]
    @position = @readPosition
    @readPosition += 1
    #puts "INSIDE READ_CHAR  #{@position} #{@readPosition} #{@ch}"
  end



  # SUPPOSED TO BE A LOOP WAS JUST A CONDITION. NOW FIXED.
  def consume_whitespace
    while @ch == ' ' || @ch =='\t' || @ch == '\n' || @ch == '\r' do
      read_char
    end
  end

  def read_identifier
    pos = @position
    #puts "RI: char #{@ch}  pos #{pos} position #{@position}"
    while is_letter(@ch) do
      #puts @ch
      read_char
    end
    puts "METHOD read_identifier: char #{@ch}  pos #{pos} position #{@position}\n"
    @input[pos..@position-1]

  end

  def read_number
    pos = @position
    #puts "RN: char #{@ch}  pos #{pos} position #{@position}"
    while is_digit(@ch) do
      read_char
    end
    puts "METHOD read_number: char #{@ch}  pos #{pos} position #{@position}\n"
    @input[pos..@position-1]

  end


  def next_token
    #puts @ch,@ch.class
    #puts "\nX=X=X=X=X=X=X=X=X=:  #{@ch},#{@ch.ord},X=X=X=X=X=X=X=X=X=\n"
    tok = nil
    consume_whitespace

    tok =
      case @ch
      when '=' then Token.new(ASSIGN,@ch)
      when '+' then Token.new(PLUS,@ch)
      when '-' then Token.new(MINUS,@ch)
      when '/' then Token.new(DIVIDE,@ch)
      when '*' then Token.new(MULTIPLY,@ch)
      when '%' then Token.new(MODULO,@ch)
      #when '==' then Token.new(EQUAL_TO,@ch)
      when '>' then Token.new(GREATER_THAN,@ch)
      when '<' then Token.new(LESS_THAN,@ch)
      #when '!=' then Token.new(UNEQUAL_TO,@ch)
      #when '&&' then Token.new(AND,@ch)
      #when '||' then Token.new(OR,@ch)
      when '!' then Token.new(NOT,@ch)
      when ',' then Token.new(COMMA,@ch)
      when ';' then Token.new(SEMICOLON,@ch)
      when '?' then Token.new(QUESTION,@ch)
      when '(' then Token.new(LPAREN,@ch)
      when ')' then Token.new(RPAREN,@ch)
      when '[' then Token.new(LSQUARE,@ch)
      when ']' then Token.new(RSQUARE,@ch)
      when '{' then Token.new(LCURLY,@ch)
      when '}' then Token.new(RCURLY,@ch)
      else
        #puts 'hello from next_token',@ch.ord
        # STATE WAS BEING MUTATED NOW FIXED
        puts "letter #{@ch}"
        puts "letter ascii   #{@ch.ord}"
        #puts "isletter  "
        if is_letter(@ch)
          literal = read_identifier
          Token.new(look_up_ident(literal),literal)
        elsif is_digit(@ch)
          Token.new(INT,read_number)
        else
          Token.new(ILLEGAL,"ILLEGAL")
        end
      end
    read_char
    return tok
  end

end

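The failing rake tests were no help for debugging, so I decided to write a main.rb script that imports and runs my lexer, with lots of puts calls sprinkled throughout the codebase.

Here is my main.rb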


    require_relative 'lib/lexer/lexer'

    lex = Lexer.new('five = 5;
                               ten = 10;')

    i = 1
    while i <= 8
        tok = lex.next_token
        puts "\nIN_MAIN: #{tok.type}  ==> #{tok.literal}\n\n"
        i=i+1
    end

Here is the output of ruby main.rb

     letter f
     letter ascii   102
    METHOD read_identifier: char    pos 0 position 4

    IN_MAIN: IDENTIFIER  ==> five


    IN_MAIN: =  ==> =

    letter 5
    letter ascii   53
    METHOD read_number: char ;  pos 7 position 8

    IN_MAIN: INT  ==> 5

    letter 
    letter ascii   10

    IN_MAIN: ILLEGAL  ==> ILLEGAL

    letter t
    letter ascii   116
    METHOD read_identifier: char    pos 27 position 30

    IN_MAIN: IDENTIFIER  ==> ten


    IN_MAIN: =  ==> =

    letter 1
    letter ascii   49
    METHOD read_number: char ;  pos 33 position 35

    IN_MAIN: INT  ==> 10

       letter 
        Traceback (most recent call last):
        2: from main.rb:8:in `<main>'
        1: from /home/palash25/gundoochy/lib/lexer/lexer.rb:89:in `next_token'
            /home/palash25/gundoochy/lib/lexer/lexer.rb:89:in `ord': empty string  (ArgumentError)

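We can ignore the error at the end, since I haven't yet worked out how to return an object for EOF, but here is the gist of what happens before that.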
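
For reference, the traceback matches what Ruby does when ord is called on an empty string, and read_char sets @ch to '' once the input is exhausted. One way to handle this, as a minimal sketch only: check for the empty string before any call to ord and return an end-of-input token. DemoToken, EOF_TYPE and eof_token_for are hypothetical names used for illustration; the real Token class and its constants live in lib/token/token.rb and are not shown here.

```ruby
# Stand-in for the Token class (assumption: it takes a type and a literal).
DemoToken = Struct.new(:type, :literal)
# Hypothetical token type for end of input.
EOF_TYPE = 'EOF'

# Calling ord on an empty string raises ArgumentError.
begin
  ''.ord
rescue ArgumentError => e
  puts "ord failed: #{e.message}"
end

# Hypothetical helper: returns an EOF token when the current character is
# the empty string that read_char produces at end of input, otherwise nil.
def eof_token_for(ch)
  ch.empty? ? DemoToken.new(EOF_TYPE, '') : nil
end

puts eof_token_for('').inspect
puts eof_token_for('a').inspect
```

In next_token, a guard like this would have to run before the debug puts that calls @ch.ord.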

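The lexer scans tokens correctly up to five = 5. After that it skips the very next character, ;, and instead of returning a token object for it, it returns an ILLEGAL type for the \n that follows the ; (I even printed the ASCII value of the character to make sure it really is the line feed that is being returned as illegal).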
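
A possible explanation for the missing ;, offered as a sketch rather than a confirmed diagnosis: read_number and read_identifier already leave @ch on the first character after the literal, so the unconditional read_char at the end of next_token steps over that character. The reduced lexer below returns early from the number branch instead. SketchToken and SketchLexer are hypothetical names, and the token types are plain strings standing in for the real constants.

```ruby
# Stand-in for the Token class (assumption: it takes a type and a literal).
SketchToken = Struct.new(:type, :literal)

class SketchLexer
  def initialize(input)
    @input = input
    @position = 0
    @read_position = 0
    @ch = ''
    read_char
  end

  # Load the character under the read cursor and advance both cursors.
  def read_char
    @ch = @read_position >= @input.length ? '' : @input[@read_position]
    @position = @read_position
    @read_position += 1
  end

  def digit?(ch)
    '0' <= ch && ch <= '9'
  end

  # Leaves @ch on the first character AFTER the number.
  def read_number
    start = @position
    read_char while digit?(@ch)
    @input[start...@position]
  end

  def next_token
    # Early return: the cursor is already past the literal, so the
    # trailing read_char below must not run for this branch.
    return SketchToken.new('INT', read_number) if digit?(@ch)

    tok = @ch == ';' ? SketchToken.new('SEMICOLON', @ch) : SketchToken.new('ILLEGAL', @ch)
    read_char
    tok
  end
end

lexer = SketchLexer.new('5;')
2.times do
  tok = lexer.next_token
  puts "#{tok.type} #{tok.literal}"
end
# prints "INT 5" then "SEMICOLON ;"
```

The same early return would apply to the identifier branch.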

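This shouldn't happen, because consume_whitespace is supposed to skip every kind of whitespace. Yet it still doesn't work for line feeds, although it does work for spaces, since the next line (ten = 10) is scanned. The last semicolon is nowhere to be seen in the output either, just like the first one.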
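
One thing worth checking here: in Ruby, single-quoted strings do not process escapes such as \t, \n or \r, so '\n' is a two-character string (a backslash and the letter n) and never equals a one-character @ch. The sketch below shows the difference and a double-quoted version of the comparison; whitespace? is a hypothetical helper name.

```ruby
puts '\n'.length       # 2: a backslash followed by the letter n
puts "\n".length       # 1: a single line feed
puts "\n".ord          # 10, the value printed before the ILLEGAL token
puts '\n' == "\n"      # false

# Hypothetical helper using double-quoted literals so the escapes are interpreted.
def whitespace?(ch)
  ch == ' ' || ch == "\t" || ch == "\n" || ch == "\r"
end

puts whitespace?("\n") # true
puts whitespace?('a')  # false
```

Using the same double-quoted literals in the while condition of consume_whitespace would let it skip line feeds as well as spaces.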

If I use an input string without any identifiers or numbers, it works fine.

Here is a link to the complete codebase: https://gitlab.com/palash25/gundoochy

