How do newline characters affect System.in.read() in Java?
I am trying to write a lexer class that tokenizes the characters of an input stream, and I use System.in.read() to read the characters. The documentation says it returns -1 when the end of the stream is reached, but I cannot understand why this behavior seems to differ for different inputs. For example, delete.txt contains the following input:
1. I have
2. bulldoz//er
The Lexer then tokenizes the input correctly, as shown below:
[I=257, have=257, false=259, er=257, bulldoz=257, true=258]
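But now, if I press Enter to add some blank lines, the code goes into an infinite loop. The code does check for newlines and spaces in the input, so how is that check being bypassed?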
1. I have
2. bulldoz//er
3.
The complete code is:
package lexer;

import java.io.*;
import java.util.*;
import lexer.Token;
import lexer.Num;
import lexer.Tag;
import lexer.Word;

class Lexer{
    public int line = 1;
    private char null_init = ' ';
    private char tab = '\t';
    private char newline = '\n';
    private char peek = null_init;
    private char comment1 = '/';
    private char comment2 = '*';
    private Hashtable<String, Word> words = new Hashtable<>();

    //no-args constructor
    public Lexer(){
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    void reserve(Word word_obj){
        words.put(word_obj.lexeme, word_obj);
    }

    char read_buf_char() throws IOException {
        char x = (char)System.in.read();
        return x;
    }

    /*tokenization done here*/
    public Token scan()throws IOException{
        for(; ; ){
            // while exiting the loop, sometime the comment
            // characters are read e.g. in bulldoz//er,
            // which is lost if the buffer is read;
            // so read the buffer i
            peek = read_buf_char();
            if(peek == null_init||peek == tab){
                peek = read_buf_char();
                System.out.println("space is read");
            }else if(peek==newline){
                peek = read_buf_char();
                line +=1;
            }
            else{
                break;
            }
        }
        if(Character.isDigit(peek)){
            int v = 0;
            do{
                v = 10*v+Character.digit(peek, 10);
                peek = read_buf_char();
            }while(Character.isDigit(peek));
            return new Num(v);
        }
        if(Character.isLetter(peek)){
            StringBuffer b = new StringBuffer(32);
            do{
                b.append(peek);
                peek = read_buf_char();
            }while(Character.isLetterOrDigit(peek));
            String buffer_string = b.toString();
            Word reserved_word = (Word)words.get(buffer_string);//returns null if not found
            if(reserved_word != null){
                return reserved_word;
            }
            reserved_word = new Word(Tag.ID, buffer_string);
            // put key value pair in words hashtable
            words.put(buffer_string, reserved_word);
            return reserved_word;
        }
        // if character read is not a digit or a letter,
        // then the character read is a new token
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    private char get_peek(){
        return (char)this.peek;
    }

    private boolean reached_buf_end(){
        // reached end of buffer
        if(this.get_peek() == (char)-1){
            return true;
        }
        return false;
    }

    public void run_test()throws IOException{
        //loop checking variable
        //a token object is initialized with dummy value
        Token new_token = null;
        // while end of stream has not been reached
        while(this.get_peek() != (char)-1){
            new_token = this.scan();
        }
        System.out.println(words.entrySet());
    }

    public static void main(String[] args)throws IOException{
        Lexer tokenize = new Lexer();
        tokenize.run_test();
    }
}
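The get_peek function returns the value of peek, which holds the current character from the input buffer.
The run_test function checks whether the end of the buffer has been reached.
The main processing is done in the scan function.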
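I use the following command to feed the file to the compiled Java class: cat delete.txt | java lexer/Lexer. Can you tell me why the code goes into an infinite loop for the input file with the added newlines?

I'm not sure how you are checking for the end of the stream (-1). At the end of scan you assign a space to peek, and I think that is where things go wrong: when the input ends with a blank line, the -1 is overwritten before run_test can see it, so you never catch it.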
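One way to make the end-of-stream check robust, sketched under a few assumptions: keep the value returned by read() as an int and compare it with -1 before casting to char, so the marker cannot be overwritten or confused with a real character. The class name EofSketch and the method words are hypothetical, and the InputStream parameter stands in for System.in so that the example can run on fixed input. The sketch only splits the input into runs of letters and digits; it is not a replacement for the Lexer in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the question's lexer package.
public class EofSketch {
    static final int EOF = -1;

    // Collects runs of letters/digits; any other character (spaces,
    // newlines, '/') only acts as a separator.
    static List<String> words(InputStream in) throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c; // stays an int so -1 is never mistaken for a character
        while ((c = in.read()) != EOF) {
            if (Character.isLetterOrDigit((char) c)) {
                current.append((char) c);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush the last word if the input did not end with a separator.
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Same text as delete.txt, followed by extra blank lines.
        byte[] input = "I have\nbulldoz//er\n\n\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(words(new ByteArrayInputStream(input)));
        // prints [I, have, bulldoz, er]
    }
}
```

Because the loop condition is the only place where the stream is read, trailing blank lines cannot hide the -1: the loop ends on the first read() that returns it, regardless of what came before.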