Java: how to parse an input file


I need to parse this input file, but I can't figure out how to do it. I tried using Scanner.useDelimiter(), but I still have problems. Does anyone know how to do this?

Here is one line from the input file:

200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1" 200 52464 "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album" "Opera/6.01 (Windows 98; U) [en]"

It is supposed to be split into the following parts:

  • address=200.88.223.98

  • date=01/Feb/2007:04:02:22 -0500

  • request=GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1

  • status=200

  • bytes=52464

  • referer=http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album

  • agent=Opera/6.01 (Windows 98; U) [en]

Here is the part of my code that tries to parse it:

    Scanner scan = new Scanner(input);
    scan.useDelimiter("[-']+");
    while (scan.hasNextLine()) 
    {
        String address = scan.next();
        String date = scan.next();
        String request = scan.next();
        int status = scan.nextInt();
        int bytes = scan.nextInt();
        String refer = scan.next();
        String agent = scan.next(); 
    }
    
It produces the following error:

    Exception in thread "main" java.util.InputMismatchException
        at java.util.Scanner.throwFor(Scanner.java:840)
        at java.util.Scanner.next(Scanner.java:1461)
        at java.util.Scanner.nextInt(Scanner.java:2091)
        at java.util.Scanner.nextInt(Scanner.java:2050)
        at Analyzer.startUp(Unknown Source)
        at Driver.main(Unknown Source)
    Java Result: 1
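
One way to see where the exception comes from is to print the tokens that the delimiter `[-']+` actually produces. The sketch below is only an illustration: the class name `DelimiterDemo`, the helper `tokens`, and the shortened sample line with its `/index.html` path are assumptions, not part of the original program. The line keeps the same field layout as the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token Scanner returns for one line when the
    // delimiter from the question is used (runs of '-' or single quotes).
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        try (Scanner scan = new Scanner(line)) {
            scan.useDelimiter("[-']+");
            while (scan.hasNext()) {
                out.add(scan.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical shortened line; same layout as the log line in the question.
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /index.html HTTP/1.1\" 200 52464";
        List<String> t = tokens(line);
        for (int i = 0; i < t.size(); i++) {
            // Brackets make leading and trailing spaces visible.
            System.out.println(i + " : [" + t.get(i) + "]");
        }
    }
}
```

With this delimiter the line is cut only at the hyphens, so the fourth token starts with `0500]` and runs to the end of the line. The fourth call in the question's loop is `nextInt()`, and that token is not an integer, which is consistent with the `InputMismatchException` in the trace.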
Consider this approach: split the line on spaces and extract the data from the resulting array.

    String s = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";
    
    // Splitting on single spaces also breaks apart fields that contain spaces.
    String[] arr = s.split(" ");

    for (int i = 0; i < arr.length; i++) {
        System.out.println(i + " : " + arr[i]);
    }
    

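So element 0 is the IP address, elements 3 and 4 together form the date, elements 5 through 7 form the request, and so on. You can extract the data from those positions.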
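
The index-based extraction described above could be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the class `SplitFields`, the method `parse`, and the map keys are hypothetical names, and the code assumes the request always has exactly three space-separated parts and the referer contains no spaces. The length check is there because indexing past the end of a short line is a typical cause of the `ArrayIndexOutOfBoundsException` mentioned in the comments.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class SplitFields {
    // Builds named fields from the space-split tokens of one log line.
    // Returns null when the line has fewer tokens than the full layout needs.
    static Map<String, String> parse(String line) {
        String[] arr = line.split(" ");
        if (arr.length < 12) {
            return null; // short or malformed line: arr[11] would not exist
        }
        Map<String, String> f = new LinkedHashMap<>();
        f.put("address", arr[0]);
        // Tokens 3 and 4 are "[date" and "zone]"; drop the brackets.
        f.put("date", (arr[3] + " " + arr[4]).replace("[", "").replace("]", ""));
        // Tokens 5 to 7 are method, path and protocol; drop the quotes.
        f.put("request", (arr[5] + " " + arr[6] + " " + arr[7]).replace("\"", ""));
        f.put("status", arr[8]);
        f.put("bytes", arr[9]);
        f.put("referer", arr[10].replace("\"", ""));
        // Everything from token 11 onward belongs to the user agent.
        String agent = String.join(" ", Arrays.copyOfRange(arr, 11, arr.length));
        f.put("agent", agent.replace("\"", ""));
        return f;
    }

    public static void main(String[] args) {
        // Hypothetical shortened line with the same layout as the question's sample.
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /index.html HTTP/1.1\" 200 52464 "
                + "\"http://cs.tcnj.edu/gallery/main.php\" "
                + "\"Opera/6.01 (Windows 98; U) [en]\"";
        Map<String, String> fields = parse(line);
        if (fields != null) {
            fields.forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }
}
```

Returning `null` for short lines is just one choice; skipping the line or logging it would work equally well. Lines whose fields do not sit at these fixed positions need a different approach.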

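What is the actual problem you are running into? Does the data always follow the same pattern? If so, a regular expression might be the answer.

Each line of data may be different. Some lines may be missing one or more of these fields, lines may have different lengths, and so on. Can this still be handled with a regular expression somehow?

When I try to extract the data I get an ArrayIndexOutOfBoundsException, and I don't know why.
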
Splitting the sample line on spaces yields:

    0 : 200.88.223.98
    1 : -
    2 : -
    3 : [01/Feb/2007:04:02:22
    4 : -0500]
    5 : "GET
    6 : /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852
    7 : HTTP/1.1"
    8 : 200
    9 : 52464
    10 : "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album"
    11 : "Opera/6.01
    12 : (Windows
    13 : 98;
    14 : U)
    15 : [en]"
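
The regular-expression idea raised in the comments could look roughly like the sketch below. It assumes the lines follow the layout of the sample (which resembles Apache's combined log format) and treats only the trailing quoted referer and agent as optional. Lines that are missing other fields will simply not match. The class `LogLineParser`, the method `parse`, and the pattern itself are illustrative assumptions rather than anything taken from the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // address, two ignored fields, [date], "request", status, bytes,
    // then an optional "referer" and an optional "agent".
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)"
            + "(?: \"([^\"]*)\")?(?: \"([^\"]*)\")?$");

    // Returns {address, date, request, status, bytes, referer, agent},
    // or null when the line does not match the assumed layout.
    // Optional fields that are absent come back as null entries.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical shortened line with the same layout as the question's sample.
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /index.html HTTP/1.1\" 200 52464 "
                + "\"http://cs.tcnj.edu/gallery/main.php\" "
                + "\"Opera/6.01 (Windows 98; U) [en]\"";
        String[] names = {"address", "date", "request", "status", "bytes", "referer", "agent"};
        String[] fields = parse(line);
        if (fields == null) {
            System.out.println("Line does not match the assumed format");
            return;
        }
        for (int i = 0; i < fields.length; i++) {
            System.out.println(names[i] + "=" + fields[i]);
        }
    }
}
```

Compared with splitting on spaces, the pattern keeps quoted fields such as the request and the agent together, and a line that does not match gives a `null` result instead of an exception.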