Java: How to parse text using delimiters?

Possible duplicate:

I want to parse the following data so that I get the output shown below.

Input:

    RTRV-ALM-EQPT::ALL:RA01;

       SIMULATOR 09-11-20 13:52:15
    M  RA01 COMPLD
       "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
       "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
       "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
       "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
       "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
       "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
       "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","
    ;

Output:

    1) RTRV-ALM-EQPT::ALL:RA01;
    2) SIMULATOR
    3) 09-11-20
    4) 13:52:15
    5) M
    6) RA01
    7) COMPLD
    8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
    9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
    10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
    11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
    12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
    13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
    14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","

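To parse any input, you need to understand its structure:

  • Are the first four lines always present?
  • What is the format of those four lines?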

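  • The best approach is probably not to think in terms of converting the first text into the second.

    Instead, consider first parsing the first text into a set of Java objects that represent what the data actually is. For example, the second/third line of the input might be represented by a Test class with "region", "date" and "time" properties. (Only you can come up with a sensible model, based on your knowledge of what everything means.)

    Then, once you have a good in-memory representation of the file's information, you can think about printing it out as text in the second format. At that point, printing the various fields and properties of the Java objects should be easy, compared with trying to transform the input text on the fly.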
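A minimal sketch of that idea, assuming Java 16+ (for records) and a fixed layout: the command on the first line, the header on the third, and one quoted alarm per line. The names `ReportModel`, `Header`, `Alarm` and `Report`, and all field names, are hypothetical stand-ins for whatever model fits your data; only the first four comma-separated alarm fields are captured here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportModel {

    // Hypothetical model of the header line, e.g. "SIMULATOR 09-11-20 13:52:15"
    record Header(String region, String date, String time) {}

    // Hypothetical model of one quoted alarm line; only the leading fields are kept
    record Alarm(String slot, String unit, String severity, String condition) {}

    record Report(String command, Header header, List<Alarm> alarms) {}

    // Assumed alarm layout: "SLOT-x-y-z,UNIT:SEVERITY,CONDITION,..." inside double quotes
    private static final Pattern ALARM =
            Pattern.compile("^\\s*\"([^,]+),([^:]+):([^,]+),([^,]+),");

    static Report parse(String text) {
        String[] lines = text.split("\\R");
        String command = lines[0].trim();
        // Assumes the header is always the third line (index 2)
        String[] h = lines[2].trim().split("\\s+");
        Header header = new Header(h[0], h[1], h[2]);

        List<Alarm> alarms = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ALARM.matcher(line);
            if (m.find()) {
                alarms.add(new Alarm(m.group(1), m.group(2), m.group(3), m.group(4)));
            }
        }
        return new Report(command, header, alarms);
    }

    static final String SAMPLE = "RTRV-ALM-EQPT::ALL:RA01;\n"
            + "\n"
            + "   SIMULATOR 09-11-20 13:52:15\n"
            + "M  RA01 COMPLD\n"
            + "   \"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\\\"Fan-T\\\",\"\n"
            + "   \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n"
            + ";";

    public static void main(String[] args) {
        Report r = parse(SAMPLE);
        // Printing is now just a matter of reading fields
        System.out.println("Command: " + r.command());
        System.out.println("Region:  " + r.header().region());
        System.out.println("When:    " + r.header().date() + " " + r.header().time());
        for (Alarm a : r.alarms()) {
            System.out.println(a.slot() + " " + a.severity() + " " + a.condition());
        }
    }
}
```

Once the data lives in objects like these, producing the numbered output (or any other format) is a loop over the fields rather than a text-to-text transformation.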

    Assuming the file is relatively small, so that it can be read into memory, try this:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String text = "RTRV-ALM-EQPT::ALL:RA01;\n"+
                "\n"+
                "   SIMULATOR 09-11-20 13:52:15\n"+
                "M  RA01 COMPLD\n"+
                "   \"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\\\"Fan-T\\\",\"\n"+
                "   \"SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\\\"Battery-T\\\",\"\n"+
                "   \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n"+
                "   \"SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\\\"Laser-T\\\",\"\n"+
                "   \"SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\\\" Laser-T\\\",\"\n"+
                "   \"SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\\\"Laser-T\\\",\"\n"+
                "   \"SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\\\"Laser-T\\\",\"\n"+
                ";";
            // A quoted string (allowing backslash escapes) or a run of non-whitespace chars
            Matcher m = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+").matcher(text);
            int n = 0;
            while(m.find()) {
                System.out.println((++n)+") "+m.group());
            }
        }
    }

    Output:

    1) RTRV-ALM-EQPT::ALL:RA01;
    2) SIMULATOR
    3) 09-11-20
    4) 13:52:15
    5) M
    6) RA01
    7) COMPLD
    8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
    9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
    10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
    11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
    12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
    13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
    14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","
    15) ;
    
    The only difference is the 15th match, ";", which I believe you forgot.
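The answer assumes the file is small enough to read into memory. If it is not, one hedged variation is to apply the same raw regex, "(?:\\.|[^\\"])*"|\S+, line by line instead of on the whole text. This sketch assumes no quoted string spans a line break (true for the sample, but an assumption about the format), and requires Java 11+ for Path.of; `LineByLine` and `forEachToken` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLine {

    // Compiles to the raw regex "(?:\\.|[^\\"])*"|\S+
    private static final Pattern TOKEN = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    // Matches each line separately instead of reading the whole file at once.
    // Assumes no quoted string spans a line break.
    static void forEachToken(BufferedReader in, Consumer<String> action) {
        in.lines().forEach(line -> {
            Matcher m = TOKEN.matcher(line);
            while (m.find()) {
                action.accept(m.group());
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the response file (hypothetical usage)
        try (BufferedReader in = Files.newBufferedReader(Path.of(args[0]))) {
            int[] n = {0};
            forEachToken(in, t -> System.out.println((++n[0]) + ") " + t));
        }
    }
}
```

Passing a Consumer keeps the tokenizer independent of what happens to each token (printing, counting, building objects).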

    The raw regex (without all the Java string escaping) looks like this:

    "(?:\\.|[^\\"])*"|\S+
    
    and it breaks down as follows:

    "          # match a double quote
    (?:        # open non-capturing group
      \\.      #   match a backslash followed by any char (except line breaks)
      |        #   OR
      [^\\"]   #   match any char except a backslash and a double quote
    )*         # close the non-capturing group and repeat it zero or more times
    "          # match a double quote
    |          # OR
    \S+        # match one or more non-whitespace characters
    

    In other words: match either a quoted string or a word made up only of non-whitespace characters.
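To see why the \\. branch matters, here is a small comparison (hypothetical class name `EscapeDemo`) against a naive pattern, "[^"]*"|\S+, that ignores escapes, run on one alarm line from the question. The escape-aware pattern returns the whole quoted line as a single token, while the naive one stops at the first \" and then splits the rest into Processor and Failure\",".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {

    // Escape-aware: \\. consumes a backslash plus the next char, so \" does not end the string
    static final Pattern ESCAPE_AWARE = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    // Naive: a quoted string ends at the first double quote, escaped or not
    static final Pattern NAIVE = Pattern.compile("\"[^\"]*\"|\\S+");

    // One alarm line from the question, as it appears in the raw input
    static final String LINE =
            "\"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"";

    static List<String> tokens(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("escape-aware: " + tokens(ESCAPE_AWARE, LINE));
        System.out.println("naive:        " + tokens(NAIVE, LINE));
    }
}
```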

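    So, apart from quoted strings, which may contain spaces, you could just split on whitespace? I think this is the third time this question has been asked. Is this homework or something?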
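Along those lines, a hand-written scanner can do the same split without a regex: break on whitespace, except inside double quotes, where a backslash escapes the next character. This is a sketch with a hypothetical class name, `QuoteAwareSplitter`; unlike the regex, it treats a quote in the middle of a word as opening a quoted section, which does not occur in this input.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Splits on whitespace, except inside double quotes; inside quotes a backslash
    // escapes the next character, so \" does not close the quoted string.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                current.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    current.append(text.charAt(++i)); // keep the escaped char verbatim
                } else if (c == '"') {
                    inQuotes = false;
                }
            } else if (Character.isWhitespace(c)) {
                if (current.length() > 0) { // end of a token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (c == '"') {
                    inQuotes = true;
                }
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "M  RA01 COMPLD\n"
                + "   \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n"
                + ";";
        List<String> tokens = split(text);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println((i + 1) + ") " + tokens.get(i));
        }
    }
}
```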