Java: how do I parse text using delimiters?
I want to parse the following data so that I get the output specified below.

Input:

RTRV-ALM-EQPT::ALL:RA01;

SIMULATOR 09-11-20 13:52:15
M RA01 COMPLD
"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
"SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
"SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
"SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
"SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
"SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","
;

Output:

1) RTRV-ALM-EQPT::ALL:RA01;
2) SIMULATOR
3) 09-11-20
4) 13:52:15
5) M
6) RA01
7) COMPLD
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","
To parse any input, you first have to understand its structure.
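The best approach is probably not to think about converting the first text directly into the second. Instead, start by parsing the first text into a set of Java objects that represent what the data actually is. For example, the second and third lines of the input might be represented by a Test class with "region", "date" and "time" properties. (Only you can come up with a sensible model, because only you know what everything means.)
Once you have a good in-memory representation of the file's information, you can think about printing it as text in the second form. At that point, printing the various fields and properties of the Java objects should be easy. That is simpler than trying to transform the input text on the fly.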
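Below is a minimal sketch of that idea, assuming Java 16+ records. The names `Header` and `Alarm` are hypothetical, and so is every field name. The answer's `Test` class with region, date and time is called `Header` here. What each comma-separated alarm field means is a guess from the sample data, not taken from any specification.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: Header, Alarm and all field names are hypothetical.
public class ModelSketch {

    // Header line such as " SIMULATOR 09-11-20 13:52:15" (the "Test" class idea).
    public record Header(String region, String date, String time) {
        public static Header parse(String line) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Unexpected header: " + line);
            }
            return new Header(parts[0], parts[1], parts[2]);
        }
    }

    // One quoted alarm line; the meaning of each field is guessed from the sample.
    public record Alarm(String slot, String unitAndSeverity, String condition,
                        String serviceEffect, String date, String time, String description) {

        // "slot,unit:sev,condition,effect,date,time,,:\"description\","
        private static final Pattern LINE = Pattern.compile(
            "\"([^,]*),([^,]*),([^,]*),([^,]*),\\s*([^,]*),([^,]*),,:\\\\\"\\s*(.*?)\\\\\",\"");

        public static Alarm parse(String line) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Unexpected alarm line: " + line);
            }
            return new Alarm(m.group(1), m.group(2), m.group(3), m.group(4),
                             m.group(5), m.group(6), m.group(7));
        }
    }

    public static void main(String[] args) {
        Header header = Header.parse(" SIMULATOR 09-11-20 13:52:15");
        Alarm alarm = Alarm.parse(
            " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"");
        // The output is built from fields, not by rewriting the input text.
        System.out.println(header.region() + " " + header.date() + " " + header.time());
        // prints "SIMULATOR 09-11-20 13:52:15"
        System.out.println(alarm.slot() + " " + alarm.condition() + " " + alarm.description());
        // prints "SLOT-1-1-2 PROC_FAIL Processor Failure"
    }
}
```

Once the lines are objects, you can reorder, filter or reformat the output without touching the parsing code.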
Assuming the file is relatively small, so it can be read into memory, try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "RTRV-ALM-EQPT::ALL:RA01;\n"+
            "\n"+
            " SIMULATOR 09-11-20 13:52:15\n"+
            "M RA01 COMPLD\n"+
            " \"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\\\"Fan-T\\\",\"\n"+
            " \"SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\\\"Battery-T\\\",\"\n"+
            " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n"+
            " \"SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\\\"Laser-T\\\",\"\n"+
            " \"SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\\\" Laser-T\\\",\"\n"+
            " \"SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\\\"Laser-T\\\",\"\n"+
            " \"SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\\\"Laser-T\\\",\"\n"+
            ";";
        // Each match is either a double-quoted string (backslash escapes allowed
        // inside) or a run of non-whitespace characters.
        Matcher m = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+").matcher(text);
        int n = 0;
        while (m.find()) {
            System.out.println((++n) + ") " + m.group());
        }
    }
}
Output:
1) RTRV-ALM-EQPT::ALL:RA01;
2) SIMULATOR
3) 09-11-20
4) 13:52:15
5) M
6) RA01
7) COMPLD
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\","
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\","
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\","
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\","
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\","
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\","
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\","
15) ;
The only difference is a 15th match, ;, which I believe you forgot.
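If the standalone ; should not appear in the output, one option is to collect the matches into a list and skip any token that is exactly ";". This sketch assumes a lone ; always marks the end of the response. The class name `TokenList` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenList {
    // The regex discussed in this answer: a quoted string with backslash escapes,
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    // Assumption: a token consisting only of ";" is the terminator and is dropped.
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group();
            if (!token.equals(";")) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "RTRV-ALM-EQPT::ALL:RA01;\n\n SIMULATOR 09-11-20 13:52:15\n"
            + "M RA01 COMPLD\n"
            + " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n;";
        List<String> found = tokens(text);
        for (int i = 0; i < found.size(); i++) {
            System.out.println((i + 1) + ") " + found.get(i));
        }
    }
}
```

The first token, RTRV-ALM-EQPT::ALL:RA01;, is kept because it is not exactly ";".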
The raw regex (without all the Java string escaping) looks like this:
"(?:\\.|[^\\"])*"|\S+
It breaks down as follows:
" # match a double quote
(?: # open a non-capturing group
\\. # match a backslash followed by any char (except line breaks)
| # OR
[^\\"] # match any char except a backslash and a double quote
)* # close the non-capturing group and repeat it zero or more times
" # match a double quote
| # OR
\S+ # match one or more non-whitespace characters
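To see why the \\. branch matters, this sketch compares the pattern with a naive "[^"]*"|\S+ on one shortened alarm field. The naive version ends the quoted string at the first escaped quote, so the field breaks into pieces. Class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Raw regex: "(?:\\.|[^\\"])*"|\S+  (a backslash always consumes the next char)
    public static final Pattern WITH_ESCAPES =
        Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");
    // Raw regex: "[^"]*"|\S+  (naive: any quote ends the quoted string)
    public static final Pattern NAIVE = Pattern.compile("\"[^\"]*\"|\\S+");

    // Returns every match of the pattern, in order.
    public static List<String> matches(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened field from the sample, "a b,:\"Fan-T\",", followed by a word
        String text = "\"a b,:\\\"Fan-T\\\",\" next";
        System.out.println(matches(WITH_ESCAPES, text)); // ["a b,:\"Fan-T\",", next]
        System.out.println(matches(NAIVE, text));        // ["a b,:\", Fan-T\",", next]
    }
}
```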
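In other words: match either a quoted string or a word made up only of non-whitespace characters.
So you want to split on whitespace, except for whitespace inside quotes?
I think this is the third time this question has been asked. Is this homework or something?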
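If the rule really is "split on whitespace, except inside quotes", you can also do it without a regex. This hand-written tokenizer is an alternative technique, not part of the original answers. It treats " as an opening quote only at the start of a token, and it keeps backslash-escaped characters inside quotes. It differs from the regex in one case: an unterminated quote runs to the end of the input instead of falling back to a plain word. The class name `QuoteAwareSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Splits on whitespace, but keeps a "quoted string" (including its spaces and
    // backslash-escaped characters) together as one token.
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                current.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    current.append(text.charAt(++i)); // keep the escaped char as-is
                } else if (c == '"') {
                    inQuotes = false;                 // closing quote
                }
            } else if (Character.isWhitespace(c)) {
                if (current.length() > 0) {           // end of the current token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (c == '"' && current.length() == 0) {
                    inQuotes = true;                  // quote at token start opens a string
                }
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());           // last token, if any
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "M RA01 COMPLD\n"
            + " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n;";
        List<String> tokens = split(text);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println((i + 1) + ") " + tokens.get(i));
        }
    }
}
```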