Optimizing a Java pattern-matching function to print in a different order, possibly with LinkedList operations
At the moment my code looks like the listing below. It is simple: it reads in a data file, extracts all the bits of interest, and prints them. The problem is that it prints them in the wrong order.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class text_processing
{
    @SuppressWarnings("resource")
    public static void main(String[] args) throws IOException
    {
        String text;
        BufferedReader br = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/1_February/brute_force/items.csv"));
        while ((text = br.readLine()) != null)
        {
            //the main character
            Pattern pat_0 = Pattern.compile( "『(.*?)』" );
            Matcher mat_0 = pat_0.matcher( text );
            if( mat_0.find() )
            {
                System.out.println( mat_0.group(1) );
            }
            //the pin yin
            Pattern pat_1 = Pattern.compile("class=\"\"pinyin\"\">(.*?)<script>(?:(?!<script>).)*");
            Matcher mat_1 = pat_1.matcher( text );
            if( mat_1.find() )
            {
                System.out.println( mat_1.group(1) );
            }
            //the ubiquitous radical
            Pattern pat_2 = Pattern.compile( "<span class=\"\"b\"\">部首:</span>" );
            Matcher mat_2 = pat_2.matcher( text );
            if( mat_2.find() )
            {
                Pattern pat_3 = Pattern.compile("<span class=\"\"b\"\">部首:</span>(.*?)<span class=\"\"b\"\">");
                Matcher mat_3 = pat_3.matcher( text );
                if( mat_3.find() )
                {
                    System.out.println("部首:" + mat_3.group(1) );
                }
                //stroke count
                Pattern pat_4 = Pattern.compile(mat_3.group(1) + "<span class=\"\"b\"\">部首笔画:</span>(.*?)<span class=\"\"b\"\">");
                Matcher mat_4 = pat_4.matcher( text );
                if( mat_4.find() )
                {
                    System.out.println("笔画:" + mat_4.group(1) );
                }
            }
            else
            {
                //simple rad
                Pattern pat_5 = Pattern.compile("简体部首:</span>(.*?)<span class=\"\"b\"\">");
                Matcher mat_5 = pat_5.matcher( text );
                if( mat_5.find() )
                {
                    System.out.println("简体部首:" + mat_5.group(1) );
                    //stroke count
                    Pattern pat_6 = Pattern.compile(mat_5.group(1) + "<span class=\"\"b\"\">部首笔画:</span>(.*?)<span class=\"\"b\"\">");
                    Matcher mat_6 = pat_6.matcher( text );
                    if( mat_6.find() )
                    {
                        System.out.println("简体笔画:" + mat_6.group(1) );
                    }
                }
                //trad rad
                Pattern pat_7 = Pattern.compile("繁体部首:</span>(.*?)<span class=\"\"b\"\">");
                Matcher mat_7 = pat_7.matcher( text );
                if( mat_7.find() )
                {
                    System.out.println("繁体部首:" + mat_7.group(1) );
                    //stroke count
                    Pattern pat_8 = Pattern.compile(mat_7.group(1) + "<span class=\"\"b\"\">部首笔画:</span>(.*?)<span class=\"\"b\"\">");
                    Matcher mat_8 = pat_8.matcher( text );
                    if( mat_8.find() )
                    {
                        System.out.println("繁体笔画:" + mat_8.group(1) );
                    }
                }
            }
            //the decomposition
            Pattern pat_9 = Pattern.compile("#################,\" ]:(.*?)\\(");
            Matcher mat_9 = pat_9.matcher( text );
            if( mat_9.find() )
            {
                System.out.println("首尾分解: " + mat_9.group(1) );
            }
        }
    }
}
I would like the output to look like this:
卥
xī
首尾分解: 占乂
简体部首:丨
简体笔画:1
繁体部首:卜
繁体笔画:2
巤
liè
首尾分解: 巛乙
部首:巛
笔画:3
项
xiàng
首尾分解: 工页
简体部首:页
简体笔画:6
繁体部首:頁
繁体笔画:9
What the data looks like:
#######################," ]:占乂(zhancha)
","<table width=""620"" border=""0"" cellpadding=""0"" cellspacing=""0"">
<tr bgcolor=""#FFFFFF"">
<td width=""100""><div id=""zibg""><p class=""U5365""></p></div></td>
<td width=""510"" style=""padding-left:10px"">
<p class=""text15"">
『卥』 <br>
<span class=""b"">拼音:</span><span class=""pinyin"">xī<script>Setduyin('Duyin/xi1')</script></span> <span class=""b"">注音:</span><span class=""pinyin"">ㄒㄧ<script>Setduyin('Duyin/xi1')</script></span><br>
<span class=""b"">简体部首:</span>丨 <span class=""b"">部首笔画:</span>1 <span class=""b"">总笔画:</span>8<br><span class=""b"">繁体部首:</span>卜 <span class=""b"">部首笔画:</span>2 <span class=""b"">总笔画:</span>8<br><span class=""b"">康熙字典笔画</span>( 卥:8; )
</p></td>
</tr>
</table>"
#######################," ]:巛乙(chuanyi)
","<table width=""620"" border=""0"" cellpadding=""0"" cellspacing=""0"">
<tr bgcolor=""#FFFFFF"">
<td width=""100""><div id=""zibg""><p class=""U5DE4""></p></div></td>
<td width=""510"" style=""padding-left:10px"">
<p class=""text15"">
『巤』 <br>
<span class=""b"">拼音:</span><span class=""pinyin"">liè<script>Setduyin('Duyin/lie4')</script></span> <span class=""b"">注音:</span><span class=""pinyin"">ㄌㄧㄝˋ<script>Setduyin('Duyin/lie4')</script></span><br>
<span class=""b"">部首:</span>巛 <span class=""b"">部首笔画:</span>3 <span class=""b"">总笔画:</span>15<br><span class=""b"">康熙字典笔画</span>( 巤:15; )
</p></td>
</tr>
</table>"
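It is clear why you do not get the expected order: you are reading the file line by line, so you only reach the line that pat_0 matches on the third iteration, that is, after the first and second lines have already been processed and printed.

You should probably create a utility object that helps rearrange the lines after parsing. The difficulty is finding a group identifier in the lines to be sorted. I cannot read your alphabet, so I cannot help with that part.

Once you have a "group id", you can create a java.lang.Comparable object that uses the group id and the pattern number, so that the entries end up in the correct order when they are put into a collection. When parsing is finished, you can print the lines.

Comments:

Regex and markup. A nightmare. Use a parser like jsoup and forget the code you ever wrote. As a warning, take a look at this post.

Haha, what?! There is no way I could do that all day! What on earth is jsoup?

Surprisingly, you and another asker seem to be working on the same problem. Maybe you should join forces on it.

And then maybe we can convince both of you to stop parsing HTML with regular expressions before you run into, say, a commented-out section in the HTML and the old gods get angry? If you want to solve a problem, solve it properly.

@barq You are wrong to think this is a question for codereview.se. When code does not do what the asker wants it to do, we call the code broken and close the question. Once the code is fixed, the question may become on-topic.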
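The Comparable idea from the answer can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's or answerer's actual code: the class names `OrderedFields` and `Field` and the method `reorder` are hypothetical, the group id is assumed to advance whenever a line starts with `#` (as every record in the sample data does), and only three fields (character, pinyin, decomposition) are handled, without labels. The corner brackets are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: buffers extracted fields and emits them in a fixed order. */
final class OrderedFields {

    /** One extracted value, ordered by record (group id) and then by print slot. */
    static final class Field implements Comparable<Field> {
        final int groupId; // which record the value came from
        final int slot;    // desired print position inside the record
        final String text; // the extracted value

        Field(int groupId, int slot, String text) {
            this.groupId = groupId;
            this.slot = slot;
            this.text = text;
        }

        @Override
        public int compareTo(Field other) {
            int byGroup = Integer.compare(groupId, other.groupId);
            return byGroup != 0 ? byGroup : Integer.compare(slot, other.slot);
        }
    }

    // Array index = print slot. The patterns are compiled once, outside the read loop.
    private static final Pattern[] RULES = {
        Pattern.compile("\u300E(.*?)\u300F"),                  // slot 0: character between corner brackets
        Pattern.compile("class=\"\"pinyin\"\">(.*?)<script>"), // slot 1: pinyin (first match on the line)
        Pattern.compile("^#+,\" \\]:(.*?)\\("),                // slot 2: decomposition
    };

    /** Returns the extracted values sorted by record, then by slot. */
    static List<String> reorder(List<String> lines) {
        // The TreeSet keeps entries sorted via compareTo; a second value with the
        // same (groupId, slot) would be treated as a duplicate and dropped.
        TreeSet<Field> fields = new TreeSet<>();
        int groupId = 0;
        for (String line : lines) {
            if (line.startsWith("#")) {
                groupId++; // assumption: a line starting with '#' opens each record
            }
            for (int slot = 0; slot < RULES.length; slot++) {
                Matcher m = RULES[slot].matcher(line);
                if (m.find()) {
                    fields.add(new Field(groupId, slot, m.group(1).trim()));
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Field f : fields) {
            out.add(f.text);
        }
        return out;
    }
}
```

Given the lines of the first sample record, `reorder` returns 卥, xī, 占乂 in that order, even though the decomposition line is read first. The radical and stroke-count fields could be added as further slots in the same way. Because the whole file is buffered before anything is printed, a very large input might be better served by flushing the set each time a new record starts.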