Java: parsing a tab-delimited file
I'm trying to parse a TSV from IMDB:
$hutter Battle of the Sexes (2017) (as $hutter Boy) [Bobby Riggs Fan] <10>
NVTION: The Star Nation Rapumentary (2016) (as $hutter Boy) [Himself] <1>
Secret in Their Eyes (2015) (uncredited) [2002 Dodger Fan]
Steve Jobs (2015) (uncredited) [1988 Opera House Patron]
Straight Outta Compton (2015) (uncredited) [Club Patron/Dopeman]
$lim, Bee Moe Fatherhood 101 (2013) (as Brandon Moore) [Himself - President, Passages]
For Thy Love 2 (2009) [Thug 1]
Night of the Jackals (2009) (V) [Trooth]
"Idle Talk" (2013) (as Brandon Moore) [Himself]
"Idle Times" (2012) {(#1.1)} (as Brandon Moore) [Detective Ryan Turner]
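The way I'm doing it only works for a few lines before I get an ArrayIndexOutOfBoundsException. If a line has one or more tab characters, how can I parse it and split it into at most two strings?
There are subtleties in parsing tab- or comma-delimited data files, mostly to do with quoting and escaping. To save yourself a lot of work, frustration, and headaches, you really should consider using one of the existing CSV parsing libraries, such as OpenCSV or Apache Commons CSV.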
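Posted as an answer rather than a comment because the OP gave no reason for reinventing the wheel, and some tasks really have been "solved" once and for all.
Don't post data listings as screenshots. Copy/paste the data and indent it by 4 spaces for fixed-width formatting.
Is the data-listing thing actually an SO rule? Copy-pasting the data is messy and takes a long time to fix.
It's a very strong recommendation. You only need to show a few lines of data, enough to illustrate your problem, not a full screen. Images should be reserved for actual images. If it's text, it should appear in the post as text.
I'll leave it as it is until there's a rule that says otherwise. Thanks.
I usually agree about using a CSV library, but this format doesn't look CSV enough to me.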
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;

// `reader` (a BufferedReader), `line` (a String) and `headerPattern` (a compiled Pattern) are declared elsewhere.
while ((line = reader.readLine()) != null) {
    Matcher matcher = headerPattern.matcher(line);
    if (matcher.matches()) {
        Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");
        String newline;
        reader.readLine(); // skip the line that follows the header
        while ((newline = reader.readLine()) != null) {
            if (!newline.startsWith("\t")) {
                // Split at the first run of tabs into at most two parts.
                // The earlier pattern "\t.*" matched the tab plus the rest of the line,
                // so split() never produced a second element and fullLine[1]
                // threw ArrayIndexOutOfBoundsException.
                String[] fullLine = newline.split("\t+", 2);
                if (fullLine.length == 2) {
                    System.out.println("Actor: " + fullLine[0] + " Movie: " + fullLine[1]);
                }
            } // lines that start with a tab still need to be handled here
        }
    }
}
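For illustration only, here is a minimal, self-contained sketch of splitting each line at its first run of tabs using nothing but the standard library. It rests on assumptions that the sample above does not confirm, because the tabs were lost when the data was pasted. It assumes the actor name is separated from the title by one or more tab characters, that lines starting with a tab are further titles for the most recently named actor, and that a blank line ends an actor's block. The class name `ActorListSketch` and the method `parse` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ActorListSketch {

    /** Returns one {actor, title} pair per credit line (assumed layout, see lead-in). */
    static List<String[]> parse(List<String> lines) {
        List<String[]> credits = new ArrayList<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                currentActor = null; // assumption: a blank line ends an actor's block
                continue;
            }
            if (line.startsWith("\t")) {
                // Continuation line: another title for the actor seen most recently.
                if (currentActor != null) {
                    credits.add(new String[] {currentActor, line.trim()});
                }
                continue;
            }
            // The limit of 2 splits only at the first run of tabs,
            // so the result has at most two elements.
            String[] parts = line.split("\t+", 2);
            if (parts.length == 2) {
                currentActor = parts[0];
                credits.add(new String[] {currentActor, parts[1].trim()});
            }
            // Lines without any tab are skipped instead of being indexed blindly.
        }
        return credits;
    }

    public static void main(String[] args) {
        // Sample lines with the tabs written out explicitly (assumed positions).
        List<String> sample = Arrays.asList(
            "$hutter\t\tBattle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>",
            "\t\t\tSteve Jobs (2015)  (uncredited)  [1988 Opera House Patron]",
            "",
            "$lim, Bee Moe\tFatherhood 101 (2013)  (as Brandon Moore)  [Himself - President, Passages]");
        for (String[] credit : parse(sample)) {
            System.out.println("Actor: " + credit[0] + " | Title: " + credit[1]);
        }
    }
}
```

Checking `parts.length` before indexing is what avoids the ArrayIndexOutOfBoundsException on lines that contain no tab at all. If the real file turns out to use quoting or escaping, a CSV library configured for tab delimiters is likely the safer route.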