Parsing a tab-separated file in Java

Tags: java, regex

I'm trying to parse a TSV file from IMDB:

$hutter             Battle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>
                    NVTION: The Star Nation Rapumentary (2016)  (as $hutter Boy)  [Himself]  <1>
                    Secret in Their Eyes (2015)  (uncredited)  [2002 Dodger Fan]
                    Steve Jobs (2015)  (uncredited)  [1988 Opera House Patron]
                    Straight Outta Compton (2015)  (uncredited)  [Club Patron/Dopeman]



$lim, Bee Moe       Fatherhood 101 (2013)  (as Brandon Moore)  [Himself - President, Passages]
                    For Thy Love 2 (2009)  [Thug 1]
                    Night of the Jackals (2009) (V)  [Trooth]
                    "Idle Talk" (2013)  (as Brandon Moore)  [Himself]
                    "Idle Times" (2012) {(#1.1)}  (as Brandon Moore)  [Detective Ryan Turner]

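The way I'm doing it only works for a few lines before I get an array-out-of-bounds exception (`ArrayIndexOutOfBoundsException`). If a line has one or more tabs, how can I parse it and split it into at most two strings?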

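There are some subtleties to parsing tab- or comma-separated data files, and they have to do with quoting and escaping.

To save yourself a lot of work, frustration, and headaches, you really should consider using one of the existing CSV parsing libraries, such as OpenCSV or Apache Commons CSV.

Posted as an answer rather than a comment because the OP did not give a reason for reinventing the wheel, and some tasks really have been "solved" once and for all.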
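As a small illustration of the quoting subtlety mentioned in this answer, here is a sketch of what a naive split does when a quoted field contains a tab. The record and the class name `NaiveSplitDemo` are made up for illustration and are not taken from the IMDB data.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Splits on every tab and pays no attention to quotes (the naive approach).
    static String[] naiveSplit(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        // Hypothetical record: three logical fields, the second is quoted and contains a tab.
        String line = "Smith\t\"Title\twith tab\"\t2015";
        String[] fields = naiveSplit(line);
        // The naive split reports 4 fields because it cuts the quoted field in two.
        System.out.println(fields.length + " fields: " + Arrays.toString(fields));
    }
}
```

A CSV library configured for tab delimiters and double-quote quoting would typically keep the quoted text together as one field, which is the kind of detail the answer suggests not reimplementing by hand.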

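Comments:

Don't post data listings as screenshots. Copy/paste the data and indent it by 4 spaces for fixed-width formatting.

Is the data listing thing actually an SO rule? Copy-pasting the data was a mess that took a long time to fix.

It is a very strong suggestion. You only need to show a few lines of data, enough to express your problem, not a full screen of it. Images should be reserved for things that really are images. If it is text, it should appear in the post as text.

I'll leave it as it is until there is a rule that says otherwise. Thanks.

I usually agree with using a CSV library, but this format doesn't look CSV-like enough to me.

The code from the question:
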
        while ((line = reader.readLine()) != null) {

            Matcher matcher = headerPattern.matcher(line);
            boolean headerMatchFound = matcher.matches();

            if (headerMatchFound) {
                Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");

                String newline;

                reader.readLine();

                while ((newline = reader.readLine()) != null) {
                    String[] fullLine = null;

                    String actor;
                    String title;

                    Pattern startsWithTab = Pattern.compile("^\t.*");
                    Matcher tab = startsWithTab.matcher(newline);
                    boolean tabStartMatcher = tab.matches();

                    if (!tabStartMatcher) {

                        fullLine = newline.split("\t.*");

                        System.out.println("Actor: " + fullLine[0] +
                                "Movie: " + fullLine[1]);

                    } // this line will have code to match lines that start with tabs.
                }
            }

        }
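
One likely cause of the exception in the loop above: `split("\t.*")` treats the first tab and everything after it as the delimiter, so the remainder of the line is consumed and the resulting array has only one element, which makes `fullLine[1]` fail. A minimal sketch of an alternative, assuming the actor name and the title are separated by one or more tabs and that continuation lines start with a tab, is to split on `"\t+"` with a limit of 2. The class name `ActorLineSplitter` and its methods are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ActorLineSplitter {
    // Splits a line into at most two parts at the first run of tabs.
    static String[] splitAtFirstTabs(String line) {
        return line.split("\t+", 2);
    }

    // Pairs each title with the most recently seen actor name.
    static List<String[]> parse(List<String> lines) {
        List<String[]> credits = new ArrayList<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines separate actor blocks
            }
            String[] parts = splitAtFirstTabs(line);
            if (parts.length < 2) {
                continue; // no tab on this line: skip it instead of indexing [1]
            }
            if (!parts[0].isEmpty()) {
                currentActor = parts[0]; // text before the tabs means a new actor starts here
            }
            if (currentActor != null) {
                credits.add(new String[] {currentActor, parts[1]});
            }
        }
        return credits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "$hutter\t\t\tBattle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>",
            "\t\t\tSteve Jobs (2015)  (uncredited)  [1988 Opera House Patron]",
            "",
            "$lim, Bee Moe\t\tFatherhood 101 (2013)  (as Brandon Moore)  [Himself - President, Passages]");
        for (String[] credit : parse(sample)) {
            System.out.println("Actor: " + credit[0] + " | Movie: " + credit[1]);
        }
    }
}
```

For a line that begins with tabs, the first element of the split result is an empty string, so the sketch reuses the previous actor name for that title. The length check guards against lines that contain no tab at all.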