Matching problem: Matcher returns only every other result, and the starting position varies from run to run (Java)

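There is a website from which I am trying to get the names of all the actors (there are 35). I found a pattern that lets me do this, but the problem is that only half of the possible names end up in the ArrayList, every other one, and the first element can differ from run to run. So one time I get names 1, 3, 5, …, 35, and another time names 2, 4, 6, …, 34, but never all of them at once. What am I doing wrong? The code is below.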

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "https://m.imdb.com/list/ls000984564/";

        HttpURLConnection urlConnection = null;

        try {
            // Download the page HTML
            URL url = new URL(str);
            urlConnection = (HttpURLConnection) url.openConnection();
            InputStream inputStream = urlConnection.getInputStream();
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
            BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

            StringBuilder htmlCode = new StringBuilder();

            while (bufferedReader.readLine() != null)
                htmlCode.append(bufferedReader.readLine());

            urlConnection.disconnect();

            // Collect the text of every <h4> element
            ArrayList<String> actorsNamesList = new ArrayList<>();
            Pattern pattern = Pattern.compile("<h4>(.*?)</h4>");
            Matcher matcher = pattern.matcher(htmlCode.toString());

            while (matcher.find())
                actorsNamesList.add(matcher.group(1));

            for (String name : actorsNamesList)
                System.out.println(name);

            System.out.println("Size of a list: " + actorsNamesList.size());

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (urlConnection != null) {
                urlConnection.disconnect();
            }
        }
    }
}
Another output that I often see in Android Studio:

2019-12-07 02:22:20.837 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Christian Bale
2019-12-07 02:22:20.837 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Mark Wahlberg
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Matt Damon
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Daniel Day-Lewis
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Steve Carell
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Edward Norton
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Brad Pitt
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Ryan Reynolds
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Geoffrey Rush
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Ken Watanabe
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Aaron Eckhart
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Clive Owen
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Will Ferrell
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Benicio Del Toro
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: James Gandolfini
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Josh Hartnett
2019-12-07 02:22:20.838 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Greg Kinnear
2019-12-07 02:22:20.840 13840-13840/com.rumato.gratestfilmamericanlegends I/URL: Size: 17

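Regular expressions are not the right tool for parsing HTML. You could try a library built specifically for this purpose. For example, if the page content is generated dynamically by JavaScript, you can use HtmlUnit to load the page, wait for the JavaScript to finish, and then use XPath to find what you need.

Thanks for the resource, but this is a test exercise of mine, and I need to find out why it adds only every other result and why the results differ from run to run, because I can't find the error.

Thanks, boot and bonnet, I will try that.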
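
Judging only from the code in the question, one likely cause of the "every other name" behaviour is the read loop: `readLine()` is called once in the `while` condition and again in the body, so the line consumed by the condition is discarded and only every second line reaches the `StringBuilder`. Whether the odd or the even lines survive would then depend on how many lines come before the first `<h4>`, which can change between responses. The sketch below reproduces this on a small in-memory string; the class name `ReadLoopDemo`, its helper methods, and the one-name-per-line sample HTML are assumptions made for illustration, not the real page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadLoopDemo {

    // Same loop shape as in the question: readLine() runs twice per
    // iteration, so the line read by the condition is thrown away.
    static String readSkipping(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                sb.append(reader.readLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Keep each line in a variable so it is read exactly once.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Same pattern as in the question.
    static List<String> extractNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher matcher = Pattern.compile("<h4>(.*?)</h4>").matcher(html);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the downloaded page: one name per line.
        String html = "<h4>A</h4>\n<h4>B</h4>\n<h4>C</h4>\n<h4>D</h4>\n";

        // prints [B, D]
        System.out.println(extractNames(readSkipping(new StringReader(html))));
        // prints [A, B, C, D]
        System.out.println(extractNames(readAll(new StringReader(html))));
    }
}
```

If this is indeed the cause, changing the loop in the question to the `while ((line = bufferedReader.readLine()) != null)` form should be enough to collect all 35 names, without touching the regular expression.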