Java Jsoup question: how to split by word?


I want to get the HTML content without tags, with a result like this:

word
word
word
So I tried the following:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.safety.Whitelist;

public class PreProcessing {

    public static void main(String[] args) throws Exception {

        PrintWriter out = new PrintWriter("filename.txt");

        URL url = new URL("https://en.wikipedia.org/wiki/Distributed_computing");

        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

        String inputLine = "";
        String input = "";

        while ((inputLine = in.readLine()) != null) {
            input += inputLine;
            // System.out.println(inputLine);
        }
        in.close();

        // create Jsoup document from HTML
        Document jsoupDoc = Jsoup.parse(input);

        // set pretty print to false, so \n is not removed
        jsoupDoc.outputSettings(new OutputSettings().prettyPrint(false));

        // select all <br> tags and append \n after them
        // jsoupDoc.select("br").after("\\n");

        // select all <p> tags and prepend \n before them
        // jsoupDoc.select("p").before("\\n");

        // get the HTML from the document, retaining the original new lines
        String str = jsoupDoc.html().replaceAll(" ", "\n");
        // str.replaceAll("\t", "");

        String strWithNewLines = Jsoup.clean(str, "", Whitelist.none(), new OutputSettings().prettyPrint(false));
        strWithNewLines.replaceAll("\t", "\n");
        strWithNewLines.replaceAll("\"", "");

        strWithNewLines.replaceAll(".", "");

        System.out.println(strWithNewLines);

        out.print(strWithNewLines);
        out.close(); // flush and close, otherwise filename.txt may stay empty
    }
}
But I want a result like this:

Distributed

computing

-

Wikipedia

Distributed

computing

From

Wikipedia

the

free

encyclopedia

Jump

to

navigation

Jump

to

search

Distributed

application

redirects

here

For

trustless

applications

see
I tried

strWithNewLines.replaceAll("\"", "");

strWithNewLines.replaceAll(".", "");

but it doesn't work. Why not? I searched on Google but couldn't find a solution.

Try this for the last few lines. It will get you closer to the result you want:

String strWithNewLines = Jsoup.clean ...;
String result = strWithNewLines.replaceAll("\t", "\n")
    .replaceAll("\"", "");
    //.replaceAll(".", "");

System.out.println(result);
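To go the rest of the way to one word per line, here is a minimal sketch in plain Java (no jsoup) that works on text that is already tag-free, for example the string returned by Jsoup.clean(...). It splits on runs of whitespace and drops double quotes, periods and commas. The class WordSplitter, the method toWords and the sample text are hypothetical, and the set of punctuation to drop is an assumption based on the desired output in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitter {

    // Hypothetical helper: splits tag-free text into a list of words.
    public static List<String> toWords(String text) {
        List<String> words = new ArrayList<>();
        // "\\s+" splits on any run of whitespace: spaces, tabs and newlines.
        for (String token : text.split("\\s+")) {
            // Drop double quotes, periods and commas. Inside a character class
            // the "." is a literal period, not the match-anything wildcard.
            String word = token.replaceAll("[\".,]", "");
            // Leading whitespace or a token made only of punctuation yields "".
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "\"Distributed application\" redirects here. For trustless applications, see";
        // Print one word per line, as in the desired output.
        for (String word : toWords(text)) {
            System.out.println(word);
        }
    }
}
```

A "-" token is kept as-is, which matches the "Distributed computing - Wikipedia" line in the desired output.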
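The problem in your code is that strings are immutable, so String.replaceAll does not change anything in the original string; instead it returns a new string with the replacements applied. But you never use that result.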
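A minimal demonstration of the difference between ignoring and keeping the return value (the class and method names are hypothetical, for illustration only):

```java
public class ReplaceAllDemo {

    // Calls replaceAll but ignores its return value, as in the question's code.
    // Strings are immutable, so s comes back unchanged.
    public static String replaceIgnoringResult(String s) {
        s.replaceAll("\t", "\n"); // a new String is created and thrown away
        return s;
    }

    // Keeps the String that replaceAll returns; calls can be chained this way.
    public static String replaceKeepingResult(String s) {
        return s.replaceAll("\t", "\n").replaceAll("\"", "");
    }

    public static void main(String[] args) {
        String s = "\"Distributed\tcomputing\"";
        System.out.println(replaceIgnoringResult(s)); // still has the tab and quotes
        System.out.println(replaceKeepingResult(s));  // tab became a newline, quotes removed
    }
}
```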

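There is a second problem with .replaceAll(".", ""): it leaves you with an (almost) empty string, because "." is a regular expression that matches every character except line terminators, and each match is replaced by an empty string.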
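To remove only literal periods, escape the dot in the regex ("\\.") or use String.replace, which does not treat its arguments as a regex. A small sketch comparing the three (class and method names are hypothetical):

```java
public class DotRegexDemo {

    // Unescaped "." is a regex wildcard, so this removes every character
    // except line terminators (the DOTALL flag is not set).
    public static String withWildcard(String s) {
        return s.replaceAll(".", "");
    }

    // "\\." escapes the dot in the regex, so only literal periods are removed.
    public static String withEscapedDot(String s) {
        return s.replaceAll("\\.", "");
    }

    // String.replace does not use regex at all; "." is taken literally.
    public static String withLiteralReplace(String s) {
        return s.replace(".", "");
    }

    public static void main(String[] args) {
        String s = "redirects here.\nFor trustless";
        System.out.println("[" + withWildcard(s) + "]"); // only the line break survives
        System.out.println(withEscapedDot(s));
        System.out.println(withLiteralReplace(s));
    }
}
```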
