Is there a regex command of this kind in Java?


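I'm reading the Oracle regular expression documentation, but I can't seem to find anything I could use to replace the for loop below. I have scraped the body of an HTML web page, but I'm still left with the HTML tags. Is there a regex command that lets you replace everything that starts with "<" and ends with ">", basically removing the HTML tags entirely? The for loop does work; I was just hoping to find something cleaner.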

    char[] charWordsOfWebsite = wordsOfWebsite.toCharArray(); //wordsOfWebsite is the String I stored the html page into. Then store string as an array of characters.

    boolean insideHTMLTag = false;

    for (int i = 0; i <= charWordsOfWebsite.length-1 ; i++) {   //This loop gets rid of all html tags

        if (charWordsOfWebsite[i] == '<'){  //Beginning of html tag
            charWordsOfWebsite[i] = ' ';
            insideHTMLTag = true;
        } else if (insideHTMLTag && charWordsOfWebsite[i] != '>'){  //Inside html tag
            charWordsOfWebsite[i] = ' ';
        } else if (charWordsOfWebsite[i] == '>'){   //End of html tag
            charWordsOfWebsite[i] = ' ';
            insideHTMLTag = false;
        }
    }
    //Put char array into string, replace multiple white spaces with one white space, inverted regex replaces all characters except a-z, A-Z, 0-9, finally use setter to store the refined words string for later use.
    setRefinedWordsOfWebsite(new String(charWordsOfWebsite).trim().replaceAll("\\s{2,}", " ").replaceAll("[^a-zA-Z0-9\\s]", ""));

You can use the regex <[^>]+> to match all HTML tags. Inside [], ^ negates the character class, so [^>] matches any character other than >, and + is a quantifier meaning one or more times.
Demo:
public class Main {
    public static void main(String[] args) {
        // Test string
        String str="<html>\n" + 
                "<head>\n" + 
                "   <title>Hello World</title>\n" + 
                "</head>\n" + 
                "<body>\n" + 
                "   The whole world is facing economic challenge due to Coronavirus pandemic.\n" + 
                "</body>\n" + 
                "</html>";

        str = str.replaceAll("<[^>]+>", "");
        System.out.println(str);
    }
}

Hello World


The whole world is facing economic challenge due to Coronavirus pandemic.
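
For the loop in the question specifically, a minimal sketch of an equivalent chain follows. It assumes each tag should become a space rather than the empty string, as the loop does, so that words separated only by a tag do not run together. The class name TagStripper and the method refine are hypothetical, and the cleanup steps are reordered slightly compared with the question's one-liner.

```java
public class TagStripper {
    // Hypothetical helper: mirrors the question's loop plus its cleanup chain.
    static String refine(String html) {
        return html
                .replaceAll("<[^>]+>", " ")         // each tag becomes a single space
                .replaceAll("[^a-zA-Z0-9\\s]", "")  // keep only letters, digits and whitespace
                .replaceAll("\\s+", " ")            // collapse every whitespace run to one space
                .trim();                            // drop leading and trailing whitespace
    }

    public static void main(String[] args) {
        String html = "<p>Hello<br>World!</p>\n<p>Java 8</p>";
        System.out.println(refine(html)); // prints: Hello World Java 8
    }
}
```

Replacing with "" instead of " " in the first call would turn Hello<br>World into HelloWorld, which is probably not what a word list needs.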

Check another demo.
Update:
If you also want to match the pattern mentioned in the comment, use the regex.
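
One case where <[^>]+> falls short is a > inside a quoted attribute value, which HTML permits. The sketch below illustrates that case under stated assumptions; it is not the regex from the update. The sample input and the names QuotedAttributeDemo, SIMPLE, QUOTE_AWARE and strip are hypothetical. For arbitrary real-world pages, an HTML parser is generally more robust than any regex.

```java
import java.util.regex.Pattern;

public class QuotedAttributeDemo {
    // The pattern from the answer: stops at the first '>' it sees.
    static final Pattern SIMPLE = Pattern.compile("<[^>]+>");

    // Assumed alternative: treats "..." and '...' inside a tag as single units,
    // so a '>' inside a quoted attribute value does not end the tag.
    static final Pattern QUOTE_AWARE =
            Pattern.compile("<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>");

    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<a title=\"x > y\">link</a>"; // hypothetical input
        System.out.println(strip(SIMPLE, html));      // prints:  y">link
        System.out.println(strip(QUOTE_AWARE, html)); // prints: link
    }
}
```

Compiling each pattern once with Pattern.compile also avoids recompiling the regex on every call, which String.replaceAll does internally.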
If you only want to remove the HTML tags and not the content inside them, use the following regex: <[^>]+>. You can use the .replaceAll() method provided by the String class to replace every occurrence of an HTML tag in the string.
This is valid HTML:
Thank you, this worked great. I also used the pattern to exclude anything between [] brackets. The final code looks like --> setRefinedWordsOfWebsite(wordsOfWebsite.trim().replaceAll("\[^\]+\]>", "").replaceAll("<[^>]+>", "").replaceAll("[^a-zA-Z0-9\\s]", "").replaceAll("\\s{2,}", " "));
You are most welcome, @Timothy. @VGR - your point has been addressed in the update section.