Java: improving code that replaces all strings

java string performance replace

Is there another way to perform a large number of "replaceAll" calls more efficiently, using as little memory as possible?

 public static String cleanWordTags(String source) {

    String copy = source;

    copy = copy.replaceAll("<P style=\"M[^>]*>", "<P>");
    copy = copy.replaceAll("<p style=\"M[^>]*>", "<p>");
    copy = copy.replaceAll("<p style=\"T[^>]*>", "<p>");

    copy = copy.replaceAll("<b style=[^>]*>", "<b>");

    copy = copy.replaceAll("<span class=\"M[^>]*>", "<span>");
    copy = copy.replaceAll("<span style='m[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"f[^>]*>", "<span>");
    copy = copy.replaceAll("<span lang[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"color[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"m[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"line[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"L[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"T[^>]*>", "<span>");
    copy = copy.replaceAll("<span style=\"t[^>]*>", "<span>");

    copy = copy.replaceAll("<br [^>]*>", "<br/>");

    copy = copy.replaceAll("<i style=[^>]*>", "");
    copy = copy.replaceAll("</i>", "");

    copy = copy.replaceAll("<st1:personname[^>]*>", "");
    copy = copy.replaceAll("</st1:personname>", "");

    copy = copy.replaceAll("<st1:metricconverter[^>]*>", "");
    copy = copy.replaceAll("</st1:metricconverter>", "");

    copy = copy.replaceAll("<br[^>]*>", "<br/>");

    copy = copy.replaceAll("<\\W\\Wendif\\W\\W\\W>", "");

    copy = copy.replaceAll("<![^>]*>", "");


    copy = copy.replaceAll("<[vowm]:[^>]*>", "");
    copy = copy.replaceAll("</[vowm]:[^>]*>", ""); //&

    copy = copy.replaceAll("&(amp|lt|gt);", "");
    copy = copy.replaceAll("&nbsp;", "");

    copy = copy.replaceAll("<img width[^>]*>", "");
    copy = copy.replaceAll("<img src=\"file:[^>]*>", "");


    return copy;
}

I found that I can use StringUtils.replace instead of replaceAll, but that only works for search strings that are not regular expressions.
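For the rules that do need a regex, one low-effort step that stays within the JDK is to compile each pattern once. The `String.replaceAll` Javadoc describes the call as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so every call in the method above compiles its pattern again. The sketch below shows the idea with four of the rules. The class name `PrecompiledRules` and the helper `Rule` are made up for illustration. It still makes one pass over the string per rule, so it only removes the repeated compilation, not the repeated scanning.

```java
import java.util.regex.Pattern;

public class PrecompiledRules {

    // Hypothetical helper: a compiled pattern plus its replacement text.
    private static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Compiled once when the class is loaded, in the order they should be applied.
    private static final Rule[] RULES = {
        new Rule("<b style=[^>]*>", "<b>"),
        new Rule("<br[^>]*>", "<br/>"),
        new Rule("<[vowm]:[^>]*>", ""),
        new Rule("</[vowm]:[^>]*>", ""),
    };

    static String clean(String source) {
        String copy = source;
        for (Rule rule : RULES) {
            // Reuses the compiled pattern; only the matcher is created per call.
            copy = rule.pattern.matcher(copy).replaceAll(rule.replacement);
        }
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(clean("<b style=\"x\">a</b><br clear=\"all\"><o:p>t</o:p>"));
    }
}
```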

Thanks

Update:

I tried the following code based on the comments, but replacing the same string takes more than 5 times as long:

 public static String cleanWordTags(String source) {
        String copy = source;

        long t0 = System.currentTimeMillis();

        String regex = "";

        regex += "(align=\"left\")";
        regex += "|(<mce:style>)";
        regex += "|(<i>)";
        regex += "|(<i style=[^>]*>)";
        regex += "|(</i>)";
        regex += "|(<st1:personname[^>]*>)";
        regex += "|(</st1:personname>)";
        regex += "|(<st1:metricconverter[^>]*>)";
        regex += "|(</st1:metricconverter>)";
        regex += "|(<\\W\\Wendif\\W\\W\\W>)";
        regex += "|(<![^>]*>)";
        regex += "|(<[vowm]:[^>]*>)";
        regex += "|(</[vowm]:[^>]*>)";
        regex += "|(&(amp|lt|gt);)";
        regex += "|(&nbsp;)";

        regex += "|(<img width[^>]*>)";
        regex += "|(<img src=\"file:[^>]*>)";

        Pattern p = Pattern.compile(regex);
        copy = p.matcher(copy.toUpperCase()).replaceAll("");

        regex = "";
        regex += "(<span style=\"t[^>]*>)";
        regex += "|(<span style=\"T[^>]*>)";
        regex += "|(<span style=\"L[^>]*>)";
        regex += "|(<span style=\"line[^>]*>)";
        regex += "|(<span style=\"m[^>]*>)";
        regex += "|(<span style=\"color[^>]*>)";
        regex += "|(<span lang[^>]*>)";
        regex += "|(<span style=\"f[^>]*>)";
        regex += "|(<span style='m[^>]*>)";
        regex += "|(<span class=\"M[^>]*>)";

        p = Pattern.compile(regex);
        copy = p.matcher(copy.toUpperCase()).replaceAll("");

        copy = copy.replaceAll("<br[^>]*>", "<br/>");

        // Replace
        //        copy = copy.replaceAll("<p class=[^>]*>", "<p>");
        //  copy = copy.replaceAll("<p align=[^>]*>", "<p>");
        copy = copy.replaceAll("<P style=\"M[^>]*>", "<P>");
        copy = copy.replaceAll("<p style=\"M[^>]*>", "<p>");
        copy = copy.replaceAll("<p style=\"T[^>]*>", "<p>");
        copy = copy.replaceAll("<b style=[^>]*>", "<b>");

        System.out.println(System.currentTimeMillis() - t0);

        return copy;
    }

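Have you already looked at streamflyer? I cannot vouch for its performance, but the project describes itself like this: "Modifies characters in a stream: apply regular expressions, fix XML documents, whatever you want to do."

There is also streamflyer-regex-fast, which provides a faster algorithm for matching regular expressions on character streams than the one streamflyer itself uses.

So if your data is available as a Reader, for example a StringReader, you can easily adapt the example from the project's home page to your code, like this: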

Reader reader = new StringReader("source <p style=\"Memphis\">");
FastRegexModifier modifier = new FastRegexModifier("<P style=\"M[^>]*>", Pattern.CASE_INSENSITIVE, "<P>");
ModifyingReader modifyingReader = new ModifyingReader(reader, modifier);
String result = IOUtils.toString(modifyingReader);

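This has the advantage that you can use the case-insensitive flag, which may reduce the number of rules you need to define. Be aware, though, that the flag may itself affect performance, so you should measure both variants.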
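The same flag exists in plain `java.util.regex`, so the effect on the number of rules can be tried without any extra library. The sketch below is only an illustration: the class name `CaseInsensitiveRules` is made up, and the single rule stands in for the three `<P style=...>` / `<p style=...>` rules from the question. Note that it is slightly broader than the originals, because it also accepts a lowercase `m` or `t` after `style="`.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveRules {

    // Group 1 captures the tag name exactly as written in the input,
    // so "<P style=...>" still becomes "<P>" and "<p style=...>" becomes "<p>".
    private static final Pattern P_STYLE =
            Pattern.compile("<(p) style=\"[mt][^>]*>", Pattern.CASE_INSENSITIVE);

    static String cleanParagraphStyles(String source) {
        return P_STYLE.matcher(source).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        String in = "<P style=\"Margin:0\">a</P><p style=\"Text-align:left\">b</p>";
        System.out.println(cleanParagraphStyles(in));
    }
}
```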

If this solution improves your performance, please report back.

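Even if you want to stick with regular expressions, this approach is very inefficient, because you search the entire string again and again and create a lot of garbage. The right way is to iterate over the matches with a Matcher in a loop.

Just make the pattern match everything that might be of interest, and branch on what was found. Your pattern could be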

(?:<(p|b|span|br|i|st1:personname|st1:metricconverter|\\W\\Wendif\\W\\W\\W|!|[vowm]:|img)[^>]+>)|&(amp|lt|gt|nbsp);

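It matches more than you want, but for those extra matches you can set the replacement to $0, which leaves the text unchanged. It needs only a single pass through the whole string. You may want to do it in two passes to keep things simpler.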
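A minimal sketch of such a loop, using `Matcher.appendReplacement` and `Matcher.appendTail` from the JDK. It is not the answerer's code: the class name `SinglePassCleaner`, the helper `replacementFor`, and the exact pattern are assumptions made for illustration. It also simplifies the question's rules by stripping all attributes from `p`, `b` and `span` tags and by removing every `<i>` tag, which is broader than the original list of prefixes.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassCleaner {

    // Group 1: "/" for a closing tag, otherwise empty. Group 2: the tag name.
    // The second and third alternatives ("<!...>" and entities) set no group.
    private static final Pattern CANDIDATE = Pattern.compile(
            "<(/?)(p|b|span|br|i|img|st1:personname|st1:metricconverter|[vowm]:\\w+)\\b[^>]*>"
                    + "|<![^>]*>"
                    + "|&(?:amp|lt|gt|nbsp);",
            Pattern.CASE_INSENSITIVE);

    static String clean(String source) {
        Matcher m = CANDIDATE.matcher(source);
        // StringBuffer keeps this compatible with Java 8; newer versions also accept StringBuilder.
        StringBuffer out = new StringBuffer(source.length());
        while (m.find()) {
            // quoteReplacement stops '$' and '\' in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacementFor(m)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Decides what to emit for the current match.
    private static String replacementFor(Matcher m) {
        String name = m.group(2);
        if (name == null) {
            return ""; // "<!...>" declaration/comment or an entity: drop it
        }
        boolean closing = !m.group(1).isEmpty();
        String match = m.group();
        switch (name.toLowerCase(Locale.ROOT)) {
            case "p":
            case "b":
            case "span":
                // Keep closing tags untouched; strip attributes from opening tags.
                return closing ? match : "<" + name + ">";
            case "br":
                return "<br/>";
            case "img": {
                String lower = match.toLowerCase(Locale.ROOT);
                boolean drop = lower.startsWith("<img width")
                        || lower.startsWith("<img src=\"file:");
                return drop ? "" : match; // returning the match is the "$0" case
            }
            default:
                return ""; // i, st1:* and v:/o:/w:/m: tags are removed
        }
    }

    public static void main(String[] args) {
        System.out.println(clean("<P style=\"Margin:0\">Hi&nbsp;<i style=\"x\">there</i><br clear=\"all\"></P>"));
    }
}
```

Because every decision is made inside `replacementFor`, adding or narrowing a rule means editing one branch rather than adding another full scan of the string.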

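In the end, the only solution I found was to change every replaceAll call that does not actually need a regular expression into a plain replace, and to try to generalize the remaining regular expressions.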
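That approach might look roughly like the sketch below. It is an illustration only: the class name `LiteralAndGeneralized` and the generalized span pattern are assumptions, and the pattern is broader than the ten original `<span ...>` rules, since it strips any `style`, `class` or `lang` attribute regardless of its value. `String.replace(CharSequence, CharSequence)` treats both arguments literally, so nothing in the search string needs escaping.

```java
import java.util.regex.Pattern;

public class LiteralAndGeneralized {

    // One generalized, precompiled pattern standing in for the "<span ...>" rules.
    private static final Pattern SPAN_WITH_ATTRS =
            Pattern.compile("<span (?:style|class|lang)[^>]*>");

    static String clean(String source) {
        String copy = source;

        // Fixed strings: plain literal replacement, no regex involved.
        copy = copy.replace("</i>", "");
        copy = copy.replace("</st1:personname>", "");
        copy = copy.replace("</st1:metricconverter>", "");
        copy = copy.replace("&nbsp;", "");

        // Rules that still need a regex go through the precompiled pattern.
        copy = SPAN_WITH_ATTRS.matcher(copy).replaceAll("<span>");
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(clean("<span style=\"color:red\">a</span>&nbsp;b</i>"));
    }
}
```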

Many thanks

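Why not use an HTML parser or something similar? Regex + HTML == bad idea.

The problem is that I have a TinyMCE editor in which people either write using the component's buttons or simply copy and paste from Word, and the result is then used to generate a document, so I need to control the tags myself.

I'm not parsing, I'm removing tags :P

Is your main concern memory consumption or time consumption? How long are your strings?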