Java: Find the number of opened and closed HTML tags in a string


java string algorithm


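I am trying to figure out the best way to find the number of valid HTML tags in a string.

Assume a tag is valid only if it has both an opening and a closing tag.

Here is an example test case:

Input

"html": "<html><head></head><body><div><div></div></div>"

Output

"validTags":3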


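If you need to parse HTML, do not do it by hand. There is no need to reinvent the wheel: plenty of libraries exist for parsing HTML. Use the right tool for the job.

Focus your effort on the rest of your project. Of course, you could implement your own function that parses the string, looks for tag markers, and acts accordingly. But HTML may be more complicated than you think, or you may end up needing more HTML parsing than just counting tags.

Maybe in the future you will also need to count other constructs, or find the depth of the HTML tree.

Your home-made code may well fail to account for every possible combination of escaped characters, nested tags, and so on.

Related: a similar question, "How many correct tags are in the string", which also links to libraries.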
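If you want to count open/close tag pairs as a learning project, here is a proposed algorithm, written in pseudocode as a recursive function:

function count_tags(s):
    if s contains no opening tag
        return 0
    tag, remainder = find_next_tag(s)
    found, inside, after = find_closing_tag(tag, remainder)
    if (found)
        return 1 + count_tags(inside) + count_tags(after)
    else
        return count_tags(inside)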

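Example

In the string hello world, we get:

tag = ""
"world"
found = true
inside = "world"
after = ""
return 1 + count_tags("world") + count_tags("")


On a string whose opening tag has no matching closing tag:

tag = ""
remainder = ""
found = false
inside = ""
after = ""
return count_tags("")


On a string whose opening tag is closed:

tag = ""
remainder = ""
found = true
inside = ""
after = ""
return 1 + count_tags("") + count_tags("")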
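
The recursive pseudocode above could be sketched in Java roughly as follows. This is only a learning sketch, not the answerer's code: the class name `RecursiveTagCounter`, the method name `countTags`, and the simplified regular expressions are all assumptions made for illustration, and the sketch ignores comments, scripts, attribute values containing `>`, and case differences. When no closing tag is found, it treats the whole remainder as `inside`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecursiveTagCounter {
    // Simplified opening-tag pattern (an assumption): '<', a tag name, anything up to '>'.
    private static final Pattern OPEN_TAG =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static int countTags(String s) {
        Matcher open = OPEN_TAG.matcher(s);
        if (!open.find()) {
            return 0; // base case: no opening tag left
        }
        String name = open.group(1);
        String remainder = s.substring(open.end());

        // Look for the closing tag that matches this opening tag,
        // skipping over nested pairs that use the same tag name.
        Pattern sameName =
                Pattern.compile("<(/?)" + Pattern.quote(name) + "\\b[^>]*>");
        Matcher m = sameName.matcher(remainder);
        int depth = 0;
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                depth++;            // nested opening tag with the same name
            } else if (depth > 0) {
                depth--;            // closes a nested tag, not ours
            } else {
                String inside = remainder.substring(0, m.start());
                String after = remainder.substring(m.end());
                return 1 + countTags(inside) + countTags(after);
            }
        }
        // No closing tag found: skip this tag and keep counting in the rest.
        return countTags(remainder);
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(countTags("<html><head></head><body><div><div></div></div>"));
    }
}
```

On the test case from the question, `<html>` and `<body>` are never closed, so only `<head>` and the two `<div>` pairs are counted.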
I wrote a function that does exactly this:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

static int checkValidTags(String html, String[] openTags, String[] closeTags) {
    // openTags and closeTags must have the same length.
    // This function keeps track of all opening tags and counts a pair
    // whenever an open tag is closed correctly.
    // Because the opening tags are passed without the trailing '>',
    // tags that carry attributes are detected as well.
    Map<Character, Integer> open = new HashMap<>();
    Map<Character, Integer> close = new HashMap<>();

    // Replace every tag with a single placeholder character, starting at 1
    // rather than 0 (the NUL character). This assumes the input does not
    // already contain control characters in the range 1..2*openTags.length.
    int startChar = 1;
    for (int i = 0; i < openTags.length; i++) {
        open.put((char) startChar, i);
        close.put((char) (startChar + 1), i);
        // String.replace matches literally; replaceAll would treat the tag as a regex.
        html = html.replace(openTags[i], String.valueOf((char) startChar));
        html = html.replace(closeTags[i], String.valueOf((char) (startChar + 1)));
        startChar += 2;
    }

    // One list of pending start positions per tag type, most recent first.
    List<List<Integer>> startIndexes = new ArrayList<>();
    int validLabels = 0;
    for (int i = 0; i < openTags.length; i++) {
        startIndexes.add(new ArrayList<>());
    }
    for (int i = 0; i < html.length(); i++) {
        char c = html.charAt(i);
        if (open.get(c) != null) {
            startIndexes.get(open.get(c)).add(0, i);
        }
        if (close.get(c) != null && !startIndexes.get(close.get(c)).isEmpty()) {
            int start = startIndexes.get(close.get(c)).get(0);
            // Discard the most recent unclosed tag of every other type
            // that was opened inside the pair being closed.
            for (int k = 0; k < startIndexes.size(); k++) {
                if (!startIndexes.get(k).isEmpty() && startIndexes.get(k).get(0) > start) {
                    startIndexes.get(k).remove(0);
                }
            }
            startIndexes.get(close.get(c)).remove(0);
            validLabels++;
        }
    }
    return validLabels;
}

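Does this answer your question? How [...]?
Do not use regular expressions to parse HTML.
If you could adapt a library that validates HTML, that would be great.
As in the suggested question, you should use a proper parser, and such parsers probably already offer a solution. If you want to do it yourself (but why would you?), keep in mind that there can be (valid) self-closing tags, as Saka129 suggested, as well as invalid cases where every tag has a closing match but the nesting is wrong, and more cases you may not have thought of yet.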
    String html = "<html><head></head><body><div><div></div></div>";
    
    int validTags = checkValidTags(html,new String[] {
            // List all the tags you are looking for here.
            // Leave off the trailing '>' so that tags with attributes are matched too.
            "<html","<head","<body","<div"
    }, new String[]{
            "</html>","</head>","</body>","</div>"
    });
    
    System.out.println(validTags);

3
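
The caveats raised in the comments (self-closing tags and wrongly nested tags) can be illustrated with a stack-based sketch. This is not code from any of the answers: the class name `StackTagCounter`, the method name `countProperlyNestedPairs`, and the simplified tag pattern are assumptions for illustration only. The policy chosen here is also an assumption: a self-closing tag is never counted as a pair, and when a closing tag matches an opening tag further down the stack, the unclosed tags above it are discarded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackTagCounter {
    // Simplified tag pattern (an assumption, not a full HTML grammar):
    // group 1 is "/" for a closing tag, group 2 is the tag name,
    // group 3 is "/" for a self-closing tag such as <br/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    static int countProperlyNestedPairs(String html) {
        Deque<String> openTags = new ArrayDeque<>();
        int pairs = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            String name = m.group(2).toLowerCase();

            if (!closing && selfClosing) {
                continue; // self-closing tag: valid, but not an open/close pair
            }
            if (!closing) {
                openTags.push(name);
            } else if (openTags.contains(name)) {
                // Discard unclosed tags opened after the matching opening tag.
                while (!openTags.peek().equals(name)) {
                    openTags.pop();
                }
                openTags.pop();
                pairs++;
            }
            // A closing tag with no matching opening tag is ignored.
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(countProperlyNestedPairs(
                "<html><head></head><body><div><div></div></div>"));
        // prints 1: </b> arrives after <b> was discarded by </a>
        System.out.println(countProperlyNestedPairs("<a><b></a></b>"));
    }
}
```

A real parser library remains the safer choice, as the comments point out; the sketch only shows why nesting order matters when pairs are counted by hand.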