Java: how to find occurrences when patterns overlap


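Context: This is a log-analysis task. I am writing a regex program to find certain requests sent from the client to the server. I have client log files that contain these requests along with other log entries.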

Problem: When a request message is sent to the server, the client should write 2 log statements, such as:

sending..
message_type
When both statements are found, we can say that one request has been sent. It is a combined pattern.

We expect the log file contents to look like this:

sending..
message_type
...//other text
sending..
message_type
...//other text
sending..
message_type
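From the log above we can say that the client has sent 3 messages. In the actual log file, however, the patterns overlap (not for all messages, but for some of them), as shown below: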

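There are still 3 requests (I numbered the messages to make this easier to follow). But the patterns overlap: the second message starts being logged before the first one has been completely logged. The explanation above is only for illustration. Here is a portion of the original log: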

Original log

Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>
Expected output

INPUT:
Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
Store in DB :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

COUNT:
              to server:  2
         create_session:  2
From the log it is clear that 2 messages were sent, so the output will be:

 Send message to server:  2
         create_session:  2
You can see the pattern I tried in my code. Can someone suggest a pattern that produces the desired result?

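Note: One might ask why not simply count "Send message to server" on its own. The reason is that the log contains many types of messages, such as login, closesession and so on, and every one of them starts with "Send message to server". The message types are also logged on their own for other purposes, so we cannot rely on either part alone (only the combination is reliable).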
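To make the overlap problem concrete, here is a minimal sketch. The two sample strings and the COMBINED pattern are assumptions made up for illustration (the interleaved sample is only a guess at what overlapping entries look like); they are not taken from the real log. A single combined pattern finds both requests when the entries are sequential, but only one when they interleave, while counting each part separately still sees both.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Hypothetical combined pattern: a "sending.." line, then (lazily) any
    // number of whole lines, then a "message_type" line.
    static final Pattern COMBINED = Pattern.compile(
            "^sending\\.\\.\\n(?:.*\\n)*?message_type$", Pattern.MULTILINE);

    // Hypothetical patterns for each half on its own.
    static final Pattern SENDING = Pattern.compile("^sending\\.\\.$", Pattern.MULTILINE);
    static final Pattern MESSAGE_TYPE = Pattern.compile("^message_type$", Pattern.MULTILINE);

    // Counts non-overlapping matches of p in text.
    static int count(Pattern p, String text) {
        int n = 0;
        Matcher m = p.matcher(text);
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sequential = "sending..\nmessage_type\nother text\nsending..\nmessage_type\n";
        // Assumed shape of an overlap: both "sending.." lines come first.
        String overlapped = "sending..\nsending..\nmessage_type\nmessage_type\n";

        System.out.println("combined, sequential: " + count(COMBINED, sequential));
        System.out.println("combined, overlapped: " + count(COMBINED, overlapped));
        System.out.println("separate, overlapped: " + count(SENDING, overlapped)
                + " / " + count(MESSAGE_TYPE, overlapped));
    }
}
```

In the overlapped case the first match of the combined pattern swallows the second "sending.." line, so nothing is left to start a second match. That is the reason for matching the two halves independently.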

Finding occurrences of certain requests sent from the client to the server

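You can ignore the "other ways" here. Such an entry looks like "Store in DB :" followed by the xml message, instead of "Send message to server" followed by the xml message.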

I would propose a new strategy:

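  • Use just 1 regex that matches all the alternatives, so the log is parsed only once (better performance on long files)
  • Match the
    type="createsession"
    xmls independently
  • Also match the
    Store in DB :
    xmls, but ignore them (don't increment the counter)
  • We can use the following expression to match the messages sent to the server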

    ^(?<toserver>Send message to server:)
    
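    • later referenced as
      regexMatcher.group("toserver")
    • we'll use an independent counter

  • The xmls with type="createsession" are matched with

    ^(?<message><\? *xml\b.{10,500} type *= *\"createsession\")

    • later referenced as
      regexMatcher.group("message")

  • So how do we ignore the
    Store in DB :
    xmls? We can match them without creating a capture

    ^Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*

  • Combined into a single alternation:

    ^(?:Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*|(?<toserver>Send message to server:)|(?<message><\? *xml\b.{10,500} type *= *\"createsession\"))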
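As a small illustration of why the uncaptured branch works, the sketch below uses only two of the alternatives. The class name, the helper method and the two-entry sample log are assumptions made up for this example. Because the "Store in DB" branch consumes its xml line, the matcher resumes after that line, so the "message" branch never gets a chance to match it and its group is null for that match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IgnoreBranchDemo {
    // Branch 1 has no named group: it only consumes the "Store in DB" block.
    // Branch 2 captures a createsession xml line in the group "message".
    static final Pattern P = Pattern.compile(
            "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*"
            + "|(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\"))",
            Pattern.MULTILINE);

    // Returns {ignored, counted}.
    static int[] counts(String log) {
        int ignored = 0;
        int counted = 0;
        Matcher m = P.matcher(log);
        while (m.find()) {
            if (m.group("message") != null) {
                counted++;   // a createsession xml outside any "Store in DB" block
            } else {
                ignored++;   // the whole "Store in DB" block was consumed
            }
        }
        return new int[] {ignored, counted};
    }

    public static void main(String[] args) {
        // Made-up sample: one stored xml (to ignore) and one sent xml (to count).
        String log = "Store in DB :\n"
                + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n"
                + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n";
        int[] c = counts(log);
        System.out.println("ignored=" + c[0] + " counted=" + c[1]);
    }
}
```

The order of the alternatives matters here: a regex alternation tries branches from left to right at each position, so the ignoring branch has to come first.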
    


    Code

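    // Imports needed by the snippet (place them at the top of the enclosing class file)
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // One alternation: ignore "Store in DB" blocks, capture "toserver" and "message"
    static final String create_session = "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*|(?<toserver>Send message to server:)|(?<message><\\? *xml\\b.{10,500} type *= *\\\"createsession\\\"))";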
    
    public static void main (String[] args) throws java.lang.Exception
    {
        //for testing purposes
        final String text = "Send message to server:\nCreated post notification log dir\nCreated post notification log dir\nCreated post notification log dir\nSend message to server:\nCreated post notification log dir\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nStore in DB :\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></params></response></message>\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></response></message>";
        System.out.println("INPUT:\n" + text + "\n\nCOUNT:");
        StringBuilder b = new StringBuilder();
        b.append(text);
    
        findMatch(b,create_session,"create_session");
    }
    
    private static int findMatch(StringBuilder b,String pattern, String type) {
        int count =0;  // counter for "Send message to server:"
        int countType=0; // counter for "type=\"createsession\""
        Pattern regex = Pattern.compile(pattern,Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(b.toString());
        while (regexMatcher.find()) {
            if (regexMatcher.group("toserver") != null) {
                count++;
            } else if (regexMatcher.group("message") != null) {
                countType++;
            } else {
                // Ignoring "Store in DB :\n<?xml...."
            }
        } 
        System.out.printf("%25s%2d\n%25s%2d\n", "to server: ", count, type+": ", countType);
        return countType;
    }
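For log files too large to hold in one String, the same idea can be sketched line by line. This is a swapped-in variant, not the code above: the class name, the helper and the skipNextXml flag are assumptions made for illustration, and it assumes each xml message sits on a single line that starts with <?xml.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

public class LineCounter {
    static final Pattern TO_SERVER = Pattern.compile("^Send message to server:");
    static final Pattern STORE_IN_DB = Pattern.compile("^Store in DB ?:");
    static final Pattern XML = Pattern.compile("^<\\? *xml\\b");
    static final Pattern CREATE_SESSION = Pattern.compile("type *= *\"createsession\"");

    // Reads the log once and returns {toServerCount, createSessionCount}.
    static int[] count(BufferedReader reader) throws IOException {
        int toServer = 0;
        int createSession = 0;
        boolean skipNextXml = false; // set after "Store in DB :", cleared by the next xml line
        String line;
        while ((line = reader.readLine()) != null) {
            if (STORE_IN_DB.matcher(line).find()) {
                skipNextXml = true;
            } else if (XML.matcher(line).find()) {
                if (skipNextXml) {
                    skipNextXml = false;          // this xml belongs to "Store in DB": ignore it
                } else if (CREATE_SESSION.matcher(line).find()) {
                    createSession++;
                }
            } else if (TO_SERVER.matcher(line).find()) {
                toServer++;
            }
        }
        return new int[] {toServer, createSession};
    }

    public static void main(String[] args) throws IOException {
        // Shortened, made-up log in the same shape as the sample input.
        String log = "Send message to server:\n"
                + "Created post notification log dir\n"
                + "Send message to server:\n"
                + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n"
                + "Store in DB :\n"
                + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n"
                + "INFO [a] - Server Response: <?xml version=\"1.0\"?><response type=\"ok\"/>\n"
                + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n";
        int[] c = count(new BufferedReader(new StringReader(log)));
        System.out.println("to server: " + c[0] + ", create_session: " + c[1]);
    }
}
```

One difference from the single regex: a "Send message to server:" line that appears between "Store in DB :" and its xml is still counted here, whereas the regex branch would consume it. Whether that matters depends on how the real log interleaves entries.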
    

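So why not just count type="createsession" on its own? What prevents the client from logging send(1) + send(2) + message(1) + message(2)?

Because the type="createsession" xmls are logged together with "Send message to server" and also by some other ways (such as storing in the db). So we can't simply rely on that.

Then how do you know whether there is an "other way" between "Send message to server" and the xml to match? Could you show an example of an xml that should be ignored?

The "other ways" you can ignore. Such an entry looks like "Store in DB :" followed by the xml message, instead of "Send message to server" followed by the xml message. I just want to know how to handle this overlapping case.

Thanks @Mariano, I like the groups and the strategy you specified. I think I can start from this approach.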