Regex: a regular expression for matching server logs

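I need to identify server events in a log file, and I am using pattern matching to do it. My regular expression is not working. Please check whether the regular expression is wrong or whether the problem lies elsewhere.

Sample input: --

My code is: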

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        String input = value.toString();
        String delimiter = "[\n]";
        String[] tokens = input.split(delimiter);
        String sample = null;

        Pattern pattern;
        // The pattern under question: note the literal spaces at both ends.
        String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";
        pattern = Pattern.compile(regex);

        for (int i = 0; i < tokens.length; i++) {
            sample = tokens[i];
            System.out.println(sample);
            System.out.println("enter here");

            Matcher match = pattern.matcher(sample);
            // matches() is true only when the whole line matches the pattern.
            boolean val = match.matches();

            System.out.println("the conditions" + val);
            System.out.println("enter here 2");
            if (val) {
                System.out.println("the regex is found" + val);
                logEvent.set(sample);
                System.out.println("the value of logEvent is " + logEvent);
            }
            else {
                logInformation.set(sample);
                System.out.println("the log information" + logInformation);
            }
            context.write(logEvent, logInformation);
        }
}

Try this:

try {
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$");
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            } 
        }
        matchResults = matchResults.NextMatch();
    } 
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
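
The answer above uses the .NET `Regex` class, while the mapper in the question is written in Java. Below is a minimal Java sketch of the same pattern, assuming Java 7 or later (needed for named groups). The input is the sample line quoted in the comments, with its trailing text rendered in English as an assumption; the class name `AnswerRegexPort` and the method `parse` are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerRegexPort {
    // Same pattern as the .NET answer: (?i) ignores case, (?m) lets $ match at line ends.
    private static final Pattern LOG_LINE = Pattern.compile(
            "(?im)\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$");

    /** Returns {event, details} for the first match, or null when nothing matches. */
    static String[] parse(String text) {
        Matcher m = LOG_LINE.matcher(text);
        // find() searches inside the text; matches() would require the whole text to match.
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("event"), m.group("details") };
    }

    public static void main(String[] args) {
        // Sample line from the comments (trailing description is an assumed rendering).
        String line = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                + "TAS configuration member contents";
        String[] parts = parse(line);
        System.out.println(parts == null ? "no match" : parts[0] + " | " + parts[1]);
    }
}
```

With this pattern the `event` group captures the number together with the keyword (here `00 ConfgMem`), so the keyword alone would still have to be split off afterwards.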

Regular expression explanation

@"
(?im)          # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<event>      # Match the regular expression below and capture its match into backreference with name “event”
   \d             # Match a single digit 0..9
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   [a-z]          # Match a single character in the range between “a” and “z”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<details>    # Match the regular expression below and capture its match into backreference with name “details”
   [^\r\n]        # Match a single character NOT present in the list below
                     # A carriage return character
                     # A line feed character
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$              # Assert position at the end of a line (at the end of the string or before a line break character)
"
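
To illustrate the `m` option described above, the following sketch runs the pattern over a two-line string in Java, once with `Pattern.MULTILINE` and once without. The flags are passed to `Pattern.compile` instead of the inline `(?im)` prefix, which has the same effect. The second log line is made up for illustration, modelled on the line quoted in the comments, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // The answer's pattern without the inline (?im) prefix; flags are supplied by the caller.
    private static final String REGEX =
            "\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$";

    /** Collects the "event" group of every match found in the text. */
    static List<String> events(String text, int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        while (m.find()) {
            found.add(m.group("event"));
        }
        return found;
    }

    public static void main(String[] args) {
        // First line: sample from the comments; second line: invented for this demo.
        String text =
                "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem TAS configuration member contents\n"
              + "2009/12/14 11:49:21.10 TAS#####EC05003E 00 STARTUP made-up details for illustration";

        int ci = Pattern.CASE_INSENSITIVE;
        // With MULTILINE, $ matches at the end of each line, so both lines yield an event.
        System.out.println(events(text, ci | Pattern.MULTILINE));
        // Without it, $ matches only at the end of the input, so only the last line matches.
        System.out.println(events(text, ci));
    }
}
```

In a mapper that already receives one line per call the `m` option makes no practical difference; it matters when several lines are scanned as a single string.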


Hope this helps.

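In the sample log, the event is "STARTUP". There are other events in the same pattern as well. I need to match those events and set them as logEvent.

Thanks, but your code only recognizes the keyword "STARTUP". The problem is that I have 8 different events in a similar pattern. Another sample input looks like this: 2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem TAS configuration member contents. Here I need to match "ConfgMem".

Do you need to match the whole line that contains the STARTUP keyword, the text before the keyword on the same line, or the text after the keyword on the same line? The code above only works for the last case.

OK. My thinking is this: we have whitespace, so "\\s+". After that comes the number, so "\\d+", again followed by whitespace. Then comes the event, so "[a-z,A-Z]+\\s", followed by whitespace. So the regex should be String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";. I have to match the keywords: STARTUP, Preload, ConfgEXE, and so on.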
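
The reasoning in the last comment can be checked against the sample line. Two facts about `java.util.regex` are relevant here: `Matcher.matches()` succeeds only when the entire input matches the pattern, whereas `Matcher.find()` looks for the pattern anywhere in the input; and inside a character class a comma is a literal character, so `[a-z,A-Z]` also accepts commas. The pattern in the question additionally begins and ends with a literal space. The sketch below is one possible way to extract only the keyword, assuming every line follows the layout of the sample line (a numeric field, whitespace, the event keyword, then more text). The class `EventExtractor`, its methods, and the pattern are illustrative and not the original poster's final solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EventExtractor {
    // Whitespace, a number, whitespace, the keyword (captured), then whitespace.
    private static final Pattern EVENT = Pattern.compile("\\s\\d+\\s+([A-Za-z]+)\\s");

    // The pattern exactly as written in the question, kept for comparison.
    private static final Pattern QUESTION = Pattern.compile(" \\s+\\d+\\s+[a-z,A-Z]+\\s ");

    /** Returns the event keyword, or null if the line does not fit the assumed layout. */
    static String extractEvent(String line) {
        Matcher m = EVENT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Reproduces the question's check: does the whole line match its pattern? */
    static boolean matchesQuestionPattern(String line) {
        return QUESTION.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Sample line from the comments (trailing description is an assumed rendering).
        String line = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                + "TAS configuration member contents";
        System.out.println("question pattern with matches(): " + matchesQuestionPattern(line));
        System.out.println("extracted event: " + extractEvent(line));
    }
}
```

Inside the mapper, the returned keyword could be passed to `logEvent.set(...)` and the full line to `logInformation.set(...)`. Because the sketch expects whitespace after the keyword, a line that ends directly after the keyword would not match.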