Regular expression for matching server logs

regex mapreduce

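I need to identify server events in a log file, and I am using pattern matching to do it. My regular expression is not working. Please check whether the regular expression itself is wrong or whether the problem lies elsewhere.

The sample input is: --

My code is: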

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        String input = value.toString();
        String delimiter = "[\n]";
        String[] tokens = input.split(delimiter);
        String sample = null;

        Pattern pattern;
        String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";
        pattern = Pattern.compile(regex);

        for (int i = 0; i < tokens.length; i++) {
            sample = tokens[i];
            System.out.println(sample);
            System.out.println("enter here");

            Matcher match = pattern.matcher(sample);
            boolean val = match.matches();

            System.out.println("the conditions" + val);
            System.out.println("enter here 2");
            if (val) {
                System.out.println("the regex is found" + val);
                logEvent.set(sample);
                System.out.println("the value of logEvent is " + logEvent);
            }
            else {
                logInformation.set(sample);
                System.out.println("the log information" + logInformation);
            }
            context.write(logEvent, logInformation);
        }
}

Try this:
try {
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$");
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            } 
        }
        matchResults = matchResults.NextMatch();
    } 
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Explanation of the regular expression:
@"
(?im)          # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<event>      # Match the regular expression below and capture its match into backreference with name “event”
   \d             # Match a single digit 0..9
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   [a-z]          # Match a single character in the range between “a” and “z”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<details>    # Match the regular expression below and capture its match into backreference with name “details”
   [^\r\n]        # Match a single character NOT present in the list below
                     # A carriage return character
                     # A line feed character
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$              # Assert position at the end of a line (at the end of the string or before a line break character)
"
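
Since the question's code is Java while the snippet above uses the .NET Regex class, here is a minimal sketch of the same pattern in Java. Java (7 and later) supports the same (?<name>...) named-group syntax. The class name LogLineParser and the parse helper are hypothetical, introduced only for illustration, and the sample line is the one quoted in the discussion of this answer, with its trailing description paraphrased in English.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class LogLineParser {
    // Same pattern as the .NET snippet: (?i) case-insensitive, (?m) multiline.
    static final Pattern LINE = Pattern.compile(
        "(?im)\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$");

    /** Returns {event, details} for the first match in the input, or null if nothing matches. */
    static String[] parse(String input) {
        Matcher m = LINE.matcher(input);
        // find() searches for the pattern anywhere in the input.
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("event"), m.group("details") };
    }

    public static void main(String[] args) {
        // Assumed sample line; the text after "ConfgMem" is a paraphrase.
        String line = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                    + "TAS configuration member contents";
        String[] parts = parse(line);
        System.out.println(parts == null ? "no match" : parts[0] + " | " + parts[1]);
    }
}
```

Note that the "event" group captures the number together with the keyword ("00 ConfgMem" for the line above), because the group wraps both \d+ and [a-z]+.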

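Hope this helps.

In the sample log, the event is "STARTUP". There are other events that follow the same pattern. I need to match these events and set them as logEvent.

Thanks, but your code only recognizes the keyword "STARTUP". The problem is that I have 8 different events with similar patterns. Another sample input looks like this: 2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem TAS configuration member contents. Here I need to match "ConfgMem".

Do you need to match the whole line that contains the STARTUP keyword, the content before the keyword on the same line, or the content after the keyword on the same line? The code above only handles the last case.

OK. My reasoning is this: there is whitespace, so "\s+". After that comes the number, so "\d+", again followed by whitespace. Then comes the event, so "[a-z,A-Z]+\\s", followed by whitespace. So the regex should be String regex = "\\s+\\d+\\s+[a-z,A-Z]+\\s";. I have to match the keywords: STARTUP, Preload, ConfgEXE, and so on.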
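
The reasoning in the last comment describes a pattern that occurs in the middle of a log line. In Java, Matcher.matches() succeeds only when the pattern covers the entire input, whereas Matcher.find() looks for it anywhere in the input, which is a likely reason the original code never reports a match. The sketch below is one possible way to apply that pattern; the class name EventFinder, the findEvent helper, the added capture group, and the paraphrased sample line are assumptions made for illustration. The character class is written as [A-Za-z] because [a-z,A-Z] would also accept a literal comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class EventFinder {
    // The asker's pattern, with a capture group around the event keyword.
    static final Pattern EVENT = Pattern.compile("\\s+\\d+\\s+([A-Za-z]+)\\s");

    /** Returns the event keyword found in the line, or null if there is none. */
    static String findEvent(String line) {
        Matcher m = EVENT.matcher(line);
        // find() scans for the pattern anywhere in the line;
        // matches() would require the pattern to cover the whole line.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample line; the text after "ConfgMem" is a paraphrase.
        String line = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                    + "TAS configuration member contents";
        System.out.println(EVENT.matcher(line).matches()); // whole-line check
        System.out.println(findEvent(line));               // search within the line
    }
}
```

For the sample line this prints false and then ConfgMem: the whole-line check fails because the line contains much more than the pattern, while the search locates " 00 ConfgMem " and returns the captured keyword.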