Regular expression for matching server logs
I need to identify server events from a log file. For this I am using pattern matching. My regular expression is not working. Please check whether my regex is wrong or whether the problem lies elsewhere. Sample input is:--
My code is:
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String input = value.toString();
    String delimiter = "[\n]";
    String[] tokens = input.split(delimiter);
    String sample = null;
    Pattern pattern;
    String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";
    pattern = Pattern.compile(regex);
    for (int i = 0; i < tokens.length; i++) {
        sample = tokens[i];
        System.out.println(sample.toString());
        System.out.println("enter here");
        Matcher match = pattern.matcher(sample);
        boolean val = match.matches();
        System.out.println("the conditions" + val);
        System.out.println("enter here 2");
        if (val) {
            System.out.println("the regex is found" + val);
            logEvent.set(sample);
            System.out.println("the value of logEvent is " + logEvent);
        }
        else {
            logInformation.set(sample);
            System.out.println("the log informaTION" + logInformation);
        }
        context.write(logEvent, logInformation);
    }
}
Try this:
try {
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$");
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            }
        }
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
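Since the question is in Java while the snippet above is C#, here is a rough Java port of the same pattern. It is only a sketch: the class name `NamedGroupLogParser` and the `parse` helper are made up for illustration, and the sample line is the one quoted in the comments of this thread (its detail text is a translation, so the real log may read differently). Java 7 and later accept the same `(?<name>...)` named-group syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class NamedGroupLogParser {
    // Same pattern as the C# answer: (?i) ignores case, (?m) lets $ match at each line end.
    private static final Pattern LOG_LINE = Pattern.compile(
            "(?im)\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$");

    /** Returns one "event|details" string for every match found in the input. */
    static List<String> parse(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(input);
        while (m.find()) { // find() scans the input instead of requiring a full match
            results.add(m.group("event") + "|" + m.group("details"));
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample line taken from the comments; the detail text is an assumption.
        String input = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                + "TAS configuration member contents";
        parse(input).forEach(System.out::println);
    }
}
```

Note that the `event` group captures the number together with the keyword ("00 ConfgMem" for the sample line), exactly as the C# pattern does.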
Regex explanation:
@"
(?im) # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<event> # Match the regular expression below and capture its match into backreference with name “event”
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<details> # Match the regular expression below and capture its match into backreference with name “details”
[^\r\n] # Match a single character NOT present in the list below
# A carriage return character
# A line feed character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$ # Assert position at the end of a line (at the end of the string or before a line break character)
"
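Hope this helps.

In the sample log the event is "STARTUP". There are other events in the same pattern as well. I need to match these events and set them as logEvent.

Thanks, but your code only identifies the keyword "STARTUP". The problem is that I have 8 different events in a similar pattern. Another sample input looks like this: 2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem TAS configuration member contents. Here I need to match "ConfgMem".

Do you need to match the whole line that contains the STARTUP keyword, the content before the keyword on the same line, or the content after the keyword on the same line? The code above only works for the last case.

OK. My thinking is this: first there is whitespace, so "\s+". After that comes a number, so "\d+", again followed by whitespace. Then comes the event, so "[a-z,A-Z]+", followed by whitespace, "\s". So the regex should be String regex = " \\s+\\d+\\s+[a-z,A-Z]+\\s ";. I have to match keywords such as STARTUP, PRELOAD, ConfgEXE, etc.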
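One likely reason the original Java code never reports a match is that `Matcher.matches()` only succeeds when the pattern covers the entire input, whereas the pattern here describes just a fragment in the middle of the line. The sketch below uses `find()` with a capturing group instead. The class name `EventExtractor` and the method `extractEvent` are hypothetical, the literal leading and trailing spaces of the original regex are dropped (they would demand extra whitespace), and `[a-z,A-Z]` becomes `[A-Za-z]` because the comma inside the brackets is treated as a literal character. The sample line is the one from the comments, with its detail text translated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class EventExtractor {
    // Whitespace, a number, whitespace, then the event keyword captured in group 1.
    private static final Pattern EVENT =
            Pattern.compile("\\s+\\d+\\s+([A-Za-z]+)\\s");

    /** Returns the event keyword found in the line, or null if there is none. */
    static String extractEvent(String line) {
        Matcher m = EVENT.matcher(line);
        // find() looks for the pattern anywhere in the line;
        // matches() would need the pattern to span the line from start to end.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem "
                + "TAS configuration member contents";
        System.out.println(EVENT.matcher(line).matches()); // prints "false"
        System.out.println(extractEvent(line));            // prints "ConfgMem"
    }
}
```

In the mapper, the returned keyword (when it is not null) could be passed to `logEvent.set(...)` in place of the whole line. This assumes every event keyword is preceded by a number and followed by whitespace, as in the sample line; a keyword at the very end of a line would need the trailing `\\s` relaxed.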