Parsing a log file with C# and regex
I have a large log file that looks like the three sample lines below:
\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&
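I need to extract the date, name, and mailto fields buried in each log line.
I tried an online regex generator, but this is as far as I got before it became unwieldy: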
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Test string; verbatim (@) so the backslashes are not parsed as escape sequences
            string txt = @"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";
            string re1 = ".*?"; // Non-greedy match on filler
            string re2 = "((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])"; // YYYYMMDD 1
            Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            Match m = r.Match(txt);
            if (m.Success)
            {
                String yyyymmdd1 = m.Groups[1].ToString();
                Console.Write("(" + yyyymmdd1 + ")" + "\n");
            }
            Console.ReadLine();
        }
    }
}
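Is there a way to do this in C#, with or without a regular expression?
Thanks.

You can split the line into parts, URL-decode the query part, get the actor parameter, deserialize it into an Actor object, and use its properties. A simple example: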
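// Requires a reference to System.Web (HttpUtility) and the Json.NET NuGet package (JsonConvert).
// Actor is a user-defined class (not shown) that exposes Name and EmailAddress.
string txt = @"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";
// Splitting on spaces gives: [0] path and date, [1] time, [2] the query string
var parts = txt.Split(' ');
var urlParams = HttpUtility.UrlDecode(parts[2]);
string actorJson = HttpUtility.ParseQueryString(urlParams).Get("actor");
Actor actor = JsonConvert.DeserializeObject<Actor>(actorJson);
Console.WriteLine(actor.Name + " " + actor.EmailAddress);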
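The snippet above relies on an Actor class that is not shown. Below is a rough sketch of what such a class might look like, given that both "name" and "mbox" are arrays of strings in the sample data. Note the assumptions: it uses System.Text.Json instead of Json.NET (so it needs .NET Core 3.0 or later, or the System.Text.Json package), and the names Names, Mailboxes, and ActorParser are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape for Actor; the answer does not show its definition.
public class Actor
{
    // In the sample JSON, "name" is an array such as ["Franklin, Francis"].
    [JsonPropertyName("name")]
    public List<string> Names { get; set; } = new List<string>();

    // In the sample JSON, "mbox" is an array such as ["mailto:Franklin.Francis@xyz.com"].
    [JsonPropertyName("mbox")]
    public List<string> Mailboxes { get; set; } = new List<string>();

    // First name entry, or an empty string when the array is empty.
    [JsonIgnore]
    public string Name => Names.FirstOrDefault() ?? "";

    // First mbox entry with a leading "mailto:" removed.
    [JsonIgnore]
    public string EmailAddress
    {
        get
        {
            string first = Mailboxes.FirstOrDefault() ?? "";
            return first.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase)
                ? first.Substring("mailto:".Length)
                : first;
        }
    }
}

public static class ActorParser
{
    // Expects the already URL-decoded JSON value of the actor parameter.
    public static Actor Parse(string actorJson) =>
        JsonSerializer.Deserialize<Actor>(actorJson)
        ?? throw new JsonException("actor JSON was null");
}
```

With this shape, ActorParser.Parse(actorJson).EmailAddress would give the address without the mailto: prefix. With Json.NET the same idea should work by swapping the attributes for [JsonProperty("name")] and [JsonProperty("mbox")].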
Now you only need to read all the lines with the File class, loop over them, and put the deserialized actors into a list or something similar.

Assuming you use a regular expression and the lines follow this general form, something like this should work:
(?m)^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&
It uses the multiline modifier (?m) in an inline modifier group.
Formatted:
(?m)
^
\S+
:
(?<Date>                        #_(1 start)
     \d+
     -
     \d+
     -
     \d+
)                               #_(1 end)
\s
(?:
     (?! &actor= )
     .
)+
&actor=
(?: % [0-9a-fA-F]{2} )*
name
(?: % [0-9a-fA-F]{2} )*
(?<LastName>                    #_(2 start)
     (?:
          (?! % [0-9a-fA-F]{2} | mbox )
          .
     )+
)                               #_(2 end)
(?: % [0-9a-fA-F]{2} )+
(?<FirstName>                   #_(3 start)
     (?:
          (?! % [0-9a-fA-F]{2} | mbox )
          .
     )*
)                               #_(3 end)
(?: % [0-9a-fA-F]{2} )*
mbox
(?: % [0-9a-fA-F]{2} )+
mailto
(?: % [0-9a-fA-F]{2} )+
(?<MailUser>                    #_(4 start)
     (?:
          (?! % [0-9a-fA-F]{2} )
          .
     )+
)                               #_(4 end)
(?: % [0-9a-fA-F]{2} )+
(?<MailDomain>                  #_(5 start)
     (?:
          (?! % [0-9a-fA-F]{2} )
          .
     )+
)                               #_(5 end)
(?: % [0-9a-fA-F]{2} )+
&
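To read the named groups one log line at a time, the pattern can be passed to Regex.Matches. This is only a minimal sketch: the class name LogRegexDemo, the method Extract, and the output format are illustrative choices, and it assumes every line of interest follows the sample layout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogRegexDemo
{
    // Compact form of the pattern; (?m) lets ^ match at the start of every line.
    private static readonly Regex RxLine = new Regex(
        @"(?m)^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=" +
        @"(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*" +
        @"(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+" +
        @"(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*" +
        @"mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+" +
        @"(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+" +
        @"(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&");

    // Returns one formatted string per matching log line; other lines are skipped.
    public static List<string> Extract(string log)
    {
        var rows = new List<string>();
        foreach (Match m in RxLine.Matches(log))
        {
            rows.Add(string.Format("{0} {1}, {2} {3}@{4}",
                m.Groups["Date"].Value,
                m.Groups["LastName"].Value,
                m.Groups["FirstName"].Value,
                m.Groups["MailUser"].Value,
                m.Groups["MailDomain"].Value));
        }
        return rows;
    }
}
```

Each Match here corresponds to one log line, so the groups can be read directly without going through Captures.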
Also, with a small modification, you can collect them all into CaptureCollection lists with a single match.
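Have you tried decoding it first? Once the string is decoded, extraction becomes much easier.
How big is your "large log file"? 50 MB? 10 GB? 500 GB? How many lines of text?
@JamesBlond The largest one is 18 MB.
What tool or library wrote this log? Try to get hold of its parser. I'm sure there is one.
Try this as well; here is a sample snippet. It lets you run language-integrated queries (LINQ) directly on raw event sources.
Just debug it line by line and you will see. It takes only the URL-parameter part of the log line, URL-decodes it (removing all the % escapes and so on), picks out the actor parameter value (which is JSON), and deserializes that JSON into an instance of the Actor class. That's it. No regex needed.
Stupid question, but I'm using .NET 4.6 in a console application and cannot access System.Web. Have you run into this? @JamesBlond
As mentioned above, you need to add a reference to System.Web and then put using System.Web at the top of the file. Add the NuGet package Json.NET to your project as well, along with a using for it.
I went to References > Add Reference and selected System.Web, but it never showed up in the list. I'm using VS2015. I'll try 2013. Thanks!
I did this in VS2015 yesterday without any problem. But it may have been .NET 4.5.2 rather than 4.6, I don't remember.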
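If adding the System.Web reference is a problem, one possible workaround is System.Net.WebUtility.UrlDecode, which is available since .NET Framework 4.5 without that reference. The sketch below combines it with File.ReadLines, which reads the file lazily instead of loading it whole. The names LogReader, ExtractActorJson, and ReadActorJson are hypothetical, and the code assumes that the parameter list starts at the first '&' on the line, as in the samples.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class LogReader
{
    // Returns the URL-decoded value of the actor parameter, or null if the line has none.
    public static string ExtractActorJson(string line)
    {
        // Assumption: everything from the first '&' onward is a list of key=value pairs.
        int start = line.IndexOf('&');
        if (start < 0) return null;

        string[] pairs = line.Substring(start)
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            int eq = pair.IndexOf('=');
            if (eq > 0 && pair.Substring(0, eq) == "actor")
                return WebUtility.UrlDecode(pair.Substring(eq + 1));
        }
        return null;
    }

    // Streams the file and yields the decoded actor JSON of every line that has one.
    public static IEnumerable<string> ReadActorJson(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            string json = ExtractActorJson(line);
            if (json != null) yield return json;
        }
    }
}
```

The decoded JSON can then be handed to whichever JSON library the project already uses.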
Match output of the pattern for the three sample lines:
** Grp 1 [Date] - ( pos 31 , len 10 )
2015-01-04
** Grp 2 [LastName] - ( pos 80 , len 5 )
Smith
** Grp 3 [FirstName] - ( pos 91 , len 5 )
Steve
** Grp 4 [MailUser] - ( pos 133 , len 11 )
Smith.Steve
** Grp 5 [MailDomain] - ( pos 147 , len 7 )
xyz.com
---------------------
** Grp 1 [Date] - ( pos 197 , len 10 )
2015-06-08
** Grp 2 [LastName] - ( pos 246 , len 5 )
Brown
** Grp 3 [FirstName] - ( pos 257 , len 3 )
Bob
** Grp 4 [MailUser] - ( pos 297 , len 9 )
Brown.Bob
** Grp 5 [MailDomain] - ( pos 309 , len 7 )
xyz.com
----------------------
** Grp 1 [Date] - ( pos 359 , len 10 )
2014-08-02
** Grp 2 [LastName] - ( pos 408 , len 8 )
Franklin
** Grp 3 [FirstName] - ( pos 422 , len 7 )
Francis
** Grp 4 [MailUser] - ( pos 466 , len 16 )
Franklin.Francis
** Grp 5 [MailDomain] - ( pos 485 , len 7 )
xyz.com
Code for the single-match CaptureCollection version:
string log =
@"
\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&
sfgbadfbdfbadfbdab
junk .........
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Joe%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Joe%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Doe%2C%20Jane%22%5D%2C%22mbox%22%3A%5B%22mailto%3ADoe.Jane%40xyz.com%22%5D%7D&
";
Regex RxLog = new Regex(@"(?m)(?:^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&\s*|(?:.*\s))+");
Match logMatch = RxLog.Match(log);
if (logMatch.Success)
{
    // Each group accumulates one capture per log line matched by the repeated outer group
    CaptureCollection ccDate = logMatch.Groups["Date"].Captures;
    CaptureCollection ccLname = logMatch.Groups["LastName"].Captures;
    CaptureCollection ccFname = logMatch.Groups["FirstName"].Captures;
    CaptureCollection ccUser = logMatch.Groups["MailUser"].Captures;
    CaptureCollection ccDomain = logMatch.Groups["MailDomain"].Captures;
    for (int i = 0; i < ccDate.Count; i++)
        Console.WriteLine("{0} {1}, {2} {3}@{4}", ccDate[i].Value, ccLname[i].Value, ccFname[i].Value, ccUser[i].Value, ccDomain[i].Value);
}
Output:
2015-01-04 Smith, Steve Smith.Steve@xyz.com
2015-06-08 Brown, Bob Brown.Bob@xyz.com
2014-08-02 Franklin, Francis Franklin.Francis@xyz.com
2014-08-02 Smith, Joe Smith.Joe@xyz.com
2014-08-02 Doe, Jane Doe.Jane@xyz.com