How can I parse messages that span multiple lines from a text file using C#?


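Given this log file, how can I use StreamReader to read entries that contain multiple newlines (\n)? The ReadLine method returns each physical line as-is, but a single message may span several lines.

This is what I have so far: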

using (var sr = new StreamReader(filePath))
using (var store = new DocumentStore {ConnectionStringName = "RavenDB"}.Initialize())
{
    IndexCreation.CreateIndexes(typeof(Logs_Search).Assembly, store);

    using (var bulkInsert = store.BulkInsert())
    {
        const char columnDelimeter = '|';
        const string quote = @"~";
        string line;

        while ((line = sr.ReadLine()) != null)
        {
            batch++;
            List<string> columns = null;
            try
            {
                columns = line.Split(columnDelimeter)
                                .Select(item => item.Replace(quote, string.Empty))
                                .ToList();

                if (columns.Count != 5)
                {
                    batch--;
                    Log.Error(string.Join(",", columns.ToArray()));
                    continue;
                }

                bulkInsert.Store(LogParser.Log.FromStringList(columns));

                /* Give some feedback */
                if (batch % 100000 == 0)
                {
                    Log.Debug("batch: {0}", batch);
                }

                /* Use sparingly */
                if (ThrottleEnabled && batch % ThrottleBatchSize == 0)
                {
                    Thread.Sleep(ThrottleThreadWait);
                }
            }
            catch (FormatException)
            {
                if (columns != null) Log.Error(string.Join(",", columns.ToArray()));
            }
            catch (Exception exception)
            {
                Log.Error(exception);
            }
        }
    }                   
}


And the model:

public class Log
{
    public string Component { get; set; }
    public string DateTime { get; set; }
    public string Logger { get; set; }
    public string Level { get; set; }
    public string ThreadId { get; set; }
    public string Message { get; set; }
    public string Terms { get; set; }

    public static Log FromStringList(List<string> row)
    {
        Log log = new Log();

        /*log.Component = row[0] == string.Empty ? null : row[0];*/
        log.DateTime = row[0] == string.Empty ? null : row[0].ToLower();
        log.Logger = row[1] == string.Empty ? null : row[1].ToLower();
        log.Level = row[2] == string.Empty ? null : row[2].ToLower();
        log.ThreadId = row[3] == string.Empty ? null : row[3].ToLower();
        log.Message = row[4] == string.Empty ? null : row[4].ToLower();

        return log;
    }
}


I would use a regular expression and split the file on anything that matches the date pattern (e.g. 2013-06-19) at the start of each error.

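It is hard to say without seeing your file, but I would read line by line and append each line to a variable. Check for the end of the message. When you see it, do whatever you want with the message held in that variable (insert it into the DB, etc.), then carry on reading the next message.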

Pseudo code

for each line in the file
    buffer = buffer + line + newline
    if end of message
        insert buffer into DB
        reset buffer
continue with the next message
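The pseudocode above might look roughly like this in C#. It is only a sketch: how to recognise the end of a message depends on the real file, so the test is passed in by the caller, and the names MessageReader, ReadMessages and isEndOfMessage are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MessageReader
{
    // Yields one complete message at a time.
    // isEndOfMessage is an assumption supplied by the caller, because the
    // actual terminator depends on the log format.
    public static IEnumerable<string> ReadMessages(
        TextReader reader, Func<string, bool> isEndOfMessage)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Accumulate physical lines until the terminator is seen.
            lines.Add(line);
            if (isEndOfMessage(line))
            {
                yield return string.Join("\n", lines);
                lines.Clear();
            }
        }

        // A trailing message without a terminator is returned, not dropped.
        if (lines.Count > 0)
        {
            yield return string.Join("\n", lines);
        }
    }
}
```

If, for example, every record happened to end with the ~ quote character used in the question, the call could be MessageReader.ReadMessages(sr, l => l.EndsWith("~", StringComparison.Ordinal)). That terminator is a guess and should be checked against the file.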

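If you can read the whole file into memory (i.e. File.ReadAllText), you can treat it as a single string and use a regular expression to split on the date or something similar.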
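A minimal sketch of the in-memory approach, assuming every record begins with a yyyy-MM-dd date at the start of a line (the real layout of the log is not shown in the thread). WholeFileSplitter, SplitMessages and SplitFile are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeFileSplitter
{
    // Assumed record start: a yyyy-MM-dd date at the beginning of a line.
    // The lookahead (?=...) is zero-width, so the date is not consumed by
    // the split and stays at the front of its own record.
    private static readonly Regex RecordStart =
        new Regex(@"^(?=\d{4}-\d{2}-\d{2})", RegexOptions.Multiline);

    public static List<string> SplitMessages(string text)
    {
        return RecordStart.Split(text)
                          .Where(part => part.Length > 0) // drop the empty leading piece
                          .ToList();
    }

    // Reads the whole file at once, so this only suits files that fit in memory.
    public static List<string> SplitFile(string path)
    {
        return SplitMessages(File.ReadAllText(path));
    }
}
```

Each returned string keeps its line breaks, so a multi-line message arrives as one element.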

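A more general solution that uses less memory is to read the file line by line. Append lines to a buffer until you reach the next line that starts with the value you are looking for (in your case, a date/time stamp), then process that buffer. For example: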

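// requires: using System.IO; using System.Text;
StringBuilder sb = new StringBuilder();
foreach (var line in File.ReadLines(logfileName))
{
    // a line that starts with the date marks the beginning of a new message
    if (line.StartsWith("2013-06-19"))
    {
        // flush the previous message, if any, before starting the next one
        if (sb.Length > 0)
        {
            ProcessMessage(sb.ToString());
            sb.Clear();
        }
    }
    // append every line, continuation lines included, to the current message
    sb.AppendLine(line);
}
// be sure to process the last message
if (sb.Length > 0)
{
    ProcessMessage(sb.ToString());
}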
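The same idea can be generalised so that it does not depend on one hard-coded date, and the buffered message can then go through the five-column split from the question. This is a sketch under assumptions: records are taken to start with a yyyy-MM-dd date (optionally preceded by the ~ quote), the message is taken to be the fifth column, and LogMessages, Group and ToColumns are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class LogMessages
{
    // Assumption: a new record starts with a date, optionally quoted with ~.
    private static readonly Regex RecordStart = new Regex(@"^~?\d{4}-\d{2}-\d{2}");

    private const char ColumnDelimiter = '|';
    private const string Quote = "~";
    private const int ExpectedColumns = 5; // taken from the question's Count check

    // Groups physical lines into logical messages, one string per record.
    // Non-empty lines before the first record start come out as their own group.
    public static IEnumerable<string> Group(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
        {
            if (RecordStart.IsMatch(line) && sb.Length > 0)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            if (sb.Length > 0) sb.Append('\n');
            sb.Append(line);
        }
        if (sb.Length > 0) yield return sb.ToString();
    }

    // Splits into at most five columns, so any | or line break inside the
    // message text stays in the last column. Returns null when malformed.
    public static List<string> ToColumns(string message)
    {
        var columns = message.Split(new[] { ColumnDelimiter }, ExpectedColumns)
                             .Select(item => item.Replace(Quote, string.Empty))
                             .ToList();
        return columns.Count == ExpectedColumns ? columns : null;
    }
}
```

In the question's loop this could replace sr.ReadLine(): iterate over LogMessages.Group(File.ReadLines(filePath)), call ToColumns on each message, and pass non-null results to LogParser.Log.FromStringList.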

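I forgot to post the code..

Sorry, my eyes! Fix the image before the downvotes arrive.

Are the messages delimited? Is there a period? You just need a pattern that distinguishes one message from the next, then decide whether a regex is the better fit.

I was thinking of something similar myself. It seems like the best option.

This should work: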

\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{4}

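Working example:

It's called capturing parentheses.

It sort of works, but each date ends up as a separate element in the array.. I guess that's the closest I've come.. I'll keep working on it.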
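The behaviour described in that last comment is how Regex.Split treats capture groups: the captured text is returned as its own array element between the pieces. One way to live with that is to stitch each captured timestamp back onto the text that follows it. The sketch below assumes timestamps of the form 2013-06-19 10:00:00.1234 at the start of a line; CaptureSplit and Rejoin are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureSplit
{
    // The parentheses capture the timestamp, so Split keeps it in the result.
    // Assumed format: yyyy-MM-dd HH:mm:ss.ffff at the beginning of a line.
    private static readonly Regex Timestamp = new Regex(
        @"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{4})",
        RegexOptions.Multiline);

    public static List<string> Rejoin(string text)
    {
        // Layout of the array: [text before first stamp, stamp, body, stamp, body, ...]
        string[] parts = Timestamp.Split(text);

        var messages = new List<string>();
        // Start at 1 to skip whatever precedes the first timestamp,
        // then glue each stamp to the body that follows it.
        for (int i = 1; i + 1 < parts.Length; i += 2)
        {
            messages.Add(parts[i] + parts[i + 1]);
        }
        return messages;
    }
}
```

A timestamp that appears at the start of a line inside a message body would be treated as a new record, so this only holds if that cannot happen in the file.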