C# regex chat message detection
I'm currently trying to develop a piece of software to properly view WhatsApp messages saved in .txt format (sent via email), and I'm trying to write a parser for them. I've been trying to use Regex for the past 3 hours without finding a solution, since I've barely used Regex before. The messages look like this:
16.08.2015, 18:30 - Person 1: Some multiline text here
still in the message
16.08.2015, 18:31 - Person 2: some other message which could be multiline
16.08.2015, 18:33 - Person 1: once again
I'm trying to split them up properly by matching them with a regex (like this).
I've been trying to use a very messy regex that looks like
\d\d\\.\d\d\\.[…]
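I wouldn't use a single regex for this. Instead, I would just use a StreamReader or another TextReader: check whether the line currently being processed is a "chat start" line (using a regex), then check whether each following line is also a "chat start" line, and keep track of whether you should append to the current message or yield a new one. I wrote a quick extension method to demonstrate this: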
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ChatReader
{
    // Matches the header that starts a message, e.g. "16.08.2015, 18:30 - Person 1:".
    private static readonly Regex MessageStart =
        new Regex(@"^\d\d\.\d\d\.\d\d\d\d, \d\d:\d\d - .*?:");

    public static IEnumerable<string> ReadChatMessages(this TextReader reader)
    {
        // Locals instead of static fields, so each enumeration keeps its own state.
        string current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (MessageStart.IsMatch(line))
            {
                // A new message starts: emit the one collected so far.
                if (current != null)
                    yield return current;
                current = line;
            }
            else
            {
                // Continuation of a multiline message (or text before the first header).
                current = current == null ? line : current + "\n" + line;
            }
        }
        // Emit the last message; an empty input yields nothing.
        if (current != null)
            yield return current;
    }
}
It can be used like this:
List<string> chatMessages = reader.ReadChatMessages().ToList();
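Since the goal is a viewer, each joined message will probably need to be broken into its timestamp, sender, and text at some point. The following is only a rough sketch of one way to do that with named groups. The ChatMessage and ChatMessageParser names are hypothetical and not part of the answer above, and it assumes the "dd.MM.yyyy, HH:mm - Sender: text" layout from the sample and sender names that contain no colon.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed message (illustration only).
public sealed class ChatMessage
{
    public DateTime Timestamp { get; set; }
    public string Sender { get; set; }
    public string Text { get; set; }
}

public static class ChatMessageParser
{
    // Named groups pick apart "16.08.2015, 18:30 - Person 1: text".
    // Singleline lets the text group span the '\n' characters of a multiline message.
    private static readonly Regex Header = new Regex(
        @"^(?<date>\d\d\.\d\d\.\d{4}, \d\d:\d\d) - (?<sender>[^:]+): (?<text>.*)$",
        RegexOptions.Singleline);

    // Returns null when the input does not look like a complete message.
    public static ChatMessage Parse(string raw)
    {
        Match m = Header.Match(raw);
        if (!m.Success)
            return null;

        DateTime timestamp;
        bool parsed = DateTime.TryParseExact(
            m.Groups["date"].Value, "dd.MM.yyyy, HH:mm",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp);
        if (!parsed)
            return null;

        return new ChatMessage
        {
            Timestamp = timestamp,
            Sender = m.Groups["sender"].Value,
            Text = m.Groups["text"].Value
        };
    }
}
```

Each string in chatMessages could then be passed to ChatMessageParser.Parse, skipping any entry for which it returns null.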
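What is your regex? Please post it.
Do you only need to extract 16.08.2015, 18:30, 16.08.2015, 18:31, and 16.08.2015, 18:33?
Please edit your question, because it's unclear how you want to parse your messages and where you are stuck. What is wrong with the output you get? What output do you want?
Oops, do you want the messages to look like your last box, or do they currently look like that and that's not what you want?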