C# regex: putting text on the same line


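I know this can be done with a bunch of for loops, but there must be a nice regex way to do it.

I have a text file whose lines start with a category string, followed by a ":" and some other text, for example: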

name:john
job:engineer
description:engineering is blah blah blah
blah blah bla
name: sally
job:police woman
description:catches theives
name:.....

How can I put each category's whole description on a single line, like this?

name:john
job:engineer
description:engineering is blah blah blah blah blah bla
name: sally
job:police woman
description:catches theives
name:.....
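
Assume I have an array of the category names: "name", "job", "description".

Here is my code. It does the job with loops and checks, and it is slower than I imagine a regex would be: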

private static string congatenateMultiLineHeaderStrings(string output, string[] headersArray)
{
    string[] outputLinesArray = output.Split('\n');
    string outputOneLinePerHeader = "";
    for (int lineNo = 0; lineNo < outputLinesArray.Length; lineNo++) //for each line
    {
        bool hasHeader = false;
        for (int headerNo = 0; headerNo < headersArray.Length; headerNo++) //for each header....
        {
            if (outputLinesArray[lineNo].Contains(headersArray[headerNo])) //if the line contains a header...
            {
                hasHeader = true;
            }
        }
        if (!hasHeader)
        {
            outputOneLinePerHeader += " "+outputLinesArray[lineNo]; //outputLinesArray[lineNo];//attach this line to prev
        }
        else
            outputOneLinePerHeader += "\n" + outputLinesArray[lineNo];
    }
    return outputOneLinePerHeader;
}
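
If str is the text, you can do this:

// Mark every line break, restore only the breaks that come right before a "key:" header,
// then turn the remaining markers (continuation lines) into spaces.
str = Regex.Replace(str.Replace("\n", "$flag$"), @"\$flag\$(?=\w+:)", "\n")
    .Replace("$flag$", " ");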

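In this example we sidestep the problem of parsing across lines by putting all the data into one giant block of text. We then run a search and replace over that blob so that a line break falls just before every occurrence of name:.

Finally, we use a regular expression to pull out the data: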

List<string> stringList = new List<string>()
{
    "name:john",
    "job:engineer",
    "description:engineering is blah blah blah blah blah bla",
    "blah blah blah blah blah bla",
    "drives a skooter",

    "name:Ted",
    "job:engineer",
    "description:engineering is blah blah blah blah blah bla",
    "blah blah blah blah blah bla",
    "has a mustang",

    "name:Jim Bob",
    "job:engineer",
    "description:engineering is blah blah blah blah blah bla",
    "blah blah blah blah blah bla",
    "drives a corvette"
};

// Join every line into one blob, separated by spaces.
StringBuilder sb = new StringBuilder();
foreach (var mystring in stringList)
{
    sb.Append(string.Format("{0} ", mystring));
}

// Start a new line before each record.
sb.Replace("name:", "\nname:");

// Each named group captures its key together with the value, e.g. "name:john".
string pattern = "(?=name)\\s*(?<name>.+)(?=job:)\\s*(?<job>.+)(?=description:)\\s*(?<description>.+)";

foreach (Match m in Regex.Matches(sb.ToString(), pattern, RegexOptions.IgnoreCase))
{
    string name = m.Groups["name"].Value.Trim();
    string job = m.Groups["job"].Value.Trim();
    string description = m.Groups["description"].Value.Trim();
}
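
The loop above leaves the extracted values unused. As a rough sketch of how the same blob-then-match idea could return the records, the version below collects them into a list of tuples. The names RecordParser and Parse are illustrative and not part of the answer, and the pattern is a variation that consumes the name:, job: and description: keys so the groups hold only the values. It assumes the three keys always appear in that order and that "name:" never occurs inside a value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Hypothetical helper: joins the lines into one blob, breaks the blob
    // before each "name:", then reads one record per line.
    public static List<(string Name, string Job, string Description)> Parse(IEnumerable<string> lines)
    {
        string blob = string.Join(" ", lines).Replace("name:", "\nname:");

        // '.' does not match '\n', so each match stays within one record.
        const string pattern =
            @"name:(?<name>.+?)job:(?<job>.+?)description:(?<description>.+)";

        var records = new List<(string Name, string Job, string Description)>();
        foreach (Match m in Regex.Matches(blob, pattern, RegexOptions.IgnoreCase))
        {
            records.Add((
                m.Groups["name"].Value.Trim(),
                m.Groups["job"].Value.Trim(),
                m.Groups["description"].Value.Trim()));
        }
        return records;
    }
}
```

Calling RecordParser.Parse(stringList) on a list shaped like the one above should give one tuple per person, with the continuation lines folded into Description.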
Regex.Replace(everything, @"(?m)(?

Use this pattern:

(\r?\n)(?!\w+:)  
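
A minimal sketch of how this pattern might be applied with Regex.Replace, substituting a single space for every line break that is not followed by a "key:" header. The class and method names (LineJoiner, JoinWrappedLines) are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineJoiner
{
    // Replaces a line break (LF or CRLF) with a space unless the next line
    // starts with word characters followed by a colon, such as "name:".
    public static string JoinWrappedLines(string text)
    {
        return Regex.Replace(text, @"(\r?\n)(?!\w+:)", " ");
    }
}
```

Note that a line break at the very end of the text is also turned into a space, because nothing follows it, so trimming the result may be needed.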

Replace each match with a space.

Here is a loop-based way to do it, although it uses a lot of CPU:

private static string congatenateMultiLineHeaderStrings(string output, string[] headersArray)
{
    string[] outputLinesArray = output.Split('\n');
    string outputOneLinePerHeader = "";
    for (int lineNo = 0; lineNo < outputLinesArray.Length; lineNo++) //for each line
    {
        bool hasHeader = false;
        for (int headerNo = 0; headerNo < headersArray.Length; headerNo++) //for each header....
        {
            if (outputLinesArray[lineNo].Contains(headersArray[headerNo])) //if the line contains a header...
            {
                hasHeader = true;
            }
        }
        if (!hasHeader)
        {
            outputOneLinePerHeader += " "+outputLinesArray[lineNo]; //outputLinesArray[lineNo];//attach this line to prev
        }
        else
            outputOneLinePerHeader += "\n" + outputLinesArray[lineNo];
    }
    return outputOneLinePerHeader;
}
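
Since the question already has the header names in an array, the lookahead can be built from that array instead of looping over every header for every line. The sketch below is one possible way to do it, not the poster's method. HeaderJoiner and JoinContinuationLines are hypothetical names, and it assumes a header always sits at the very start of its line and is followed directly by a colon.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HeaderJoiner
{
    // Replaces a line break with a space unless the next line starts with
    // one of the given headers followed by ':'.
    public static string JoinContinuationLines(string text, string[] headers)
    {
        // Escape each header so regex metacharacters in a name are literal.
        string alternation = string.Join("|", headers.Select(h => Regex.Escape(h)));
        string pattern = @"\r?\n(?!(?:" + alternation + "):)";
        return Regex.Replace(text, pattern, " ");
    }
}
```

Unlike a generic \w+: check, this only treats the known headers as the start of a new entry, so a continuation line that happens to contain a colon is still joined to the line before it.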
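
OK, let's boil this down to the real problem.

The problem is that a line of text can optionally span two lines, and in the multi-line case the parser that reads that text can fail. At its root, then, this is about a value that spans two lines. If we could simply remove the CRLF (carriage return, line feed: \r\n) whenever we hit that situation, the value would sit on a single line again.

The tool we are using is Regex, but we want to replace \r\n with a space rather than just match text. So by using a variant of Regex, Regex.Replace, and giving that method a pattern that matches \r\n along with a space to replace it with, we can achieve our goal.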
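
To make the starting point concrete, here is a minimal sketch of that unconditional replacement. NaiveJoin and ReplaceAllLineBreaks are illustrative names, not something from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveJoin
{
    // Replaces every CRLF with a space, whatever follows it.
    public static string ReplaceAllLineBreaks(string text)
    {
        return Regex.Replace(text, @"\r\n", " ");
    }
}
```

On its own this is too blunt: it also removes the break in front of lines that begin a new key-value pair, so the pattern needs a way to leave those breaks alone.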

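So let's define the data we are looking at, with both single-line and two-line descriptions. We will modify the OP's data to give a visual cue about the second line: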

string data = @"name:john
job:engineer
description:engineering is 1blah 1blah 1blah
2blah 2blah 2blah
name: sally
job:police woman
description:catches theives
name:OmegaMan
job:Computer Programmer
description:Answers questions
on StackOverflow";
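
But if the next line does have a : character, indicating a new key-value pair, we do not want to replace the line break before it, so the pattern needs some way to skip those lines.

The following regex replace looks for \r\n and uses regex lookaheads to decide whether to keep the match. (Lookaheads are simple meta operations, hints, that the regex parser carries out while processing the text, before it commits to a match. By supplying the hint logic, we can steer the actual match toward matching or not matching.)

Here is the pattern, with comments added:
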
string pattern = @"
    (\r?\n)     # Find a line break (CRLF or LF) and 'match' it (first match $1) to be replaced if....
    (?!         # Stop the match if it *meta* matches next logic
       (?=.+:)  # Is there a : on the next line?
     )          # If the look ahead is true the match is stopped and the line break is skipped (no match)
    (.)         # But if not we then need to match at least one character;
                # and don't replace it. This is the second match as $2.";

// IgnorePatternWhitespace only allows us to comment the pattern;
// it does not affect text processing.
Console.WriteLine(Regex.Replace(data,
                                pattern,
                                " $2",    // Replace the line break with a space and the one matched character.
                                RegexOptions.IgnorePatternWhitespace));

This prints:

name:john
job:engineer
description:engineering is 1blah 1blah 1blah 2blah 2blah 2blah
name: sally
job:police woman
description:catches theives
name:OmegaMan
job:Computer Programmer
description:Answers questions on StackOverflow
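
Because a lookahead consumes no characters, the same result can likely be had without capturing and re-emitting the first character of the next line. The following is a simplified sketch along those lines, with the illustrative names ContinuationJoiner and Join. Like the pattern above, it treats any line containing a colon as the start of a new key-value pair.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContinuationJoiner
{
    // Replaces a line break (CRLF or LF) with a space unless the line
    // after it contains a ':' somewhere.
    public static string Join(string text)
    {
        return Regex.Replace(text, @"\r?\n(?!.+:)", " ");
    }
}
```

The negative lookahead only inspects the next line and never becomes part of the match, so the replacement string can be a plain space with no $2.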