C#: extract a paragraph using a search string

I am using the code below to extract the part of a paragraph around a matching string:

int charBeforeAndAfter = 100;
string matchParagraphs = string.Empty;
Regex wordMatch = new Regex(@"\b" + word + @"\b", RegexOptions.IgnoreCase);
foreach (string paragraph in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    int startIdx = -1;
    int length = -1;
    foreach (Match match in wordMatch.Matches(paragraph))
    {
        int wordIdx = match.Index;
        if (wordIdx >= startIdx && wordIdx <= startIdx + length)
            continue;
        startIdx = wordIdx > charBeforeAndAfter ? wordIdx - charBeforeAndAfter : 0;
        length = wordIdx + match.Length + charBeforeAndAfter < paragraph.Length ? match.Length + charBeforeAndAfter : paragraph.Length - startIdx;
        string extract = wordMatch.Replace(paragraph.Substring(startIdx, length), "<b>" + match.Value + "</b>");
        matchParagraphs = "..." + extract + "...";
        return matchParagraphs;
    }
}

The result I get is correct, but the extract starts and ends with broken words, for example "…ing regions are specified using the and boolean connector, thus narr…"

How can I avoid breaking words? Please help me.


Thanks in advance…

You could try something like this:

using System;
using System.Text.RegularExpressions;

static class Program {

    static void Main(params string[] args) {

        string text = @"Lorem ipsum dolor sit amet, consectetur adipisicing 
elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea 
commodo consequat.";

        ExtractParagraph(text, "magna");
        ExtractParagraph(text, "ipsum");
        ExtractParagraph(text, "ut");

    }

    static void ExtractParagraph(string text, string word) {
        Console.WriteLine("Matches for: {0}", word);
        string expression = @"((^.{0,30}|\w*.{30})\b" + word + @"\b(.{30}\w*|.{0,30}$))";
        Regex wordMatch = new Regex(expression, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in wordMatch.Matches(text)) {
            Console.WriteLine("  {0}", m.Value);
        }
    }

}
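
The basic idea is to match some extra content around the word:
.{30}\bword\b.{30}
and then add some "word characters" on each side so that a word is not cut in half:
\w*.{30}\bword\b.{30}\w*

The fragments
^.{0,30}
.{0,30}$
are there so that the expression still matches when fewer than 30 characters precede the word at the start of the text or follow it at the end.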


As usual with regular expressions, this is unlikely to win a readability contest, but it seems to work…
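To connect this back to the code in the question, here is a minimal sketch of the same pattern wrapped in a helper that returns one highlighted excerpt. The names `SnippetDemo` and `ExtractSnippet`, the named groups, and the sample sentence are assumptions made up for illustration. `Regex.Escape` is added so that a search term containing regex metacharacters is treated as literal text.

```csharp
using System;
using System.Text.RegularExpressions;

static class SnippetDemo
{
    // Hypothetical helper: returns the first excerpt around `word`,
    // extended outward so that it starts and ends on whole words.
    public static string ExtractSnippet(string text, string word, int context = 30)
    {
        string w = Regex.Escape(word);   // treat the search term literally
        string n = context.ToString();

        // before: up to `context` chars from the start of the text, OR
        //         optional word characters followed by exactly `context` chars
        // after:  the mirror image of the above
        string pattern =
            @"(?<before>^.{0," + n + @"}|\w*.{" + n + @"})" +
            @"\b(?<hit>" + w + @")\b" +
            @"(?<after>.{" + n + @"}\w*|.{0," + n + @"}$)";

        Match m = Regex.Match(text, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success) return string.Empty;

        // Highlight only the hit itself, then add the ellipses.
        return "..." + m.Groups["before"].Value
             + "<b>" + m.Groups["hit"].Value + "</b>"
             + m.Groups["after"].Value + "...";
    }

    static void Main()
    {
        string text = "Overlapping regions are specified using the boolean connector, thus narrowing the search.";
        // Prints: ...specified using the <b>boolean</b> connector, thus...
        Console.WriteLine(ExtractSnippet(text, "boolean", 12));
    }
}
```

Because the `\w*` parts extend the match outward, the excerpt can be somewhat longer than `context` characters on either side; that is the price of never cutting a word.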

I had a similar problem and solved it with the following approach:

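int length = 50;
string output = null;
// Shrink the cut-off point until it lands just after a space, so no word is split.
while (length <= text.Length)
{
    string candidate = text.Substring(0, length);
    if (candidate.EndsWith(" "))
    {
        output = candidate; // what you want to output
        break;
    }
    else
    {
        length--;
        if (length < 10) break; // give up rather than shrink any further
    }
}
return output;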

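It is not the best solution, but it works well enough for my needs. Basically, it walks back from the 50-character mark and outputs the longest piece of text that is no more than 50 characters long and ends with a space.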
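The same shrinking idea can be applied to both ends of the window around a match, which is closer to what the question needs. The following is only a sketch under assumptions: `WordBoundaryWindow` and `Extract` are hypothetical names, and the match position is assumed to come from `Match.Index` and `Match.Length` or, as in the demo, from `IndexOf`.

```csharp
using System;

static class WordBoundaryWindow
{
    // Hypothetical helper: takes up to `context` characters on each side of
    // the match, then shrinks both ends inward until they sit on whitespace.
    public static string Extract(string text, int matchIndex, int matchLength, int context)
    {
        int matchEnd = matchIndex + matchLength;
        int start = Math.Max(0, matchIndex - context);
        int end = Math.Min(text.Length, matchEnd + context);

        // If the window starts in the middle of a word, skip forward to the
        // next word start, but never past the match itself.
        while (start > 0 && start < matchIndex && !char.IsWhiteSpace(text[start - 1]))
            start++;

        // If the window ends in the middle of a word, move back to the
        // preceding whitespace, but never into the match itself.
        while (end < text.Length && end > matchEnd && !char.IsWhiteSpace(text[end]))
            end--;

        return text.Substring(start, end - start);
    }

    static void Main()
    {
        string text = "Overlapping regions are specified using the boolean connector, thus narrowing the search.";
        int idx = text.IndexOf("boolean", StringComparison.OrdinalIgnoreCase);
        // Prints: using the boolean connector,
        Console.WriteLine(Extract(text, idx, "boolean".Length, 12));
    }
}
```

Unlike a pattern that extends outward with `\w*`, this version drops partial words, so the excerpt never exceeds `context` characters on either side.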

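Can you describe what you are trying to achieve?

Here boolean is the search keyword, so I read 30 characters before and after the keyword boolean. Output: "…ing regions are specified using the and boolean connector, thus narr…" My words get broken when I read 30 characters; that is the problem.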