Display the first 2000 words of a long string (HTML stripped) in C# ASP.NET



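I have a field in my database that stores input from an HTML input, so my DB column contains HTML. What I need is to pull this out and display a short version as an introduction, if possible maybe even just the first paragraph.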

The usual recommendation is to strip the HTML first. After that, just call String.Substring to get the part you need.
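If a character limit is good enough for the introduction, the Substring step can avoid cutting a word in half by backing up to the last space with String.LastIndexOf. This is a minimal sketch, assuming the HTML has already been stripped; the class and method names (IntroTruncation, TruncateAtWord) are hypothetical.

```csharp
using System;

public static class IntroTruncation
{
    // Cuts plain text (HTML already stripped) to at most maxChars characters,
    // backing up to the last space so that no word is split in half.
    public static string TruncateAtWord(string text, int maxChars)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));
        if (text.Length <= maxChars) return text;

        // Search backwards from position maxChars for a space.
        int cut = text.LastIndexOf(' ', maxChars);
        if (cut <= 0) cut = maxChars; // no usable space: fall back to a hard cut

        return text.Substring(0, cut).TrimEnd();
    }
}
```

For example, TruncateAtWord("the quick brown fox", 12) gives "the quick" rather than "the quick br".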

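If you need the first 2000 words, I suppose you could use IndexOf to find a space 2000 times, looping until you find the index to use when calling Substring.
Edit: added an example method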
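public int GetIndex(string str, int numberWanted)
{
    // Counts word starts (a letter, digit or punctuation mark that follows
    // whitespace) and returns the index where word numberWanted + 1 begins,
    // so str.Substring(0, GetIndex(str, n)) keeps the first n words.
    // Assumes str has no leading whitespace: the first word is not preceded
    // by whitespace and is therefore not counted, which is why index starts at 1.
    if (string.IsNullOrEmpty(str))
        return 0;

    int count = 0;
    int index = 1;
    for (; index < str.Length; index++)
    {
         if (char.IsWhiteSpace(str[index - 1]))
         {
              if (char.IsLetterOrDigit(str[index]) ||
                    char.IsPunctuation(str[index]))
              {
                    count++;
                    if (count >= numberWanted)
                         break;
              }
         }
    }
    return index;
}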
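As an alternative to scanning characters by hand, a regular expression can find the word boundaries. This sketch treats a "word" as any run of non-whitespace characters (\S+), so punctuation attached to a word counts as part of it; the class name RegexWordCut is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWordCut
{
    // Returns the prefix of text that ends right after the nth word,
    // or the whole text if it has fewer than n words.
    public static string FirstWords(string text, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (n <= 0) return string.Empty;

        int seen = 0;
        for (Match m = Regex.Match(text, @"\S+"); m.Success; m = m.NextMatch())
        {
            seen++;
            if (seen == n)
                return text.Substring(0, m.Index + m.Length); // cut after the nth word
        }
        return text; // fewer than n words: return everything
    }
}
```

Because the cut is made at the end of the nth match, the original spacing and punctuation inside the kept text are preserved.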

Maybe something like this:
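    public string Get(string text, int maxWordCount)
    {
        // Note: every delimiter counts as the end of a word, so ", " between
        // two words is counted twice. Good enough for a rough cut.
        if (string.IsNullOrEmpty(text))
            return text;

        int wordCounter = 0;
        int stringIndex = 0;
        char[] delimiters = new[] { '\n', ' ', ',', '.' };

        while (wordCounter < maxWordCount)
        {
            // Find the next delimiter after the current position.
            stringIndex = text.IndexOfAny(delimiters, stringIndex + 1);
            if (stringIndex == -1)
                return text; // fewer words than requested: return everything

            ++wordCounter;
        }

        return text.Substring(0, stringIndex);
    }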
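Counting delimiters means runs such as ", " or a double space inflate the word count. One way around that, sketched below with the same delimiter set, is to count only the points where a non-delimiter character follows a delimiter (or starts the string). The class name DelimiterWordCut is hypothetical.

```csharp
using System;

public static class DelimiterWordCut
{
    // Same delimiter set as the Get method above.
    private static readonly char[] Delimiters = { '\n', ' ', ',', '.' };

    // Returns the prefix of text containing at most maxWordCount words.
    // A word is a maximal run of non-delimiter characters, so ", " or
    // "  " between two words is treated as a single gap.
    public static string Get(string text, int maxWordCount)
    {
        if (string.IsNullOrEmpty(text) || maxWordCount <= 0) return string.Empty;

        int words = 0;
        bool inWord = false;
        for (int i = 0; i < text.Length; i++)
        {
            bool isDelimiter = Array.IndexOf(Delimiters, text[i]) >= 0;
            if (!isDelimiter && !inWord)
            {
                // A new word starts here; stop before it if we already have enough.
                if (words == maxWordCount) return text.Substring(0, i).TrimEnd();
                words++;
            }
            inWord = !isDelimiter;
        }
        return text; // fewer words than requested
    }
}
```

Only trailing whitespace is trimmed, so a sentence-ending period before the cut is kept: Get("Hello, world. Next", 2) returns "Hello, world.".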

Edit:
A very simple way to strip HTML (requires using System.Text.RegularExpressions;):
return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
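Two details matter when the stripped text is then split into words: removing tags with string.Empty can glue words together ("a</p><p>b" becomes "ab"), and entities such as &amp; are left encoded. This sketch reuses the same pattern but replaces tags with a space and then decodes entities with System.Net.WebUtility.HtmlDecode; the class name SimpleHtmlStripper is hypothetical, and like any regex approach it can be fooled by a > inside an attribute value.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SimpleHtmlStripper
{
    // Same pattern as above: anything that looks like <...>, across newlines.
    private static readonly Regex TagPattern = new Regex(@"<(.|\n)*?>", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // A space (not string.Empty) keeps "a</p><p>b" as two separate words.
        string noTags = TagPattern.Replace(html, " ");

        // Decode entities such as &amp; and &nbsp;.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse runs of whitespace (including non-breaking spaces) to one space.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

For example, "<p>Fish &amp; chips</p><p>second</p>" becomes "Fish & chips second".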

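Once you have the string, you have to count your words. I'm assuming a space is the word delimiter, so the following code should find the first 2000 words in the string (or stop early if there are fewer words):
string myString = "la la la";
int lastPosition = 0;
// Find the 2000th space; each space found marks the end of one more word.
for (int i = 0; i < 2000 && lastPosition < myString.Length; i++)
{
    int position = myString.IndexOf(' ', lastPosition + 1);
    if (position == -1)
    {
        // Fewer than 2000 words: keep the whole string, including the last word.
        lastPosition = myString.Length;
        break;
    }
    lastPosition = position;
}
string firstTwoThousandWords = myString.Substring(0, lastPosition);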

You can change IndexOf to IndexOfAny to support more characters as delimiters.
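A sketch of that change, wrapped in a reusable method: the loop is the same as above, with IndexOf swapped for IndexOfAny over a separator array. The separator set and the names FirstWordsByIndexOfAny and FirstWords are assumptions for illustration. As with the single-space version, two separators in a row are counted as two words.

```csharp
using System;

public static class FirstWordsByIndexOfAny
{
    // Assumed separator set; extend it as needed.
    private static readonly char[] Separators = { ' ', '\n', '\r', '\t' };

    public static string FirstWords(string myString, int wordCount)
    {
        int lastPosition = 0;
        // lastPosition < Length keeps the start index passed to IndexOfAny in range.
        for (int i = 0; i < wordCount && lastPosition < myString.Length; i++)
        {
            int position = myString.IndexOfAny(Separators, lastPosition + 1);
            if (position == -1)
            {
                lastPosition = myString.Length; // fewer words: keep everything
                break;
            }
            lastPosition = position;
        }
        return myString.Substring(0, lastPosition);
    }
}
```

Calling FirstWords(text, 2000) reproduces the loop above while also splitting on newlines and tabs.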
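I ran into the same problem and combined several Stack Overflow answers into this class. It uses HtmlAgilityPack, which is a better tool for the job. Call:
 Words(string html, int n)

to get the first n words.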
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;


namespace UmbracoUtilities
{
    public class Text
    {
      /// <summary>
      /// Return the first n words in the html
      /// </summary>
      /// <param name="html">HTML markup, e.g. the value stored in the database column</param>
      /// <param name="n">Maximum number of words to return</param>
      /// <returns>The first n words of the visible text, separated by single spaces</returns>
      public static string Words(string html, int n)
      {
        string words = StripHtml(html);
        return GetNWords(words, n);
      }


      /// <summary>
      /// Returns the first n words in text
      /// Assumes text is not a html string
      /// http://stackoverflow.com/questions/13368345/get-first-250-words-of-a-string
      /// </summary>
      /// <param name="text">Plain text (HTML already removed)</param>
      /// <param name="n">Maximum number of words to return</param>
      /// <returns>The first n words, separated by single spaces</returns>
      public static string GetNWords(string text, int n)
      {
        //remove multiple spaces
        //http://stackoverflow.com/questions/1279859/how-to-replace-multiple-white-spaces-with-one-white-space
        string cleanedString = Regex.Replace(text, @"\s+", " ");

        // RemoveEmptyEntries drops the empty entry produced by a leading space
        // (StripHtml prefixes every fragment with one), so Take(n) returns
        // exactly n words and the result has no leading space.
        IEnumerable<string> words = cleanedString
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Take(n);

        return string.Join(" ", words);
      }


      /// <summary>
      /// Returns a string of html with tags removed
      /// </summary>
      /// <param name="html">HTML markup</param>
      /// <returns>The text of all leaf nodes, each prefixed with a space</returns>
      public static string StripHtml(string html)
      {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        var root = document.DocumentNode;
        var stringBuilder = new StringBuilder();

        // Leaf nodes are mostly text nodes; collect their text in document order.
        foreach (var node in root.DescendantsAndSelf())
        {
          if (!node.HasChildNodes)
          {
            string text = node.InnerText;
            if (!string.IsNullOrEmpty(text))
              stringBuilder.Append(" " + text.Trim());
          }
        }

        return stringBuilder.ToString();
      }
    }
}
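The question also asks for the first paragraph, if possible. A rough stdlib-only sketch, assuming the stored HTML wraps paragraphs in <p> tags, is to match the first <p>...</p> with a regular expression and then strip and decode its contents. The class name FirstParagraph is hypothetical, and nested or malformed markup is not handled; a parser such as HtmlAgilityPack is the more robust choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class FirstParagraph
{
    // Contents of the first <p ...>...</p>; \b stops <pre> or <param> from matching.
    private static readonly Regex ParagraphPattern = new Regex(
        @"<p\b[^>]*>(.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any tag inside the paragraph, e.g. <b> or <a href="...">.
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    // Returns the plain text of the first paragraph, or "" if there is none.
    public static string Get(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        Match m = ParagraphPattern.Match(html);
        if (!m.Success) return string.Empty;

        string inner = TagPattern.Replace(m.Groups[1].Value, " ");
        string text = WebUtility.HtmlDecode(inner);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

If no paragraph is found, falling back to the first-n-words approach above is one reasonable option.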


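Merry Christmas!
How does this remove the HTML tags? I'm using the HTML Agility Pack, which does strip all the HTML; now I just need a code example that loops through the string and gets the first 2000 words.
@Kenyana: Added an example method and an example of how to call it. I'm not sure it's very efficient, and it might not count quite correctly, but it should at least give you an idea.
Here is my sample code! I put it in a class that seems to strip the HTML and then add the HTML elements back for display on the page, but it doesn't limit the output to the number of words I want. NB: I got this code from another post on this site. How do I post a code sample on this page?
@Kenyana: That's not surprising; I think if you asked lots of people to do this, many would come up with very similar code. Just post the code as text, but prefix each line with 4 spaces. There's a button in the editor that does this for you if you select all the text first.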