C#: check whether a string contains any of a list of substrings, and save the matching substrings

Here is my situation: I have a string representing a text

string myText = "Text to analyze for words, bar, foo";

and a list of words to search for in it

List<string> words = new List<string> {"foo", "bar", "xyz"};

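I would like to know the most efficient way, if one exists, to get the list of words that are contained in the text, something like this: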

List<string> matches = myText.findWords(words)

There is no special analysis in this query, other than that you have to use the Contains method. So you can try this:

string myText = "Text to analyze for words, bar, foo";

List<string> words = new List<string> { "foo", "bar", "xyz" };

var result = words.Where(i => myText.Contains(i)).ToList();
// result: foo, bar (Where keeps the order of the words list)

You can use a HashSet and intersect the two sets:

string myText = "Text to analyze for words, bar, foo"; 
string[] splitWords = myText.Split(' ', ',');

HashSet<string> hashWords = new HashSet<string>(splitWords,
                                                StringComparer.OrdinalIgnoreCase);
HashSet<string> words = new HashSet<string>(new[] { "foo", "bar" },
                                            StringComparer.OrdinalIgnoreCase);

hashWords.IntersectWith(words);
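
As a hedged follow-up to the HashSet approach: splitting on ' ' and ',' produces an empty entry wherever a comma is followed by a space, and after IntersectWith the matches are left in the first set itself. A minimal sketch, assuming empty entries should be dropped; the names WordSetDemo and FindWords are made up for illustration and are not part of the answer above:

```csharp
using System;
using System.Collections.Generic;

public static class WordSetDemo
{
    // Hypothetical helper: returns the search words that occur as whole tokens in the text.
    public static HashSet<string> FindWords(string text, IEnumerable<string> searchWords)
    {
        // Split on spaces and commas, dropping the empty tokens produced by ", ".
        string[] tokens = text.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        var found = new HashSet<string>(tokens, StringComparer.OrdinalIgnoreCase);
        // IntersectWith modifies 'found' in place, keeping only tokens that are also search words.
        found.IntersectWith(searchWords);
        return found;
    }

    public static void Main()
    {
        var matches = FindWords("Text to analyze for words, bar, foo",
                                new[] { "foo", "bar", "xyz" });
        // HashSet does not guarantee enumeration order, so the printed order may vary.
        Console.WriteLine(string.Join(", ", matches));
    }
}
```

Because the set is built from the text tokens, this matches whole tokens only: "foo" would not be found in "foobar".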

Going by your wish to be able to call myText.findWords(words), you can create an extension method on the String class that does what you want:

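using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Returns the words that occur anywhere in str as substrings.
    public static List<string> findWords(this string str, List<string> words)
    {
        return words.Where(str.Contains).ToList();
    }
}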

Usage:

string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
List<string> matches = myText.findWords(words);
Console.WriteLine(String.Join(", ", matches.ToArray()));
Console.ReadLine();

Result:

foo, bar

Regex solution

var words = new string[]{"Lucy", "play", "soccer"};
var text = "Lucy loves going to the field and play soccer with her friend";
var match = new Regex(String.Join("|",words)).Match(text);
var result = new List<string>();

while (match.Success) {
    result.Add(match.Value);
    match = match.NextMatch();
}

//Result ["Lucy", "play", "soccer"]
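
A hedged variant of the regex approach, for search words that might contain regex metacharacters such as '.' or '+'. This is a sketch, not the answer's own code: the \b word-boundary anchors and the Distinct call are added assumptions, and the names RegexWordDemo and FindWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordDemo
{
    // Hypothetical helper: returns each search word found in the text, in order of first occurrence.
    public static List<string> FindWords(string text, IEnumerable<string> searchWords)
    {
        // Escape each word so that characters such as '.' or '+' are matched literally.
        string alternation = string.Join("|", searchWords.Select(Regex.Escape));
        // Assumption: \b restricts matches to whole words. This only behaves as intended
        // for search words that begin and end with a letter, digit, or underscore.
        var regex = new Regex(@"\b(?:" + alternation + @")\b");

        return regex.Matches(text)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()          // report each word once, even if it occurs repeatedly
                    .ToList();
    }

    public static void Main()
    {
        var text = "Lucy loves going to the field and play soccer with her friend";
        var result = FindWords(text, new[] { "Lucy", "play", "soccer" });
        Console.WriteLine(string.Join(", ", result));
    }
}
```

A single alternation scans the text once rather than once per word. Dropping the \b anchors gives plain substring matching, as in the snippet above.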

Here is a simple solution that takes care of whitespace and punctuation:

static void Main(string[] args)
{
    string sentence = "Text to analyze for words, bar, foo";            
    var words = Regex.Split(sentence, @"\W+");
    var searchWords = new List<string> { "foo", "bar", "xyz" };
    var foundWords = words.Intersect(searchWords);

    foreach (var item in foundWords)
    {
        Console.WriteLine(item);
    }

    Console.ReadLine();
}

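Efficient in terms of CPU time or memory? How large is myText? How many search operations will you perform?

You need to define what "word" means. Should "foo" match in a string like "This is foobar"? The Contains answers will match it, while the Split answers will not.

Efficiency in terms of coding time should be considered too (it should not be ignored).

@DrewKennedy As long as the problem is not complex, why not choose the simplest, most compact solution?

@HosseinNarimaniRad I agree that it is fine. I think the comment refers to the fact that this approach makes multiple passes over the string, which is unnecessary. Also, if the words are relatively long, the Boyer-Moore-Horspool algorithm can speed up the search significantly.

@Bas Thanks. I will check out the Boyer-Moore-Horspool algorithm.

It should be noted that this will match "foo" in a string like "this is foobar", which may or may not be the desired result.

While this handles the given example, it will not work for a string like "this is foo! Where is bar? I am xyz". Basically, you need to split on anything that can separate words. Also, the OP did not say whether "foo" should match in a string like "This is foobar".

@juharr This is an example of what the OP can do with a HashSet. He can split on whatever delimiters he wants. He might also trim the strings before inserting them into the set, which I have not done.

I fully agree that if the OP wants to look at words rather than just any substring, this is the way to go. I just think it should be pointed out that the splitting process can be more complicated.

You should use Regex.Escape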
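
To make the distinction raised in the comments concrete, here is a small sketch comparing substring matching with whole-word matching on the example strings mentioned there. The class and method names (MatchSemanticsDemo, BySubstring, ByWholeWord) are hypothetical, and \W+ is only one possible definition of a word separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchSemanticsDemo
{
    // Substring semantics: "foo" is reported if it appears anywhere, even inside "foobar".
    public static List<string> BySubstring(string text, IEnumerable<string> words)
    {
        return words.Where(w => text.Contains(w)).ToList();
    }

    // Whole-word semantics: the text is split on runs of non-word characters first,
    // so punctuation such as '!' or '?' also separates words.
    public static List<string> ByWholeWord(string text, IEnumerable<string> words)
    {
        var tokens = new HashSet<string>(Regex.Split(text, @"\W+"));
        return words.Where(w => tokens.Contains(w)).ToList();
    }

    public static void Main()
    {
        var words = new[] { "foo", "bar", "xyz" };

        Console.WriteLine(BySubstring("This is foobar", words).Count);  // 2: "foo" and "bar" are both substrings
        Console.WriteLine(ByWholeWord("This is foobar", words).Count);  // 0: "foobar" is a single token

        var punctuated = "this is foo! Where is bar? I am xyz";
        Console.WriteLine(string.Join(", ", ByWholeWord(punctuated, words)));  // foo, bar, xyz
    }
}
```

Which behavior is correct depends on what the question means by "word"; both comparisons here are case-sensitive.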