C# String Normalization (c#, .net-3.5, string, normalization)

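I'm writing some code that needs to do string normalization: I want to convert a given string into a camel-case representation (or at least a best guess). For example: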

"the quick brown fox" => "TheQuickBrownFox"
"the_quick_brown_fox" => "TheQuickBrownFox"
"123The_quIck bROWN FOX" => "TheQuickBrownFox"
"the_quick brown fox 123" => "TheQuickBrownFox123"
"thequickbrownfox" => "Thequickbrownfox"

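I think you should be able to get the idea from those examples. I want to strip out all special characters (', ", !, @, etc.), capitalize each word (words are delimited by a space, _ or -), and remove any leading digits (trailing or internal digits are fine, but this requirement is not vital, depending on difficulty).

I'm trying to work out the best way to achieve this. My first guess was a regular expression, but my regex skills are poor at best, so I don't really know where to start.

My other idea was to loop over and parse the data: break the string into words, process each word, and rebuild the string that way.

Or is there another way I could go about it?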

I thought it would be fun to try, so here is what I came up with:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
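            StringBuilder sb = new StringBuilder();
            string sentence = "123The_quIck bROWN FOX1234";

            // Lower-case everything first; word starts are upper-cased below.
            sentence = sentence.ToLower();

            bool atStart = true;   // true until the first letter has been emitted
            char pChar = ' ';      // previous character; starts out as a separator

            char[] spaces = { ' ', '_', '-' };   // characters that delimit words
            foreach (char c in sentence)
            {
                // Drop leading digits (pChar is deliberately left unchanged).
                if (atStart && char.IsDigit(c)) continue;

                if (char.IsLetter(c))
                {
                    // Upper-case the very first letter and any letter that follows a separator.
                    bool startsWord = atStart || spaces.Contains(pChar);
                    sb.Append(startsWord ? char.ToUpper(c) : c);
                    atStart = false;
                }
                else if (char.IsDigit(c))
                {
                    sb.Append(c);
                }
                // Any other character (punctuation, separators) is dropped.
                pChar = c;
            }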

            Console.WriteLine(sb.ToString());
            Console.ReadLine();
        }
    }
}

This regex matches all of the words. We then aggregate them using a method that upper-cases the first character and lower-cases the rest of the string.

Regex regex = new Regex(@"[a-zA-Z]*", RegexOptions.Compiled);

private string CamelCase(string str)
{
    return regex.Matches(str).OfType<Match>().Aggregate("", (s, match) => s + CamelWord(match.Value));
}

private string CamelWord(string word)
{
    if (string.IsNullOrEmpty(word))
        return "";

    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
}

By the way, this method ignores numbers. To include them, you could change the regex to @"[a-zA-Z]*.[0-9]*", I think, but I haven't tested it.
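
The pattern suggested above has not been tested by its author, and its "." matches any character, so it may pull separators into a match. Below is a minimal sketch of one alternative. It assumes that digit runs should be kept as their own tokens and that leading digits should be dropped, as the question asks. The names CamelCaseSketch, Token and ToCamelCase are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CamelCaseSketch
{
    // A run of ASCII letters or a run of digits; everything else acts as a separator.
    static readonly Regex Token = new Regex(@"[a-zA-Z]+|[0-9]+", RegexOptions.Compiled);

    public static string ToCamelCase(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (Match m in Token.Matches(input))
        {
            string token = m.Value;
            if (char.IsDigit(token[0]))
            {
                // Digits are kept only once some letters have been emitted,
                // which drops leading digits such as the "123" in "123The".
                if (sb.Length > 0) sb.Append(token);
            }
            else
            {
                // Upper-case the first letter of the word, lower-case the rest.
                sb.Append(char.ToUpperInvariant(token[0]));
                sb.Append(token.Substring(1).ToLowerInvariant());
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, CamelCaseSketch.ToCamelCase("the_quick brown fox 123") should give "TheQuickBrownFox123". One design consequence worth noting: a digit run also ends a word, so "fox2go" becomes "Fox2Go" rather than "Fox2go".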

How about a simple solution using the Microsoft.VisualBasic namespace? (Don't forget to add a project reference to Microsoft.VisualBasic.)
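Any solution that involves matching specific characters may not work well with some character encodings, particularly when a Unicode representation is in use: Unicode has dozens of space characters, thousands of "symbols", thousands of punctuation characters, thousands of "letters", and so on. Wherever possible, it is better to use the built-in Unicode-aware functions. As for what counts as a "special character", you could decide based on Unicode categories. For instance, it would include "Punctuation", but would it include "Symbols"?
ToLower(), IsLetter(), etc. should be fine, and they take all possible letters in Unicode into account. Matching against dashes and slashes should probably take into account some of the dozens of space and dash characters in Unicode.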
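
To make the category-based idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes that white space, dash punctuation and connector punctuation (which includes '_') are the word separators, and that every other non-letter, non-digit character is simply dropped. The names UnicodeCamelCaseSketch, IsSeparator and ToCamelCase are hypothetical. It works one char at a time, so letters outside the Basic Multilingual Plane (surrogate pairs) are not handled.

```csharp
using System;
using System.Globalization;
using System.Text;

static class UnicodeCamelCaseSketch
{
    // Assumption: separators are white space, dashes and connector punctuation.
    static bool IsSeparator(char c)
    {
        if (char.IsWhiteSpace(c)) return true;
        UnicodeCategory cat = char.GetUnicodeCategory(c);
        return cat == UnicodeCategory.DashPunctuation        // '-', en dash, ...
            || cat == UnicodeCategory.ConnectorPunctuation;  // '_', ...
    }

    public static string ToCamelCase(string input)
    {
        StringBuilder sb = new StringBuilder();
        bool startOfWord = true;
        foreach (char c in input)
        {
            if (char.IsLetter(c))
            {
                // Upper-case the first letter of each word, lower-case the rest.
                sb.Append(startOfWord ? char.ToUpperInvariant(c) : char.ToLowerInvariant(c));
                startOfWord = false;
            }
            else if (char.IsDigit(c))
            {
                // Leading digits are dropped; later digits are kept.
                if (sb.Length > 0) sb.Append(c);
            }
            else if (IsSeparator(c))
            {
                startOfWord = true;
            }
            // Anything else (other punctuation, symbols) is dropped without ending the word.
        }
        return sb.ToString();
    }
}
```

Whether symbols or other punctuation should also end a word is exactly the kind of decision the paragraph above leaves open; changing IsSeparator is the only place that choice would need to be made in this sketch.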
You could :)
Wow, I think you and I ended up in almost exactly the same place!
Thanks, this did a great job of handling the other cases.