Converting HTML entities to Unicode characters in C#

c# windows-runtime

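I found similar questions and answers for Python and Javascript, but not for C# or any other WinRT-compatible language.

The reason I need this is that I'm displaying text fetched from websites in a Windows 8 Store app. For example,

&eacute; should become é

Or is there a better way to do it? I'm not displaying websites or RSS feeds, only a list of websites and their titles.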
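Use HttpUtility.HtmlDecode(). Read about it on MSDN.

I recommend using System.Net.WebUtility.HtmlDecode and not HttpUtility.HtmlDecode.

This is because the System.Web reference does not exist in WinForms/WPF/Console applications, and you can get exactly the same result with System.Net.WebUtility (which is already referenced in all of those project types).
Usage: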
string s =  System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é
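For the use case in the question (a list of site titles rather than a single string), the same call can be applied to each item. This is only a minimal sketch: the class name `TitleDecoder` and the method `DecodeTitles` are made up for illustration, and it assumes `System.Net.WebUtility` is available on the target platform.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class TitleDecoder
{
    // Hypothetical helper: decodes HTML entities in every title.
    // WebUtility.HtmlDecode returns null for null input, so null
    // entries pass through unchanged.
    public static List<string> DecodeTitles(IEnumerable<string> rawTitles)
    {
        return rawTitles.Select(title => WebUtility.HtmlDecode(title)).ToList();
    }
}
```

Calling `TitleDecoder.DecodeTitles(new[] { "Caf&eacute; &amp; Bar" })` should give a list containing `Café & Bar` on the desktop .NET Framework. Other answers here report different behavior on Windows Phone 8, so it is worth checking on the actual target.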

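HTML entities and HTML numbers are encoded and decoded differently in Metro apps and in WP8 apps.
With a Windows Runtime Metro app
With Windows Phone 8.0
To solve this on WP8, I implemented the table from the linked reference before calling System.Net.WebUtility.HtmlDecode().

This might be useful: it replaces all entities (as far as my requirements go) with their Unicode equivalents.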
    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        var regex = new Regex("(&[a-z]{2,5};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) { 
                var unicode = HttpUtility.HtmlDecode(match.Value);
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }

This worked for me; it replaces both common entities and Unicode (numeric) entities.
private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

public static string HtmlDecode(this string html)
{
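    // string.IsNullOrEmpty is the built-in check; html.IsNullOrEmpty() would need a custom extension method.
    if (string.IsNullOrEmpty(html)) return html;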
    return HtmlEntityRegex.Replace(html, x => x.Groups[1].Value == "#"
        ? ((char)int.Parse(x.Groups[2].Value)).ToString()
        : HttpUtility.HtmlDecode(x.Groups[0].Value));
}

[Test]
[TestCase(null, null)]
[TestCase("", "")]
[TestCase("&#39;fark&#39;", "'fark'")]
[TestCase("&quot;fark&quot;", "\"fark\"")]
public void should_remove_html_entities(string html, string expected)
{
    html.HtmlDecode().ShouldEqual(expected);
}
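The extension method above parses the digits with `int.Parse` and casts to `char`, so a hexadecimal reference such as `&#x27;` would throw and a code point above U+FFFF would be truncated. A possible variant for the numeric part only is sketched below. The class name `NumericEntityDecoder` is an assumption made for illustration, it is not part of the original answer, and named entities would still go through `HtmlDecode` afterwards.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumericEntityDecoder
{
    // Group 1: hexadecimal digits (&#xE9;), group 2: decimal digits (&#233;).
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return NumericEntity.Replace(html, m =>
        {
            int codePoint = m.Groups[1].Success
                ? int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

            // Leave references to invalid code points (out of range or surrogates) untouched.
            if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
                return m.Value;

            // ConvertFromUtf32 yields a surrogate pair for code points above U+FFFF.
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

With this sketch, `NumericEntityDecoder.Decode("&#39;fark&#39;")` and `NumericEntityDecoder.Decode("&#x27;fark&#x27;")` both return `'fark'`.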

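An improved version of Zumey's method (I can't comment there). The longest entity name is &exclamation; (11 characters). Upper-case letters in entities are also possible, e.g. &Agrave; (per the linked source).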
    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        var regex = new Regex("(&[a-zA-Z]{2,11};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }
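Comments:
Yes, and note that for WinForms or Console applications you first have to add a reference to the System.Web assembly.
Hi, I tried this solution but it doesn't decode things like &lbrace; :(
@l19 Is that a recognized HTML entity? I couldn't find it here. I did find it in the W3C dev spec, though. That's probably why it isn't decoded yet.
Duplicate of the linked question.
Actually it isn't. He has a different problem.
It is indeed a duplicate. That question just has an extra step at the end that you don't need.
"You can get exactly the same result with this class": not true. Only the HttpUtility implementation correctly decodes &apos; to an apostrophe on WP8.
In my case, HttpUtility.HtmlDecode does the right thing.
Great solution :)
Great solution, but the drawback of System.Net.WebUtility.HtmlDecode is that you won't find it under .NET Framework 3.5 if you are coding for old Windows 7.
In my case this did the job, but I changed the regex to var regex = new Regex("(&[a-z]{2,6};)"); because there are many HTML entities longer than 5 characters (like &eacute;).
I also suggest changing the regex to var regex = new Regex("(&[a-zA-Z]{2,7};)"); so that entities like &Atilde; are included.
The link is dead.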
Snippet from the Metro/WP8 answer above (encode/decode round trip):
{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == &#243;
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}
