C#: Trying to convert a string to the correct format/encoding?

c# string encoding

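I have a program that screen-scrapes a French web page and looks for a specific string. Once found, I save that string. The returned string should read "User does not have a desktop configured", or in French "L'utilisateur ne dispose pas d'un bureau configuré.", but it actually comes back as: L**\x26#39**;utilisateur ne dispose pas d**\x26#39**;un bureau configur**�**. How can I get \x26#39 treated as an apostrophe character? Is there something in C# I can use to read a URL and return the correct phrase?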

I have looked at many of the available C# functions, but could not find one that gives me the correct result.

Sample code I tried:

// translated the true French text to English to help out with this example.
// 
Encoding winVar1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;

string url = "http://www.My-TEST-SITE.com/";
WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
string result = webClient.DownloadString(url);
int cVar = result.Substring(result.IndexOf("Search_TEXT=")).Length;
result = result.Substring(result.IndexOf("Search_TEXT="),  cVar);
result = WebUtility.HtmlDecode(result);
result = WebUtility.UrlDecode(result);
result = result.Substring(0, result.IndexOf("Found: "));

This returns L**\x26#39**;utilisateur ne dispose pas d**\x26#39**;un bureau configur**�**, when it should return: L'utilisateur ne dispose pas d'un bureau configuré

I am trying to get rid of the \x26#39 and get the proper French characters to display, such as è and the other accented letters.

I can't be sure, but:

result = result.Substring(result.IndexOf("Search_TEXT="),  cVar);
result = WebUtility.HtmlDecode(result);
result = WebUtility.UrlDecode(result);

Decoding the text twice is a bad idea. It is either URL-encoded, or HTML-encoded, or neither. Not both.
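A minimal sketch of why the second decoding pass is harmful. The sample string and the class and method names are made up for illustration; only WebUtility.HtmlDecode and WebUtility.UrlDecode come from the question.

```csharp
using System;
using System.Net;

public static class DoubleDecodeDemo
{
    // Decodes HTML entities only: the appropriate call for text taken from page markup.
    public static string HtmlOnly(string s) => WebUtility.HtmlDecode(s);

    // Mirrors the question's code: HTML-decode, then URL-decode the result.
    public static string HtmlThenUrl(string s) =>
        WebUtility.UrlDecode(WebUtility.HtmlDecode(s));

    public static void Main()
    {
        // Hypothetical sample: HTML-escaped text that contains a literal '+'.
        string html = "a+b &amp; c";

        Console.WriteLine(HtmlOnly(html));    // a+b & c
        Console.WriteLine(HtmlThenUrl(html)); // a b & c  (UrlDecode turned '+' into a space)
    }
}
```

The extra UrlDecode pass silently changes characters that were never URL-encoded, which is why each decoder should run only on text known to be in that encoding.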

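It looks like your first problem is not character encoding at all, but what appears to be somebody's custom mix of escaping and obfuscation.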

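That funny-looking **\x26#39**; is really just a single quote. Translate the hex escape \x26 and it becomes &, so you get **&#39**;. Remove the extra asterisks and you have the HTML entity &#39;. HtmlDecode turns that into a plain apostrophe ', which is just ASCII character 39.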

Try this snippet. Note that HtmlDecode can only be applied at the very last step.

var input = @"L**\x26#39**;utilisateur ne dispose pas d**\x26#39**;un bureau configur**�**";

var result = Regex.Replace(input, @"\*\*([^*]*)\*\*", "$1");  // Take out the extra stars 

// Unescape \x values
result = Regex.Replace(result,
                       @"\\x([a-fA-F0-9]{2})",
                       match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value,
                                                                  System.Globalization.NumberStyles.HexNumber)));

// Decode html entities
result = System.Net.WebUtility.HtmlDecode(result);

The output is L'utilisateur ne dispose pas d'un bureau configur�

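The second problem is the accented e. That one really is an encoding problem, and you may have to keep experimenting to get it right. You might also want to try UTF16 or even UTF32. But HtmlAgilityPack may well sort this out for you automatically.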
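To illustrate how the � can appear, here is a small sketch under a loudly stated assumption: the page is served in a single-byte charset (ISO-8859-1 is used as a stand-in, the real charset of the site is unknown) while the client decodes it as UTF-8. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class CharsetMismatchDemo
{
    // ASSUMPTION: ISO-8859-1 stands in for whatever charset the page really uses.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the raw bytes a server would send for the given text.
    public static byte[] AsServerBytes(string text) => Latin1.GetBytes(text);

    // Decoding with the wrong charset: the lone byte 0xE9 is not valid UTF-8.
    public static string DecodeWrong(byte[] raw) => Encoding.UTF8.GetString(raw);

    // Decoding with the charset the bytes were actually written in.
    public static string DecodeRight(byte[] raw) => Latin1.GetString(raw);

    public static void Main()
    {
        byte[] raw = AsServerBytes("configur\u00E9"); // "configuré"

        string wrong = DecodeWrong(raw);
        // The final character is U+FFFD, the replacement character shown as �.
        Console.WriteLine(wrong[wrong.Length - 1] == '\uFFFD');

        // With the matching charset the accented letter survives.
        Console.WriteLine(DecodeRight(raw) == "configur\u00E9");
    }
}
```

If this is what is happening, one option is to fetch the raw bytes with webClient.DownloadData(url) and decode them with the charset the page declares in its Content-Type header or meta tag, instead of forcing UTF-8.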

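Any particular reason you don't want to use a proper tool for web scraping, such as HtmlAgilityPack?

You are mixing a lot of things together. Basically, UTF8 is how characters are encoded, and Unicode is how they are represented. I suggest you first read this article on the subject, and then you will understand what is going on.

I did not know about HtmlAgilityPack; reading the documentation now. As for Joel's site... yes, I have seen it, but it does not tell me why I am still not seeing the UTF8 text on my screen. I am trying to find the right code to give me the correct text.

@MaximilianoRios - plus 1 for the article link.

You're welcome. I think we should all read these articles on the background of this topic. It is very important for writing code correctly.

Tried: result = WebUtility.HtmlDecode(result); //result = WebUtility.UrlDecode(result); and then //result = WebUtility.HtmlDecode(result); result = WebUtility.UrlDecode(result); UrlDecode on its own gave me an error about the string size.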