C#: decoding quoted-printable strings correctly
I have the following string:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=...
This is the encoded form of:
[proconact - Verbesserung #279] (Neu) Stellvertretungen Benutzerrecht - andere können für andere Stellvertretungen erstellen ändern usw. dadurch ist der Schutz der Aktiviäten Mails nicht gewährt.
I am looking for a way to decode this quoted-printable string.
I tried:
private static string DecodeQuotedPrintables(string input, string charSet) {
    Encoding enc = new ASCIIEncoding();
    try {
        enc = Encoding.GetEncoding(charSet);
    } catch {
        enc = new UTF8Encoding();
    }
    var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    foreach (Match match in matches) {
        try {
            byte[] b = new byte[match.Groups[0].Value.Length / 3];
            for (int i = 0; i < match.Groups[0].Value.Length / 3; i++) {
                b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
            }
            char[] hexChar = enc.GetChars(b);
            input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
        } catch { ;}
    }
    input = input.Replace("?=", "").Replace("=\r\n", "");
    return input;
}
This returns:
=?utf-8?Q?[proconact_-_Verbesserung_#_(Neu)_Stellvertretungen_Benutzerrecht_-_andere_können_für_andere_Stellvertretungen_erstellen_ändern_usw._dadurch_ist_der_Schutz_der_Aktiviäten_Mails_nicht_gewährt=...
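What can I do to also remove the underscores, the leading
=?utf-8?Q?
and the trailing =...?

I'm not sure how best to remove
=?utf-8?Q?
in general, but if it always appears, you can do the following: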
input = input.Split('?')[3];
To remove the trailing "=", you can use:
input = input.Remove(input.Length - 1);
You can get rid of the "_" by replacing it with a space:
input = input.Replace("_", " ");
You can use these snippets in your DecodeQuotedPrintables function.
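As a rough sketch of how the three snippets above might be combined, the helper below applies them in one place. The class and method names are made up for illustration, and it assumes the input holds a single encoded word. It is meant to run before the =XX sequences are decoded, so that a literal ? or _ that was encoded as =3F or =5F is left alone.

```csharp
using System;

public static class EncodedWordCleanup
{
    // Hypothetical helper (the name is an assumption, not part of any library).
    // Strips the "=?charset?Q?" prefix, a trailing "?=" or stray "=",
    // and maps "_" back to a space.
    public static string StripWrapper(string input)
    {
        string text = input;

        // "=?utf-8?Q?text?=" splits into "=", "utf-8", "Q", "text", "=".
        if (text.StartsWith("=?", StringComparison.Ordinal))
        {
            string[] parts = text.Split('?');
            if (parts.Length >= 4)
            {
                text = parts[3];
            }
        }

        // Drop a dangling "=" left over at the very end of a truncated word.
        if (text.EndsWith("=", StringComparison.Ordinal))
        {
            text = text.Remove(text.Length - 1);
        }

        // In Q-encoding an underscore stands for a space.
        return text.Replace("_", " ");
    }
}
```

The result still contains the =XX sequences, so it would be passed on to the hex-decoding step afterwards.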
Hope this helps.

Following up on my comment, I would suggest:
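private static string MessedUpUrlDecode(string input, string encoding)
{
    // Fall back to UTF-8 when the charset name is not recognised.
    Encoding enc;
    try
    {
        enc = Encoding.GetEncoding(encoding);
    }
    catch (ArgumentException)
    {
        enc = new UTF8Encoding();
    }
    // "=?utf-8?Q?text" splits into "=", "utf-8", "Q", "text"; keep the text part.
    string messedup = input.Split('?')[3];
    // Turn "_" back into spaces and "=XX" into "%XX" so that UrlDecode can handle it.
    string cleaned = messedup.Replace("_", " ").Replace("=...", ".").Replace("=", "%");
    return System.Web.HttpUtility.UrlDecode(cleaned, enc);
}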
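This assumes that the mangling of the source string is consistent.

The text you are trying to decode is typically found in MIME headers and is encoded according to the specification defined in the Internet standard RFC 2047. A sample implementation of such a decoder is available on GitHub; perhaps you can draw some ideas from it. You can also use an online decoding tool to compare results.
Note that your sample text is not correct. The specification states: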
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
According to the specification, every encoded-word must end with ?=. Your sample therefore has to be corrected from:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=
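...to (scroll to the far right):
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?=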
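Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded-word; however, most decoders tend to tolerate this inconsistency.

I have tested 5+ code snippets, and this is the one that works. I modified the regex part.
Test line: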
im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=
Sample call:
string encoding = "windows-1254";
string input = "im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=";
DecodeQuotedPrintables(input, encoding);
Code snippet:
private static string DecodeQuotedPrintables(string input, string charSet)
{
    // Resolve the charset; fall back to UTF-8 if the name is unknown.
    Encoding enc;
    try
    {
        enc = Encoding.GetEncoding(charSet);
    }
    catch (ArgumentException)
    {
        enc = new UTF8Encoding();
    }
    // Parse looking for =XX, where XX is a hexadecimal byte value.
    // Each match is decoded on its own, so this suits single-byte charsets
    // such as windows-1254 rather than multi-byte UTF-8 sequences.
    //var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
    var occurences = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        try
        {
            byte[] b = new byte[match.Groups[0].Value.Length / 3];
            for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
            {
                b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
            }
            char[] hexChar = enc.GetChars(b);
            input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
        }
        catch
        {
            // Sequences that cannot be decoded are left as they are.
        }
    }
    // Remove the encoded-word terminator and soft line breaks.
    input = input.Replace("?=", "").Replace("=\r\n", "");
    return input;
}
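The snippet above decodes each =XX on its own, which is fine for a single-byte charset such as windows-1254 but not for the UTF-8 string in the question, where =C3=B6 is one character spread over two bytes. Below is a minimal sketch of one way to handle that, by collecting all bytes first and decoding them in one go. QEncodedWord and Decode are hypothetical names; the sketch assumes a single Q-encoded word and does not cover the B encoding, several words in one header, or line folding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class QEncodedWord
{
    // Matches "=?charset?Q?encoded-text?=". The closing "?=" is optional here
    // only to tolerate truncated samples like the one in the question.
    private static readonly Regex Pattern = new Regex(
        @"^=\?(?<charset>[^?]+)\?[Qq]\?(?<text>[^?]*)(\?=)?$");

    public static string Decode(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return input; // not a Q-encoded word: return it unchanged
        }

        // Throws ArgumentException if the charset name is unknown.
        Encoding enc = Encoding.GetEncoding(m.Groups["charset"].Value);
        string text = m.Groups["text"].Value;
        var bytes = new List<byte>(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20); // "_" stands for a space
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else if (c == '=' && i == text.Length - 1)
            {
                // dangling "=" at the end of a truncated word: drop it
            }
            else
            {
                bytes.AddRange(enc.GetBytes(c.ToString()));
            }
        }

        // Decode all bytes together so multi-byte characters stay intact.
        return enc.GetString(bytes.ToArray());
    }
}
```

With this, QEncodedWord.Decode("=?utf-8?Q?k=C3=B6nnen_f=C3=BCr?=") should give "können für". Collecting the bytes before decoding is the main design choice: it avoids having to know where one multi-byte character ends and the next begins.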

As described elsewhere, .NET already has something for this purpose:
string unicodeString =
"=?UTF-8?Q?YourText?=";
System.Net.Mail.Attachment attachment = System.Net.Mail.Attachment.CreateAttachmentFromString("", unicodeString);
Console.WriteLine(attachment.Name);

This is evil:
try{…}catch{;}
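
What should you get? What is the final string you are trying to get out of the original string?
This is the original string I should get: [proconact - Verbesserung #279] (Neu) Stellvertretungen Benutzerrecht - andere können für andere Stellvertretungen erstellen ändern usw. dadurch ist der Schutz der Aktiviäten Mails nicht gewährt.
Just a side note: your source string looks like a mangled URL-encoded string that could easily be decoded if it had not been broken by replacing URL-encoded entities such as %23 with =23. If you have no control over the source string, you could undo that replacement and then URL-decode it, which would simplify your approach considerably.
@Filburt: The source string is an (almost) valid RFC 2047 encoded-word; see my answer.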