Detecting the HTML encoding when NSURLResponse returns nil for textEncodingName
I am loading a website's HTML with this call:
NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
// Accept-Encoding is for content codings such as gzip; a charset preference belongs in Accept-Charset
[request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Charset"];
[request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
[NSURLConnection sendAsynchronousRequest:request
                                   queue:[NSOperationQueue currentQueue]
                       completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];
Then, to convert the NSData to an NSString, I need to know the encoding, so I call
NSString *textEncoding = [response textEncodingName];
from inside the completion block. However, it returns nil for sites that don't specify a charset in their Content-Type header.
Without the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding] does not give me readable HTML.
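For context, NSURLResponse's textEncodingName reflects the charset parameter of the response's Content-Type header, for example `text/html; charset=ISO-8859-1`. It is therefore nil when the server omits that parameter. The C sketch below illustrates that lookup. `extract_charset` is a hypothetical helper written only for this example, not an Apple API, and it ignores edge cases such as escaped quotes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive "s starts with prefix"; stops at the end of s. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Hypothetical helper: copies the charset parameter of a Content-Type value
   into out and returns out. Returns NULL when there is no charset parameter
   (the case in which textEncodingName is nil) or when it does not fit. */
const char *extract_charset(const char *content_type, char *out, size_t out_size)
{
    assert(content_type != NULL && out != NULL);
    const char *p = content_type;
    while ((p = strchr(p, ';')) != NULL) {
        p++;                                  /* skip the ';' separator */
        while (*p == ' ' || *p == '\t') p++;  /* skip optional whitespace */
        if (starts_with_ci(p, "charset=")) {
            p += 8;                           /* length of "charset=" */
            if (*p == '"') p++;               /* accept a quoted value */
            size_t n = 0;
            while (p[n] != '\0' && p[n] != ';' && p[n] != '"' &&
                   !isspace((unsigned char)p[n]))
                n++;
            if (n == 0 || n >= out_size) return NULL;
            memcpy(out, p, n);
            out[n] = '\0';
            return out;
        }
    }
    return NULL;
}
```

With `text/html; charset=UTF-8` this yields `UTF-8`. With a bare `text/html` it yields NULL, which corresponds to the nil textEncodingName above.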
How can I detect the correct encoding for sites that don't declare a charset?

You can try different encodings and see which one produces readable text:
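// NSStringEncoding is an NSUInteger; several constants (e.g. the UTF-16/UTF-32 ones) don't fit in an int
static const NSStringEncoding encodingPriority[] = {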
NSUTF8StringEncoding,
NSASCIIStringEncoding,
NSISOLatin1StringEncoding,
NSISOLatin2StringEncoding,
NSUnicodeStringEncoding,
NSWindowsCP1251StringEncoding,
NSWindowsCP1252StringEncoding,
NSWindowsCP1253StringEncoding,
NSWindowsCP1254StringEncoding,
NSWindowsCP1250StringEncoding,
NSNEXTSTEPStringEncoding,
NSJapaneseEUCStringEncoding,
NSNonLossyASCIIStringEncoding,
NSShiftJISStringEncoding, /* kCFStringEncodingDOSJapanese */
NSISO2022JPStringEncoding, /* ISO 2022 Japanese encoding for e-mail */
NSMacOSRomanStringEncoding,
NSUTF16BigEndianStringEncoding,
NSUTF16LittleEndianStringEncoding,
NSUTF32StringEncoding,
NSUTF32BigEndianStringEncoding,
NSUTF32LittleEndianStringEncoding
};
#define REQUIRED_HTML_STRING @"<html"
- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
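    // sizeof(encodingPriority) is the array size in bytes, so divide by the element size to get the count
    size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);
    for (size_t i = 0; i < count; i++) {
        NSStringEncoding encoding = encodingPriority[i];
        // try this encoding; initWithData:encoding: returns nil if the bytes are invalid for it
        NSString *html = [[NSString alloc] initWithData:data encoding:encoding];
        // a wrong encoding can still produce unreadable text, so also require a known marker string
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            if (detectedEncoding) *detectedEncoding = encoding;
            return html;
        }
    }
    return nil;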
}
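A loop over a C array must stop at the element count. `sizeof` of the array alone gives its size in bytes, which overruns the array. Below is a plain-C sketch of the same "try encodings in priority order, accept the first that decodes and contains `<html`" idea, using an explicit element count. It is only an illustration, under these assumptions: it knows three encodings (UTF-8, US-ASCII, ISO-8859-1), and because all three are ASCII-compatible it checks the marker on the raw bytes. `detect_encoding`, `kPriority` and the validator functions are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*validator_fn)(const unsigned char *data, size_t len);

/* Strict UTF-8 check: rejects bad continuation bytes, overlong forms,
   surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned int cp;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;
        if (i + len > n) return 0;
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
            return 0; /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += len;
    }
    return 1;
}

static int is_ascii(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] > 0x7F) return 0;
    return 1;
}

/* ISO 8859-1 assigns a character to every byte value, so it never fails. */
static int is_latin1(const unsigned char *s, size_t n) { (void)s; (void)n; return 1; }

struct candidate { const char *name; validator_fn valid; };

static const struct candidate kPriority[] = {
    {"UTF-8", is_valid_utf8},
    {"US-ASCII", is_ascii},
    {"ISO-8859-1", is_latin1},
};

/* Case-insensitive search for an ASCII marker; only meaningful for
   ASCII-compatible encodings, where the marker bytes are unchanged. */
static int contains_marker_ci(const unsigned char *s, size_t n, const char *marker)
{
    size_t m = strlen(marker);
    for (size_t i = 0; i + m <= n; i++) {
        size_t k = 0;
        while (k < m && tolower(s[i + k]) == tolower((unsigned char)marker[k])) k++;
        if (k == m) return 1;
    }
    return 0;
}

/* Returns the name of the first candidate that accepts the bytes and whose
   text contains "<html", or NULL if none does. */
const char *detect_encoding(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    size_t count = sizeof kPriority / sizeof kPriority[0]; /* elements, not bytes */
    for (size_t i = 0; i < count; i++)
        if (kPriority[i].valid(data, len) && contains_marker_ci(data, len, "<html"))
            return kPriority[i].name;
    return NULL;
}
```

`is_latin1` accepts every byte sequence, so any ASCII-compatible encoding listed after it could never be selected. The same likely holds for NSISOLatin1StringEncoding in `encodingPriority`, for the same reason, so the order of that array matters as much as its contents.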
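I tried @Kof's code and noticed that the encoding name I got from the response was utf-8. Passing that name directly, as in [[NSString alloc] initWithData:data encoding:@"utf-8"], doesn't work: you get nil (or a compiler diagnostic about converting a pointer to an integer). The reason is that the encoding: parameter takes an NSStringEncoding, an NSUInteger-based enum whose values are the NS*StringEncoding constants, not a string. [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] does return the string.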
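The point above is that initWithData:encoding: wants a numeric NSStringEncoding, not a charset name. On Apple platforms, Core Foundation handles the conversion from a name such as utf-8. `CFStringConvertIANACharSetNameToEncoding` returns a `CFStringEncoding`, or `kCFStringEncodingInvalidId` if it does not recognise the name. `CFStringConvertEncodingToNSStringEncoding` then turns that into an `NSStringEncoding`. The portable C sketch below only illustrates the idea of a case-insensitive name-to-constant lookup. The `encoding_id` values and `encoding_from_iana_name` are hypothetical, they are not Foundation's real constants, and they cover just a handful of names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for NS*StringEncoding constants; the real numeric
   values are defined by Foundation and are not reproduced here. */
typedef enum {
    ENC_INVALID = 0,
    ENC_UTF8,
    ENC_ISO_LATIN1,
    ENC_WINDOWS_1252,
    ENC_SHIFT_JIS
} encoding_id;

/* Case-insensitive string equality (charset names are case-insensitive). */
static int equals_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Maps an IANA charset name to a numeric constant, or ENC_INVALID. */
encoding_id encoding_from_iana_name(const char *name)
{
    static const struct { const char *name; encoding_id id; } table[] = {
        {"utf-8", ENC_UTF8},
        {"iso-8859-1", ENC_ISO_LATIN1},
        {"latin1", ENC_ISO_LATIN1},
        {"windows-1252", ENC_WINDOWS_1252},
        {"shift_jis", ENC_SHIFT_JIS},
    };
    if (name == NULL) return ENC_INVALID; /* e.g. textEncodingName was nil */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (equals_ci(name, table[i].name)) return table[i].id;
    return ENC_INVALID;
}
```

A NULL name, like a nil textEncodingName, maps to the invalid value. In that case a caller would fall back to trial decoding, as in the accepted answer.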
Usage:
NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];
if (html)
    NSLog(@"Encoding detected!");
else
    NSLog(@"No encoding detected");