Objective-C: Filter BOM characters out of an NSXMLDocument
The stringValue of some elements in an XML file contains BOM characters. The XML file is marked as UTF-8 encoded.
Some of these characters are at the beginning of a string (where they should be, from what I have read), but some are in the middle of a string (possibly malformed strings from whoever wrote the XML file). I am opening the file with the following code:
NSURL *furl = [NSURL fileURLWithPath:fileName];
if (!furl) {
    NSLog(@"Error: Can't open NML file '%@'.", fileName);
    return kNxADbReaderTTError;
}
NSError *err = nil;
NSXMLDocument *xmlDoc = [[NSXMLDocument alloc] initWithContentsOfURL:furl options:NSXMLNodeOptionsNone error:&err];
I query the elements like this:
NSXMLElement *anElement;
NSString *name;
...
NSString *valueString = [[anElement attributeForName:name] stringValue];
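My questions are:
Am I opening the file the wrong way?
Is the file malformed?
Am I querying the string value of the element the wrong way?
How can I filter these characters out?

While fixing another issue, I found a relatively clean way to filter unwanted characters out of the source of an NSXMLDocument. Pasting it here in case someone runs into a similar problem: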
@implementation NSXMLDocument (FilterIllegalCharacters)

- (NSXMLDocument *)initWithDataAndIgnoreIllegalCharacters:(NSData *)data illegalChars:(NSCharacterSet *)illegalChars error:(NSError **)error {
    // -- First, decode the raw data as a UTF-8 string.
    //    If decoding fails, str is nil, nothing is filtered and the parser reports the error.
    NSMutableString *str = [[NSMutableString alloc] initWithData:data encoding:NSUTF8StringEncoding];

    // -- Go through the XML, only caring about attribute value strings.
    //    Only double-quoted values are tracked; single-quoted attributes are not handled.
    NSMutableArray<NSNumber *> *charactersToRemove = [NSMutableArray array];
    NSUInteger openQuotes = NSNotFound;
    for (NSUInteger pos = 0; pos < str.length; ++pos) {
        unichar currentChar = [str characterAtIndex:pos];
        if (currentChar == '"') {
            // -- A quote toggles between being inside and outside a quoted value.
            openQuotes = (openQuotes == NSNotFound) ? pos : NSNotFound;
        }
        else if (openQuotes != NSNotFound && [illegalChars characterIsMember:currentChar]) {
            // -- If we find an illegal character, we make a note of its position.
            [charactersToRemove addObject:@(pos)];
        }
    }

    if (charactersToRemove.count) {
        // -- If we have characters to fix, we work through them backwards, in order to not mess up our saved positions by modifying the XML.
        for (NSNumber *characterPos in [charactersToRemove reverseObjectEnumerator]) {
            [str deleteCharactersInRange:NSMakeRange(characterPos.unsignedIntegerValue, 1)];
        }
        // -- Finally we update the data with our corrected version.
        data = [str dataUsingEncoding:NSUTF8StringEncoding];
    }

    // -- Initialize the receiver itself instead of allocating a second document.
    return [self initWithData:data options:NSXMLNodeOptionsNone error:error];
}

@end
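You can pass in any character set you want. Note that this sets the options for reading the XML document to none (NSXMLNodeOptionsNone). You may want to change this for your own purposes.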
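As a rough sketch of how the method might be called for the BOM case in the question: the wrapper function `LoadXMLStrippingBOM` below is a hypothetical name, not part of the original answer, and it assumes the category implementation above is compiled into the same target. The character set contains only U+FEFF, the code point a UTF-8 BOM decodes to.

```objective-c
#import <Foundation/Foundation.h>

// Declaration of the category method shown above, so this snippet can see it.
// (Skip this if the category already has its own header.)
@interface NSXMLDocument (FilterIllegalCharacters)
- (NSXMLDocument *)initWithDataAndIgnoreIllegalCharacters:(NSData *)data illegalChars:(NSCharacterSet *)illegalChars error:(NSError **)error;
@end

// Hypothetical wrapper: reads the file, then parses it with U+FEFF
// removed from double-quoted attribute values.
static NSXMLDocument *LoadXMLStrippingBOM(NSString *fileName, NSError **error) {
    NSData *data = [NSData dataWithContentsOfFile:fileName options:0 error:error];
    if (!data) {
        return nil; // *error (if provided) describes why the file could not be read
    }
    // A character set holding exactly one code point: U+FEFF.
    NSCharacterSet *bom = [NSCharacterSet characterSetWithRange:NSMakeRange(0xFEFF, 1)];
    return [[NSXMLDocument alloc] initWithDataAndIgnoreIllegalCharacters:data
                                                            illegalChars:bom
                                                                   error:error];
}
```

A BOM at the very start of the file sits outside any quoted value, so it is left alone and the parser handles it as usual.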
The method only filters the contents of attribute strings, which is where my malformed strings were coming from.
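If the stray characters also show up outside attribute values, or if changing how the document is loaded is not an option, one possible alternative is to strip them from each string after parsing. This is a minimal sketch, not the approach from the answer above; `StringByStrippingBOM` is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of `string` with every U+FEFF removed,
// wherever it occurs. Returns nil if `string` is nil.
static NSString *StringByStrippingBOM(NSString *string) {
    NSCharacterSet *bom = [NSCharacterSet characterSetWithRange:NSMakeRange(0xFEFF, 1)];
    // Split on the BOM character and join the pieces back together.
    return [[string componentsSeparatedByCharactersInSet:bom] componentsJoinedByString:@""];
}
```

With the query from the question, this could be used as `NSString *valueString = StringByStrippingBOM([[anElement attributeForName:name] stringValue]);`. The trade-off is that every call site has to remember to apply it, whereas filtering the source once fixes the whole document.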