Remove HTML tags from an NSString on the iPhone
Tags: ios, objective-c, iphone, cocoa-touch, nsstring
There are a couple of different ways to remove HTML tags from an NSString in Cocoa. One way is to render the string into an NSAttributedString and then grab the rendered text. Another is to use NSXMLDocument's -objectByApplyingXSLTString:arguments:error: method to apply an XSLT transform that does it.
Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using a regex or NSScanner. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, but that approach won't work except in very trivial cases. For example, these cases (from the Perl Cookbook chapter on the same subject) would break it:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
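To make the failure concrete, here is a minimal sketch of the naive bracket-matching approach. The helper name NaiveStripTags is hypothetical, and the pattern <[^>]+> is only an assumption about how "look for opening and closing tag characters" would typically be implemented.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): removes every "<...>" run,
// i.e. the naive "find the opening and closing tag characters" approach.
static NSString *NaiveStripTags(NSString *input) {
    return [input stringByReplacingOccurrencesOfString:@"<[^>]+>"
                                            withString:@""
                                               options:NSRegularExpressionSearch
                                                 range:NSMakeRange(0, input.length)];
}
```

Passing the first case above, <IMG SRC = "foo.gif" ALT = "A > B">, to this helper does not return an empty string. The match stops at the > inside the ALT attribute, so the tail of the tag ( B"> with a leading space) is left in the output.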
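I would think the safest way is just to parse for < and > characters: loop through the entire string and copy anything not enclosed in them to a new string.
Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only the plain text.
Here is a blog post that discusses several libraries available for stripping HTML. Note the comments, where other solutions are offered.
If you want to get the content of a web page (an HTML document) without the HTML tags, use this code in the webViewDidFinishLoad: delegate method of UIWebView.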
NSString *myText = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.textContent"];
If you are willing to use the Three20 framework, it has a category on NSString that adds a stringByRemovingHTMLTags method. See NSStringAdditions.h in the Three20Core subproject.
Use this:
NSString *myregex = @"<[^>]*>"; //regex to remove any html tag
NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];
Don't forget to include this in your code: #import "RegexKitLite.h"
RegexKitLite is a third-party library that must be added to the project separately.
A quick and "dirty" solution (it removes everything between < and >) that works with iOS >= 3.2:
- (NSString *)stringByStrippingHTML {
    NSRange r;
    NSString *s = [[self copy] autorelease];
    // Repeatedly remove the first "<...>" match until none remain.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
I have this declared as a category on NSString.
#import "RegexKitLite.h"
NSString *text = [html stringByReplacingOccurrencesOfRegex:@"<[^>]+>" withString:@""];
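This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It is a single .m and .h file that can easily be included in your project.
You then strip HTML by doing the following.
Import the header: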
#import "NSString_stripHtml.h"
Then call stripHtml:
NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped will be = Hello World!!
This also works with malformed HTML, which technically isn't XML.
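The category's own source is not shown in this thread. As a rough illustration of the idea only, a minimal tag stripper built on NSXMLParser might look like the sketch below. The names GRTextCollector and GRStripTagsWithXMLParser are hypothetical, wrapping the input in a dummy <x> root element is an assumption made so that fragments with several top-level nodes parse, and the code assumes ARC. This sketch makes no claim about malformed HTML: NSXMLParser stops at the first error, so only the text collected up to that point is returned.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects character data reported by NSXMLParser
// and ignores all element start/end events.
@interface GRTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation GRTextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// Called for each run of text found between tags (possibly several times
// for a single text node), so the pieces are appended in order.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Hypothetical function: returns the text content of an HTML/XML fragment.
static NSString *GRStripTagsWithXMLParser(NSString *html) {
    // Dummy root element so the fragment is a single XML document.
    NSString *wrapped = [NSString stringWithFormat:@"<x>%@</x>", html];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    GRTextCollector *collector = [[GRTextCollector alloc] init];
    parser.delegate = collector;
    [parser parse]; // returns NO on a parse error; collected text is kept
    return [collector.text copy];
}
```

With the input from the usage example above, GRStripTagsWithXMLParser(@"<b>Hello</b> World!!") yields Hello World!!. Because a real parser tracks quoting and comments, a > inside an attribute value or comment is not mistaken for the end of a tag, which is the main advantage over bracket matching.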
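I've extended m.kocikowski's answer and tried to make it a bit more efficient by using an NSMutableString. I've also structured it so that it can be used in a static Utils class (although I know a category is probably the best design), and removed the autorelease so that it compiles in an ARC project.
Included here in case anybody finds it useful.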
.h
+ (NSString *)stringByStrippingHTML:(NSString *)inputString;
.m
+ (NSString *)stringByStrippingHTML:(NSString *)inputString
{
    NSMutableString *outString;
    if (inputString)
    {
        outString = [[NSMutableString alloc] initWithString:inputString];
        if ([inputString length] > 0)
        {
            NSRange r;
            while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
            {
                [outString deleteCharactersInRange:r];
            }
        }
    }
    return outString;
}
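UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is simple";
// contentToHTMLString is not a documented UITextView key, so this relies on private behavior.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
// Assumes this runs inside a view controller; addSubview: must be sent to a view instance.
[self.view addSubview:textview];
It works fine for me.
Extending the answers from m.kocikowski and Dan J further, with more explanation for newcomers.
1# First you have to create a category to make the code usable in any class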
.h
@interface NSString (NAME_OF_CATEGORY)
- (NSString *)stringByStrippingHTML;
@end
.m
@implementation NSString (NAME_OF_CATEGORY)
- (NSString *)stringByStrippingHTML
{
    NSMutableString *outString;
    NSString *inputString = self;
    if (inputString)
    {
        outString = [[NSMutableString alloc] initWithString:inputString];
        if ([inputString length] > 0)
        {
            NSRange r;
            while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
            {
                [outString deleteCharactersInRange:r];
            }
        }
    }
    return outString;
}
@end
2# Then import the category's .h file in every class where you want to use it
3# Call the method
NSString* sub = [result stringByStrippingHTML];
NSLog(@"%@", sub);
Here result is the NSString I want to strip the tags from.
This is a modernization of m.kocikowski's answer that also removes whitespace:
@implementation NSString (StripXMLTags)
- (NSString *)stripXMLTags
{
    NSRange r;
    NSString *s = [self copy];
    // The trailing \s* also removes whitespace that follows each tag.
    while ((r = [s rangeOfString:@"<[^>]+>\\s*" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
@end
You can use it like below:
- (void)myMethod
{
    NSString *htmlStr = @"<some>html</string>";
    NSString *strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}
- (NSString *)stringByStrippingHTML:(NSString *)str
{
    NSRange r;
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}
Here is a more efficient solution than the accepted answer:
- (NSString *)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });
    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches = [regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;
    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}
NSAttributedString *str=[[NSAttributedString alloc] initWithData:[trimmedString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]} documentAttributes:nil error:nil];
- (NSString *)stringByStrippingHTML {
    NSString *retVal;
    @autoreleasepool {
        NSRange r;
        NSString *s = [[self copy] autorelease];
        while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
            s = [s stringByReplacingCharactersInRange:r withString:@""];
        }
        retVal = [s copy];
    }
    // pool is drained, release s and all temp
    // strings created by stringByReplacingCharactersInRange
    // copy returns an owned object, so balance it before returning (manual reference counting).
    return [retVal autorelease];
}
func stripHTMLFromString(string: String) -> String {
    var copy = string
    while let range = copy.rangeOfString("<[^>]+>", options: .RegularExpressionSearch) {
        copy = copy.stringByReplacingCharactersInRange(range, withString: "")
    }
    // Decode entities left behind after tag removal.
    copy = copy.stringByReplacingOccurrencesOfString("&nbsp;", withString: " ")
    copy = copy.stringByReplacingOccurrencesOfString("&amp;", withString: "&")
    return copy
}
- (NSString *)stringByStrippingHTML:(NSString *)inputString
{
    NSAttributedString *attrString = [[NSAttributedString alloc] initWithData:[inputString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
    NSString *str = [attrString string];
    // Add further replacements here as needed; each call returns a new string, so reassign it.
    str = [str stringByReplacingOccurrencesOfString:@"[" withString:@""];
    str = [str stringByReplacingOccurrencesOfString:@"]" withString:@""];
    str = [str stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    return str;
}
- (NSString *)stringByStrippingHTMLFromString:(NSString *)str {
    NSRange range;
    while ((range = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        str = [str stringByReplacingCharactersInRange:range withString:@""];
    return str;
}