iOS: Remove HTML tags from an NSString on the iPhone


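There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

One way is to render the string into an NSAttributedString and then grab the rendered text.

Another way is to use NSXMLDocument's -objectByApplyingXSLTString: method to apply an XSLT transform that does it.

Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?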

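One suggestion has been to simply look for opening and closing tag characters. This method won't work except for very trivial cases.

For example, these cases (from the Perl Cookbook chapter on the same subject) would break this method: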

<IMG SRC = "foo.gif" ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
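
To make the failure concrete, here is a minimal sketch of the "look for < and >" approach applied to the first case above. The function name NaiveStripTags is hypothetical and the snippet assumes a Foundation version that supports NSRegularExpressionSearch. The match ends at the first ">" inside the quoted ALT attribute, so part of the tag leaks into the output.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: deletes everything from a '<' up to the next '>'.
static NSString *NaiveStripTags(NSString *html) {
    return [html stringByReplacingOccurrencesOfString:@"<[^>]+>"
                                           withString:@""
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, html.length)];
}

// For the input  <IMG SRC = "foo.gif" ALT = "A > B">caption
// the pattern stops at the '>' inside the attribute value,
// leaving the fragment  B">caption  behind.
```

Well-behaved input such as `<b>Hello</b> World!!` still comes out as `Hello World!!`, which is why the problem is easy to miss in quick tests.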

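I would imagine the safest way would just be to parse for <>s, no? Loop through the entire string, and copy anything not enclosed in <>s to a new string.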
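
One way to read that suggestion is the sketch below. It uses NSScanner to do the looping, which is an assumption on my part (the answer does not name an API), and the function name TextOutsideAngleBrackets is hypothetical. It shares the weaknesses listed in the question: a ">" inside an attribute value ends the tag early, and an unmatched "<" swallows the rest of the string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: copies everything that is not between '<' and '>'.
static NSString *TextOutsideAngleBrackets(NSString *html) {
    NSMutableString *result = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as written

    while (!scanner.isAtEnd) {
        NSString *text = nil;
        // Copy the run of characters before the next '<', if there is one.
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip the tag itself: '<', its contents, and the closing '>'.
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```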

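Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only the plain text.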
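
A minimal sketch of that idea follows, assuming ARC. The class name PlainTextCollector and the function PlainTextFromXMLFragment are hypothetical, and wrapping the input in a <root> element is my own assumption so that a fragment parses as a single document. NSXMLParser is a strict XML parser, so on malformed HTML it stops at the first error and only the text collected up to that point is returned.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that keeps character data and ignores element events.
@interface PlainTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation PlainTextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// Called for text between tags; may be called several times per text node.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Hypothetical helper: returns the character data found in an XML fragment.
static NSString *PlainTextFromXMLFragment(NSString *fragment) {
    NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", fragment];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    PlainTextCollector *collector = [[PlainTextCollector alloc] init];
    parser.delegate = collector; // the parser does not retain its delegate
    [parser parse];              // returns NO if the input is not well-formed
    return [collector.text copy];
}
```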

Here's a blog post that discusses a couple of libraries available for stripping HTML.
Note the comments, where other solutions are offered.

If you want to get the content without the HTML tags from a web page (HTML document), use this code inside the UIWebView delegate method webViewDidFinishLoad:

  NSString *myText = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.textContent"];

If you're willing to use the Three20 framework, it has a category on NSString that adds a stringByRemovingHTMLTags method. See NSStringAdditions.h in the Three20Core subproject.

Use this:

NSString *myregex = @"<[^>]*>"; // regex to remove any HTML tag

NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];

Don't forget to include this in your code: #import "RegexKitLite.h". Here is the link to download this API:

A quick and "dirty" solution (it removes everything between < and >) that works with iOS >= 3.2:

-(NSString *) stringByStrippingHTML {
  NSRange r;
  NSString *s = [[self copy] autorelease];
  while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    s = [s stringByReplacingCharactersInRange:r withString:@""];
  return s;
}

I have this declared as a category on NSString.

#import "RegexKitLite.h"

NSString *text = [html stringByReplacingOccurrencesOfRegex:@"<[^>]+>" withString:@""];

This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It is a single .m and .h file that can easily be included in your project.

You then strip the HTML by doing the following.

Import the header:

#import "NSString_stripHtml.h"

Then call stripHtml:

NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped will be = Hello World!!

This also works with malformed HTML that technically isn't XML.

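I expanded on m.kocikowski's answer and tried to make it a bit more efficient by using an NSMutableString. I've also structured it so that it can be used in a static Utils class (although I know a category is probably the best design), and removed the autorelease so that it compiles in an ARC project.

Included here in case anybody finds it useful.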

.h

+ (NSString *)stringByStrippingHTML:(NSString *)inputString;
.m

+ (NSString *)stringByStrippingHTML:(NSString *)inputString 
{
  NSMutableString *outString;

  if (inputString)
  {
    outString = [[NSMutableString alloc] initWithString:inputString];

    if ([inputString length] > 0)
    {
      NSRange r;

      while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
      {
        [outString deleteCharactersInRange:r];
      }      
    }
  }

  return outString; 
}
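
// Let a UITextView render the HTML instead of stripping the tags by hand.
UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is simple";
// "contentToHTMLString" is an undocumented key set through KVC; it may break in any release.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
[self.view addSubview:textview]; // assumes this runs inside a view controller

Works fine for me.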

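Extending further from m.kocikowski's and Dan J's answers, with more explanation for newbies.

1# First you have to set up the code so that it is usable from any class.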

.h

@interface NSString (NAME_OF_CATEGORY)

- (NSString *)stringByStrippingHTML;

@end
.m

@implementation NSString (NAME_OF_CATEGORY)

- (NSString *)stringByStrippingHTML
{
NSMutableString *outString;
NSString *inputString = self;

if (inputString)
{
    outString = [[NSMutableString alloc] initWithString:inputString];

    if ([inputString length] > 0)
    {
        NSRange r;

        while ((r = [outString rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        {
            [outString deleteCharactersInRange:r];
        }
    }
}

return outString;
}

@end

3# Call the method

NSString* sub = [result stringByStrippingHTML];
NSLog(@"%@", sub);

Here, result is the NSString I want to strip the tags from.

This is a modernization of m.kocikowski's answer that also removes the whitespace following each tag:

@implementation NSString (StripXMLTags)

- (NSString *)stripXMLTags
{
    NSRange r;
    NSString *s = [self copy];
    while ((r = [s rangeOfString:@"<[^>]+>\\s*" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}

@end

You can use it like this:

-(void)myMethod
 {

 NSString* htmlStr = @"<some>html</string>";
 NSString* strWithoutFormatting = [self stringByStrippingHTML:htmlStr];

 }

 -(NSString *)stringByStrippingHTML:(NSString*)str
 {
   NSRange r;
   while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location     != NSNotFound)
  {
     str = [str stringByReplacingCharactersInRange:r withString:@""];
 }
  return str;
 }

Here's a more efficient solution than the accepted answer:

- (NSString*)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches =[regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;

    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}
NSAttributedString *str=[[NSAttributedString alloc] initWithData:[trimmedString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]} documentAttributes:nil error:nil];
- (NSString *) stringByStrippingHTML {
    NSString *retVal;
    @autoreleasepool {
        NSRange r;
        NSString *s = [[self copy] autorelease];
        while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
            s = [s stringByReplacingCharactersInRange:r withString:@""];
        }
        retVal = [s copy];
    } 
    // pool is drained, release s and all temp 
    // strings created by stringByReplacingCharactersInRange
    return retVal;
}
func stripHTMLFromString(string: String) -> String {
  var copy = string
  while let range = copy.rangeOfString("<[^>]+>", options: .RegularExpressionSearch) {
    copy = copy.stringByReplacingCharactersInRange(range, withString: "")
  }
  copy = copy.stringByReplacingOccurrencesOfString("&nbsp;", withString: " ")
  copy = copy.stringByReplacingOccurrencesOfString("&amp;", withString: "&")
  return copy
}
- (NSString *)stringByStrippingHTML:(NSString *)inputString
{
    // Let NSAttributedString parse the HTML, then take its plain-text content.
    NSAttributedString *attrString = [[NSAttributedString alloc] initWithData:[inputString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
    NSString *str = [attrString string];

    // Add further replacements here as needed. NSString is immutable,
    // so each result has to be assigned back to str.
    str = [str stringByReplacingOccurrencesOfString:@"[" withString:@""];
    str = [str stringByReplacingOccurrencesOfString:@"]" withString:@""];
    str = [str stringByReplacingOccurrencesOfString:@"\n" withString:@""];

    return str;
}
-(NSString *) stringByStrippingHTMLFromString:(NSString *)str {
NSRange range;
while ((range = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    str = [str stringByReplacingCharactersInRange:range withString:@""];
return str;