Objective-C: Importing bookmarks from an HTML file

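I'm trying to add an import-bookmarks feature to my app. I have some code, but it only extracts all of the URLs and titles as two flat lists: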

- (NSArray *)urlsInHTML:(NSString *)html {
    NSError *error;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=href=\").*?(?=\")" options:NSRegularExpressionCaseInsensitive error:&error];

    NSArray *arrayOfAllMatches = [regex matchesInString:html options:0 range:NSMakeRange(0, [html length])];

    NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];

    for (NSTextCheckingResult *match in arrayOfAllMatches) {
        NSString* substringForMatch = [html substringWithRange:match.range];
        NSLog(@"Extracted URL: %@",substringForMatch);

        [arrayOfURLs addObject:substringForMatch];
    }

    // return non-mutable version of the array
    return [NSArray arrayWithArray:arrayOfURLs];
}

- (NSArray *)titlesOfTagsInHTML:(NSString *)html {
    NSError *error;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=\"\\>)(.*?)(?=\\<\\/)" options:NSRegularExpressionCaseInsensitive error:&error];

    NSArray *arrayOfAllMatches = [regex matchesInString:html options:0 range:NSMakeRange(0, [html length])];

    NSMutableArray *arrayOfTitles = [[NSMutableArray alloc] init];

    for (NSTextCheckingResult *match in arrayOfAllMatches) {
        NSString* substringForMatch = [html substringWithRange:match.range];
        NSLog(@"Extracted Title: %@",substringForMatch);

        [arrayOfTitles addObject:substringForMatch];
    }

    // return non-mutable version of the array
    return [NSArray arrayWithArray:arrayOfTitles];
}

- (IBAction)import {

    ProgressAlertView *progressAlert = [[ProgressAlertView alloc] initWithTitle:@"Crux" message:@"Importing Bookmarks..." delegate:self cancelButtonTitle:nil otherButtonTitles:nil];
    [progressAlert show];

    NSString *htmlString = [NSString stringWithContentsOfFile:importingBookmarkFilePath encoding:NSUTF8StringEncoding error:nil];
    NSArray *urls = [self urlsInHTML:htmlString];
    NSArray *titles = [self titlesOfTagsInHTML:htmlString];
    //float progress = [[NSNumber numberWithInt:i] floatValue]/[[NSNumber numberWithInteger:[urls count]-1] floatValue];
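    // Pair each URL with its title; stop at the shorter array so a count mismatch cannot crash.
    NSUInteger count = MIN([urls count], [titles count]);
    for (NSUInteger i = 0; i < count; i++) {
        Bookmark *importedBookmark = [[Bookmark alloc] init];
        importedBookmark.url = urls[i];
        importedBookmark.title = titles[i];
        [[[BookmarkManager sharedInstance] bookmarks] addObject:importedBookmark];
    }
    // Save once after the loop instead of on every iteration.
    [[BookmarkManager sharedInstance] saveBookmarks];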
}

The format isn't that complicated, so you should be able to parse it with NSScanner. The general procedure is as follows:

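  • Scan up to the next entry (in the Netscape-style bookmark files that browsers export, each entry starts with a <DT> tag)
  • Check whether what follows is an H3 or an A (a folder or a bookmark)
  • Process it accordingly
  • Repeat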

Folders can have subfolders, so you will need to create the objects recursively. Good luck.
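
The procedure above could be sketched roughly as follows. This is a minimal illustration, not a complete importer: it assumes a Netscape-style bookmark file in which every entry starts with <DT>, folders are written as <H3> followed by a nested <DL> list, and bookmarks are <A HREF="..."> elements. The function names ParseBookmarkItems and ParseBookmarksHTML and the dictionary keys "title", "url" and "children" are hypothetical and chosen only for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses entries until the enclosing </DL> (or the end of input).
// Returns an array of dictionaries: bookmarks have "title" and "url",
// folders have "title" and "children" (a nested array of the same shape).
static NSArray *ParseBookmarkItems(NSScanner *scanner) {
    NSMutableArray *items = [NSMutableArray array];
    while (!scanner.atEnd) {
        // Move to the next tag.
        [scanner scanUpToString:@"<" intoString:NULL];
        if ([scanner scanString:@"</DL" intoString:NULL]) {
            break; // end of the current folder's list
        }
        if (![scanner scanString:@"<DT>" intoString:NULL]) {
            // Some other tag (<DL>, <p>, <!DOCTYPE ...>): step past '<' and keep going.
            [scanner scanString:@"<" intoString:NULL];
            continue;
        }
        if ([scanner scanString:@"<H3" intoString:NULL]) {
            // Folder: read its title, then recurse into the <DL> that follows.
            // Assumption: every folder is followed by its own <DL> list.
            NSString *title = @"";
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
            [scanner scanUpToString:@"</H3>" intoString:&title];
            [scanner scanUpToString:@"<DL" intoString:NULL];
            [scanner scanString:@"<DL" intoString:NULL];
            NSArray *children = ParseBookmarkItems(scanner);
            [items addObject:@{@"title": title, @"children": children}];
        } else if ([scanner scanString:@"<A" intoString:NULL]) {
            // Bookmark: read the HREF attribute, then the link text.
            // Assumption: the tag has a double-quoted HREF attribute.
            NSString *url = @"";
            NSString *title = @"";
            [scanner scanUpToString:@"HREF=\"" intoString:NULL];
            [scanner scanString:@"HREF=\"" intoString:NULL];
            [scanner scanUpToString:@"\"" intoString:&url];
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
            [scanner scanUpToString:@"</A>" intoString:&title];
            [items addObject:@{@"title": title, @"url": url}];
        }
    }
    return items;
}

// Hypothetical entry point: returns the top-level folders and bookmarks.
static NSArray *ParseBookmarksHTML(NSString *html) {
    // NSScanner is case-insensitive by default, so <dt> and <DT> both match.
    NSScanner *scanner = [NSScanner scannerWithString:html];
    return ParseBookmarkItems(scanner);
}
```

Passing the string read from the exported file to ParseBookmarksHTML returns a nested array that can then be walked to build Bookmark and folder objects. The sketch does not decode HTML entities such as &amp; in titles, and it ignores the other attributes on each tag.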

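Since the content is hierarchical, parsing it as XML would give you the result you want more reliably than regular expressions. (This comment assumes you are using an HTML file exported from Safari.)

Yes, I am. I didn't know I could parse it as XML. Can NSXMLParser do that?

It should. Give it a try.

I did, and it doesn't work.

The exported HTML file doesn't look well formed. That is probably why NSXMLParser fails. Take a look at this question and its suggested answers.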