iOS: optimizing link extraction from a string


I'm getting data from a website (an HTML string) and I want to extract all the links. I wrote a function (it works), but it's very slow.

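Can you help me optimize it? What standard functions could I use? Function logic: find the string "http://" in the text, then read the string character by character until I hit a double quote (").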

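// Integer subscripts: each call walks forward from startIndex, so it costs O(i)
extension String {
    subscript (i: Int) -> Character {
        return self[advance(self.startIndex, i)]
    }
    subscript (i: Int) -> String {
        return String(self[i] as Character)
    }
    subscript (r: Range<Int>) -> String {
        return substringWithRange(Range(start: advance(startIndex, r.startIndex), end: advance(startIndex, r.endIndex)))
    }
}
func extractAllLinks(text: String) -> Array<String> {
    var stringArray = Array<String>()
    var find = "http://" as String

    for (var i = countElements(find); i …

There is actually a class called NSDataDetector that will detect links for you.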


You can find an example on NSHipster.

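I wonder whether you realize that every time you call countElements, you call a seriously complex function that has to scan all the Unicode characters in the string, extract extended grapheme clusters from them, and count them. Even if you don't know what an extended grapheme cluster is, you should be able to imagine that this isn't cheap, and it's far more work than you need here.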
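To make the point about counting concrete, here is a small sketch in modern Swift (not from the thread; countElements was later replaced by count, which is now a property). A String's count is measured in extended grapheme clusters, so computing it means walking the string, and it can differ from the UTF-16 length that NSString and NSRange use:

```swift
// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single character "é".
let decomposed = "e\u{301}"

// Counting Characters requires grapheme-cluster segmentation, i.e. walking the scalars.
let characterCount = decomposed.count              // 1
// NSString/NSRange lengths are in UTF-16 code units.
let utf16Count = decomposed.utf16.count            // 2
// Raw Unicode scalars.
let scalarCount = decomposed.unicodeScalars.count  // 2

print(characterCount, utf16Count, scalarCount)     // prints "1 2 2"
```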

Just convert it to an NSString, call rangeOfString, and you're done.
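A minimal sketch of the rangeOfString idea, assuming Swift 5 and Foundation's range(of:options:range:) on String instead of an explicit NSString cast. The function name extractLinks(from:) is hypothetical; like the question, it only looks for lowercase http:// and stops at the next double quote:

```swift
import Foundation

/// Hypothetical helper: finds each "http://" and takes the text up to the next double quote.
/// Like the question's logic, it ignores https and letter case.
func extractLinks(from text: String) -> [String] {
    var links: [String] = []
    var searchStart = text.startIndex
    // Each search resumes where the previous link ended.
    while let prefixRange = text.range(of: "http://", range: searchStart..<text.endIndex) {
        // Look for the closing quote only from the start of this link onward.
        guard let quote = text[prefixRange.lowerBound...].firstIndex(of: "\"") else { break }
        links.append(String(text[prefixRange.lowerBound..<quote]))
        searchStart = text.index(after: quote)
    }
    return links
}
```

Because each search resumes where the previous one ended, the text is scanned roughly once, instead of being re-counted or re-walked from the start on every step.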


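Obviously, what you're doing is completely unsafe, because "http://" doesn't mean there is a link. You can't just look for strings in HTML and hope it works; it doesn't. Then there are https, Http, hTtp, htTp, httP, and so on. But that's the easy part; for the real horrors, follow the link in Uttam Sinha's comment.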
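One way to at least cover https and mixed-case schemes is a case-insensitive regular-expression search. This is a sketch under assumptions: Swift 5, a hypothetical helper name, and an illustrative pattern that stops at whitespace, quotes, and angle brackets. It does not address the deeper problem of parsing HTML correctly:

```swift
import Foundation

/// Hypothetical helper: matches http or https links in any letter case.
/// The pattern is illustrative only, not a full URL grammar.
func caseInsensitiveLinks(in text: String) -> [String] {
    let pattern = #"https?://[^\s"'<>]+"#
    var links: [String] = []
    var searchStart = text.startIndex
    // Foundation runs the regex; .caseInsensitive also accepts HTTP, hTtP, etc.
    while let match = text.range(of: pattern,
                                 options: [.regularExpression, .caseInsensitive],
                                 range: searchStart..<text.endIndex) {
        links.append(String(text[match]))
        searchStart = match.upperBound   // the pattern never matches empty text, so this advances
    }
    return links
}
```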

As AdamPro13 said above, you can easily get all the URLs with NSDataDetector; see the following code:

let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingType = .Link
var error : NSError?

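let detector = NSDataDetector(types: types.rawValue, error: &error)
// NSRange is measured in UTF-16 code units, so use the UTF-16 length rather than the Character count
var matches = detector!.matchesInString(text, options: nil, range: NSMakeRange(0, count(text.utf16)))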

for match in matches {
   println(match.URL!)
}
It outputs:

http://www.google.com
http://www.bla.com
Update (first written for Swift 2.0; the code below uses Swift 4.1+ syntax such as compactMap)

let text = "http://www.google.com. http://www.bla.com"

func checkForUrls(text: String) -> [URL] {
    let types: NSTextCheckingResult.CheckingType = .link

    do {
        let detector = try NSDataDetector(types: types.rawValue)

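        // NSRange is measured in UTF-16 code units; text.count would undercount for emoji and similar text
        let matches = detector.matches(in: text, options: .reportCompletion, range: NSMakeRange(0, text.utf16.count))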
    
        return matches.compactMap({$0.url})
    } catch let error {
        debugPrint(error.localizedDescription)
    }

    return []
}

checkForUrls(text: text)
If you use a guard statement in the case above, remember that it must be inside a function or a loop.


I hope this helps.
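For comparison, a sketch of the guard-based style in current Swift (the helper name detectedURLs(in:) is hypothetical). The guard sits inside a function because its else branch must leave the enclosing scope, and try? turns a failed detector initialization into an empty result. Text without links, such as "hello swift", simply yields an empty array:

```swift
import Foundation

/// Hypothetical helper: returns every URL NSDataDetector finds in the text.
func detectedURLs(in text: String) -> [URL] {
    // guard must exit the scope in its else branch, hence it lives in a function.
    guard let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) else {
        return []   // detector could not be created
    }
    // Build the NSRange from String indices so it is expressed in UTF-16 units.
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    return detector.matches(in: text, options: [], range: range).compactMap { $0.url }
}
```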

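As others have pointed out, you're better off using regular expressions, a data detector, or a parsing library. However, as specific feedback on your string processing: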

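The key to working with Swift strings is to embrace their forward-only nature. More often than not, integer indexing and random access aren't necessary. As @gnasher729 pointed out, every time you call count, you iterate over the whole string. Similarly, integer-indexing extensions are linear, so if you use them inside a loop, you can easily create a quadratic or even cubic algorithm by accident.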

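But in this case, there's no need to convert string indices to random-access integers. Here's a version that I think performs similar logic using only native string indices (look for the prefix, then look from there for a " character; ignoring that this doesn't handle https, upper/lower case, etc.):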

func extractAllLinks(text: String) -> [String] {
    var links: [String] = []
    let prefix = "http://"
    let prefixLen = count(prefix)

    for var idx = text.startIndex; idx != text.endIndex; ++idx {
        let candidate = text[idx..<text.endIndex]
        if candidate.hasPrefix(prefix),
           let closingQuote = find(candidate, "\"") {
            let link = candidate[candidate.startIndex..<closingQuote]
            links.append(link)
            idx = advance(idx, count(link))
        }
    }
    return links
}

let text = "This contains the link \"http://www.whatever.com/\" and"
         + " the link \"http://google.com\""

extractAllLinks(text)
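The code above is Swift 1.2 (C-style for var loops, ++, find, advance, and the global count were all removed or renamed in later Swift versions). Here is a sketch of the same forward-only idea ported to Swift 5; it is an adaptation, not the answerer's code, and it still ignores https and letter case:

```swift
/// Swift 5 adaptation: walks String.Index values forward and never uses integer offsets.
func extractAllLinks(text: String) -> [String] {
    var links: [String] = []
    let prefix = "http://"
    var idx = text.startIndex
    while idx != text.endIndex {
        let candidate = text[idx...]                    // Substring view, no copy
        if candidate.hasPrefix(prefix),
           let closingQuote = candidate.firstIndex(of: "\"") {
            links.append(String(candidate[..<closingQuote]))
            idx = closingQuote                          // skip the link; the step below passes the quote
        }
        idx = text.index(after: idx)
    }
    return links
}
```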
Very useful thread! Here's an example that works in Swift 2, based on an earlier answer:

    // extract first link (if available) and open it!
    let text = "How technology is changing our relationships to each other: http://t.ted.com/mzRtRfX"
    let types: NSTextCheckingType = .Link

    do {
        let detector = try NSDataDetector(types: types.rawValue)
        // NSRange is measured in UTF-16 code units, so use the UTF-16 length
        let matches = detector.matchesInString(text, options: .ReportCompletion, range: NSMakeRange(0, text.utf16.count))
        if matches.count > 0 {
            let url = matches[0].URL!
            print("Opening URL: \(url)")
            UIApplication.sharedApplication().openURL(url)
        }

    } catch {
        // none found or some other issue
        print ("error in findAndOpenURL detector")
    }
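If only the first link is needed, a sketch in current Swift can ask the detector for just that with firstMatch(in:options:range:), rather than building the full matches array. The helper name firstURL(in:) is hypothetical. On iOS 10 and later, UIApplication's openURL(_:) is deprecated in favor of open(_:options:completionHandler:), so the returned URL would be passed to that instead:

```swift
import Foundation

/// Hypothetical helper: returns the first link in the text, or nil if there is none.
func firstURL(in text: String) -> URL? {
    guard let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) else {
        return nil
    }
    // NSRange length in UTF-16 code units.
    let range = NSRange(location: 0, length: text.utf16.count)
    // firstMatch returns only the first result instead of an array of all matches.
    return detector.firstMatch(in: text, options: [], range: range)?.url
}
```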

Here's the answer for Swift 5.0:

let text = "http://www.google.com. http://www.bla.com"

func checkForUrls(text: String) -> [URL] {
    let types: NSTextCheckingResult.CheckingType = .link

    do {
        let detector = try NSDataDetector(types: types.rawValue)

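        // NSRange is measured in UTF-16 code units; text.count would undercount for emoji and similar text
        let matches = detector.matches(in: text, options: .reportCompletion, range: NSMakeRange(0, text.utf16.count))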
    
        return matches.compactMap({$0.url})
    } catch let error {
        debugPrint(error.localizedDescription)
    }

    return []
}

checkForUrls(text: text)
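NSDataDetector's link type also reports e-mail addresses, which come back as mailto: URLs. A sketch that keeps only web links by filtering on the URL scheme (the helper name webURLs(in:) is hypothetical, and excluding only mailto is an assumption to adjust to your needs):

```swift
import Foundation

/// Hypothetical helper: detected links with mailto: (e-mail) results removed.
func webURLs(in text: String) -> [URL] {
    guard let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) else {
        return []
    }
    let range = NSRange(location: 0, length: text.utf16.count)
    return detector.matches(in: text, options: [], range: range)
        .compactMap { $0.url }
        // Drop e-mail matches, which are reported with the mailto scheme.
        .filter { $0.scheme?.lowercased() != "mailto" }
}
```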
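You could use a regular expression instead, or try an open-source HTML parsing library.
Thanks! I now store the lengths, var textLength = countElements(text) and var findLength = countElements(find), so countElements is called only twice. Result: 2x faster. But NSDataDetector is blazing fast… I think I'll use it.
Yes, no worries; it will help others too.
Sorry, it doesn't work with text that contains no links. I tried "hello swift". guard let detect = detector else { return } doesn't work; I get this error: Cannot invoke initializer for type 'NSDataDetector' with an argument list of type '(types: UInt64, error: inout NSError?)'
Thanks! I understand the idea, but I don't understand the code let candidate = text[idx..<text.endIndex]
Updated the answer for Swift 2.0, including error handling and a guard statement. I hope this helps.
Hi, how can we avoid detecting e-mail addresses? I'm running into problems with e-mails when I run your code.
Details
  • Swift 5.2, Xcode 11.4 (11E146)
Solution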
// MARK: DataDetector

class DataDetector {

    private class func _find(all type: NSTextCheckingResult.CheckingType,
                             in string: String, iterationClosure: (String) -> Bool) {
        guard let detector = try? NSDataDetector(types: type.rawValue) else { return }
        let range = NSRange(string.startIndex ..< string.endIndex, in: string)
        let matches = detector.matches(in: string, options: [], range: range)
        loop: for match in matches {
            for i in 0 ..< match.numberOfRanges {
                let nsrange = match.range(at: i)
                // NSRange counts UTF-16 units; Range(_:in:) converts it safely to String indices
                guard let range = Range(nsrange, in: string) else { continue }
                guard iterationClosure(String(string[range])) else { break loop }
            }
        }
    }

    class func find(all type: NSTextCheckingResult.CheckingType, in string: String) -> [String] {
        var results = [String]()
        _find(all: type, in: string) {
            results.append($0)
            return true
        }
        return results
    }

    class func first(type: NSTextCheckingResult.CheckingType, in string: String) -> String? {
        var result: String?
        _find(all: type, in: string) {
            result = $0
            return false
        }
        return result
    }
}

// MARK: String extension

extension String {
    var detectedLinks: [String] { DataDetector.find(all: .link, in: self) }
    var detectedFirstLink: String? { DataDetector.first(type: .link, in: self) }
    var detectedURLs: [URL] { detectedLinks.compactMap { URL(string: $0) } }
    var detectedFirstURL: URL? {
        guard let urlString = detectedFirstLink else { return nil }
        return URL(string: urlString)
    }
}
let text = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. apple.com/ Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. http://gooogle.com. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. yahoo.com It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
"""

print(text.detectedLinks)
print(text.detectedFirstLink)
print(text.detectedURLs)
print(text.detectedFirstURL)
Console output
["apple.com/", "http://gooogle.com", "yahoo.com"]
Optional("apple.com/")
[apple.com/, http://gooogle.com, yahoo.com]
Optional(apple.com/)
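NSDataDetector reports match ranges as NSRange values counted in UTF-16 code units, while String.index(_:offsetBy:) counts Characters, so the two only agree for simple text. The sketch below (hypothetical helper name, Swift 5) uses NSRegularExpression, which NSDataDetector subclasses and which produces the same kind of NSTextCheckingResult, and converts each range with Range(_:in:). With an emoji before the link, character offsets would point into the wrong part of the string, but this conversion still returns the link:

```swift
import Foundation

/// Hypothetical helper: returns the matched substrings, converting each UTF-16 NSRange safely.
func matchedStrings(pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Range(_:in:) returns nil rather than a wrong slice if the range does not fit the string.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// The thumbs-up with skin tone is 1 Character but 4 UTF-16 code units.
let sample = "👍🏽 see http://example.com"
print(matchedStrings(pattern: #"http://\S+"#, in: sample))   // ["http://example.com"]
```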