How do I decode HTML entities in Swift?
I am pulling a JSON file from a site, and one of the strings received is: The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi How can I convert things like &#8216; into the correct characters? I've made an Xcode Playground to demonstrate it:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])
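For reference, the same lookup can be sketched in current Swift. This is only an illustration: it uses an inline JSON literal instead of the network request, and the `Summary` and `Post` types and the `firstTitle(in:)` helper are hypothetical names that assume nothing beyond the `posts` and `title` keys used above.

```swift
import Foundation

// Hypothetical model types; only the keys used in the question are assumed.
struct Post: Decodable {
    let title: String
}

struct Summary: Decodable {
    let posts: [Post]
}

// Inline sample standing in for the response body.
let sampleJSON = """
{"posts": [{"title": "The Weeknd &#8216;King Of The Fall&#8217;"}]}
"""

// Returns the title of the first post, or nil if the JSON cannot be decoded.
func firstTitle(in json: String) -> String? {
    let data = Data(json.utf8)
    guard let summary = try? JSONDecoder().decode(Summary.self, from: data) else {
        return nil
    }
    return summary.posts.first?.title
}

print(firstTitle(in: sampleJSON) ?? "no title")
```

The title still contains the raw entities at this point; JSON decoding does not touch them.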
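This answer was last revised for Swift 5.2 and the iOS 13.4 SDK.
There is no straightforward way to do this, but you can use NSAttributedString magic to make the process as painless as possible. Be warned that this method strips all HTML tags as well.
Remember to initialize NSAttributedString from the main thread only. It uses WebKit to parse the HTML underneath, hence the requirement.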
// This is the [0]["title"] in your case
let htmlEncodedString = "The Weeknd &#8216;King Of The Fall&#8217;"
guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}
// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {
    init?(htmlEncodedString: String) {
        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }
        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]
        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }
        self.init(attributedString.string)
    }
}
let encodedString = "The Weeknd &#8216;King Of The Fall&#8217;"
let decodedString = String(htmlEncodedString: encodedString)
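@Akashivsky's answer is great and demonstrates how to use NSAttributedString to decode HTML entities. One possible disadvantage (as he stated) is that all HTML markup is removed as well, so
<strong> 4 &lt; 5 &amp; 3 &gt; 2</strong>
becomes
4 < 5 & 3 > 2
There is a system function that avoids this, but it is not available on iOS.
Here is a pure Swift implementation. It decodes character entity references like &lt; using a dictionary, and all numeric character entities like &#64; or &#x20ac;. (Note that I did not list all 252 HTML entities explicitly.)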
Swift 4:
Example:
let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;. &#64; "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @
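A minimal sketch of the approach described above, assuming a deliberately small dictionary of named entities plus decimal and hexadecimal numeric references. The names `namedEntities`, `decodeEntityBody` and `decodingHTMLEntities` are illustrative and are not the answer's original code. Unrecognized entities are left untouched rather than treated as errors.

```swift
import Foundation

// A deliberately small table; a fuller decoder would list many more names.
let namedEntities: [String: Character] = [
    "quot": "\"", "amp": "&", "apos": "'", "lt": "<", "gt": ">",
    "nbsp": "\u{00A0}", "euro": "\u{20AC}"
]

/// Decodes one entity body (the text between "&" and ";"),
/// e.g. "lt", "#64" or "#x20ac". Returns nil if it is not recognized.
func decodeEntityBody(_ body: Substring) -> Character? {
    if body.hasPrefix("#x") || body.hasPrefix("#X") {
        guard let code = UInt32(body.dropFirst(2), radix: 16),
              let scalar = Unicode.Scalar(code) else { return nil }
        return Character(scalar)
    }
    if body.hasPrefix("#") {
        guard let code = UInt32(body.dropFirst(1), radix: 10),
              let scalar = Unicode.Scalar(code) else { return nil }
        return Character(scalar)
    }
    return namedEntities[String(body)]
}

/// Scans the input once, left to right, so decoded text is never decoded again.
func decodingHTMLEntities(_ input: String) -> String {
    var result = ""
    var rest = input[...]
    while let amp = rest.firstIndex(of: "&") {
        result.append(contentsOf: rest[..<amp])
        let afterAmp = rest[rest.index(after: amp)...]
        if let semi = afterAmp.firstIndex(of: ";"),
           let decoded = decodeEntityBody(afterAmp[..<semi]) {
            result.append(decoded)
            rest = afterAmp[afterAmp.index(after: semi)...]
        } else {
            // Not a known entity: keep the ampersand and continue after it.
            result.append("&")
            rest = afterAmp
        }
    }
    result.append(contentsOf: rest)
    return result
}

print(decodingHTMLEntities("4 &lt; 5 &amp; 3 &gt; 2, Price: 12 &#x20ac;, &#64;"))
// 4 < 5 & 3 > 2, Price: 12 €, @
```

Because the input is scanned in a single pass, `&amp;lt;` decodes to the literal text `&lt;` and not to `<`, and HTML tags are left in place.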
Swift 3:
Swift 2:
This is my approach. You could add the entities dictionary that Michael Waterfall mentions.
import Foundation

extension String {
    func htmlDecoded() -> String {
        guard !isEmpty else { return self }
        var newStr = self
        // "&amp;" is replaced last so that text such as "&amp;lt;" is not decoded twice.
        let entities = [
            ("&quot;", "\""),
            ("&apos;", "'"),
            ("&lt;", "<"),
            ("&gt;", ">"),
            ("&amp;", "&"),
        ]
        for (name, value) in entities {
            newStr = newStr.replacingOccurrences(of: name, with: value)
        }
        return newStr
    }
}
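Or
Swift 2 version
Usage:
Swift 3 version
I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solution, so I wrote my own implementation: HTMLEntities. It works with HTML4 named character references as well as hex/dec numeric character references, and it recognizes special numeric character references per the W3 HTML5 spec (i.e. &#x80; should be unescaped as the Euro sign (Unicode U+20AC) and not as the Unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping). Usage example: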
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
// decode example
let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
print(htmlencoded.htmlUnescape())
// Prints "<script>alert(\"abc\")</script>"
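To make the special-case rule for numeric references concrete, here is a rough sketch of the mapping step only. It is not the package's code: the function name `scalar(forNumericReference:)` and the table name are made up for illustration. The table lists the HTML5 overrides for code points 0x80 to 0x9F (their Windows-1252 equivalents), and zero, surrogates and out-of-range values fall back to U+FFFD.

```swift
// Overrides for 0x80...0x9F; code points in that range without an entry
// (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through unchanged in this sketch.
let windows1252Overrides: [UInt32: UInt32] = [
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
    0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
    0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9E: 0x017E, 0x9F: 0x0178
]

/// Maps the number found in "&#...;" or "&#x...;" to the scalar it should produce.
func scalar(forNumericReference code: UInt32) -> Unicode.Scalar {
    let replacement: Unicode.Scalar = "\u{FFFD}"
    if let mapped = windows1252Overrides[code] {
        return Unicode.Scalar(mapped) ?? replacement
    }
    // Zero is never a valid reference target.
    if code == 0 { return replacement }
    // Unicode.Scalar(_:) is nil for surrogates and for values past U+10FFFF.
    return Unicode.Scalar(code) ?? replacement
}

print(scalar(forNumericReference: 0x80))  // €
```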
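Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.
Updated answer working on Swift 3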
extension String {
init?(htmlEncodedString: String) {
let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
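Computed var version
Swift 3.0 version with actual font size conversion
Normally, if you convert HTML content directly to an attributed string, the font size is increased. You can try converting an HTML string to an attributed string and back again to see the difference.
Instead, here is the actual size conversion, which makes sure the font size does not change by applying a ratio of 0.75 to all fonts: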
extension String {
func htmlAttributedString() -> NSAttributedString? {
guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let attriStr = try? NSMutableAttributedString(
data: data,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil) else { return nil }
attriStr.beginEditing()
attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
(value, range, stop) in
if let font = value as? UIFont {
let resizedFont = font.withSize(font.pointSize * 0.75)
attriStr.addAttribute(NSFontAttributeName,
value: resizedFont,
range: range)
}
}
attriStr.endEditing()
return attriStr
}
}
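Swift 4 version
Swift 4
Simple usage
Swift 4
Swift 4
String extension computed var
Without extra guard, do, catch, etc.
Returns the original string if decoding fails
Have a look at
For completeness' sake, I copied the main features from the site:
Adds entities for ASCII and UTF-8/UTF-16 encodings
Removes more than 2100 named entities (like &amp;)
Supports removing decimal and hexadecimal entities
Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
Fully unit tested
Fast
Documented
Compatible with Objective-C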
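As a rough illustration of what "emoji-proof" means for numeric entities: several code points can combine into a single user-visible character, so a decoder has to work on Unicode scalars and leave grapheme clustering to `String`. The helper below is hypothetical and unrelated to the library's API. It only shows how Swift treats the scalars that two entities such as `&#x1F1EB;&#x1F1F7;` would yield.

```swift
// Hypothetical helper: builds a String from code points as they would
// appear in numeric entities, skipping values that are not valid scalars.
func string(fromCodePoints codePoints: [UInt32]) -> String {
    var scalars = String.UnicodeScalarView()
    for code in codePoints {
        if let scalar = Unicode.Scalar(code) {
            scalars.append(scalar)
        }
    }
    return String(scalars)
}

// Two regional-indicator scalars form one flag character.
let flag = string(fromCodePoints: [0x1F1EB, 0x1F1F7])
print(flag.unicodeScalars.count)  // 2
print(flag.count)                 // 1
```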
Elegant Swift 4 solution
If you want a string,
myString = String(htmlString: encodedString)
add this extension to your project:
extension String {
init(htmlString: String) {
self.init()
guard let encodedData = htmlString.data(using: .utf8) else {
self = htmlString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error.localizedDescription)")
self = htmlString
}
}
}
And if you want an NSAttributedString with bold, italic, links, etc.,
textField.attributedText = try? NSAttributedString(htmlString: encodedString)
add this extension to your project:
extension NSAttributedString {
convenience init(htmlString html: String) throws {
try self.init(data: Data(html.utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
}
}
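Swift 4
Swift 4:
My overall solution that finally worked for HTML codes, newline characters and single quotes
Usage:
I then had to apply some more filters to get rid of single quotes (for example in don't, hasn't, it's, etc.) and newline characters like \n:
Swift 4.1+
Swift 4
I really like the solution using documentAttributes. However, it may be too slow for parsing files and/or for use in table view cells. I can't believe that Apple does not provide a decent solution for this.
As a workaround, I found this String extension on GitHub, which works perfectly and decodes quickly.
So for situations in which the given answer is too slow, see the solution suggested in this link:
Note: it does not parse HTML tags.
Objective-C
Swift 5.1 version
Also, if you want to extract the date, images, metadata, title and description, you can use the library named: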
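What? Extensions are meant to extend existing types to provide new functionality.
I understand what you are trying to say, but negating extensions isn't the way to go.
@Akashivsky: To make this work correctly with non-ASCII characters you have to add an NSCharacterEncodingDocumentAttribute, compare.
This method is extremely heavy and is not recommended in table views or grid views.
This is great! Although it blocks the main thread, is there any way to run it in a background thread?
This is great, thanks Martin! Here is the extension with the full list of HTML entities: I also slightly adapted it to provide the distance offsets made by the replacements. This allows the correct adjustment of any string attributes or entities that might be affected by these replacements (Twitter entity indices, for example).
@MichaelWaterfall and Martin, this is magnificent! Works like a charm! I updated the extension for Swift 2. Thanks!
I converted this answer to be compatible with Swift 2 and put it in a CocoaPod for ease of use.
Note that Santiago's Swift 2 version fixes the compile-time errors, but taking out the strtooul(string, nil, base) entirely will cause the code not to work with numeric character entities, and to crash when it encounters an entity it doesn't recognize instead of failing gracefully.
@AdelaChang: Actually, I had already converted my answer to Swift 2 in September 2015. It still compiles without warnings with Swift 2.2/Xcode 7.3. Or are you referring to Michael's version?
Thanks, with this answer I solved my issue: I had serious performance problems using NSAttributedString.
I don't really like this, but I have not found anything better yet, so this is an updated version of Michael Waterfall's solution for Swift 2.0.
This code is incomplete and should be avoided by all means. The error is not handled properly. When there is in fact an error, it will crash. You should update your code to at least return nil when there is an error. Or you could just init with the original string. In the end you should handle the error, which is not the case here.
Wow! Works great. The original answer was causing weird crashes. Thanks for the update!
For French characters I have to use utf16.
I get Error Domain=NSCocoaErrorDomain Code=259 "The file couldn't be opened because it isn't in the correct format." when I try to use this. This goes away if I run the full do catch on the main thread. I found this by checking the NSAttributedString documentation: "The HTML importer should not be called from a background thread (that is, the options dictionary includes documentType with a value of html). It will try to synchronize with the main thread, fail, and time out."
Please, the rawValue syntax NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue) and NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue) is horrible. Replace it with .documentType and .characterEncoding.
@MickeDG - Can you please explain what exactly you did to resolve this error? I am getting it sporadically.
@RossBarbish - Sorry Ross, this was too long ago and I can't remember the details. Have you tried what I suggest in the comment above, i.e. running the full do catch on the main thread?
I already hear people complaining about my force-unwrapped optional. If you are researching HTML string encoding and you do not know how to deal with Swift optionals, you're too far ahead of yourself.
Yup, this is a very universal answer, and it does not need to run on the main thread. It works even with the most complex HTML-escaped Unicode strings, like ( ͡° ͜ʖ ͡° ), whereas none of the other answers manage that.
Yeah, this should be higher up! :)
The fact that the original answer is not thread-safe is a very big issue for something so intrinsically low-level as a string manipulation.
The rawValue syntax NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue) and NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue) is horrible. Replace it with .documentType and .characterEncoding.
The performance of this solution is horrible. It may be okay for isolated cases, but parsing files is not advised.
Wow! Works for Swift 4! Usage: let encoded = "The Weeknd &#8216;King Of The Fall&#8217;" let finalString = encoded.htmlDecoded
I love the simplicity of this answer. However, it will cause crashes when run in the background because it tries to run on the main thread.
All you did was add some very obvious usage. Someone upvoted this answer and found it really useful; what does that tell you?
@Naishta It tells you that everyone has different opinions.
Re "The Weeknd": Not "The Weekend"?
The syntax highlighting looks weird, especially the comment part of the last line. Can you fix it?
"The Weeknd" is a singer, and yes, that's the way his name is spelled.
An explanation would be in order, in the answer and not here in comments.
An explanation would be in order. E.g., how is it different from previous Swift 4 answers?
An explanation would be in order. E.g., how is it different from previous answers? What Swift 4.1 features are used? Does it only work in Swift 4.1 and not in previous versions? Or would it work prior to Swift 4.1, say in Swift 4.0?
What makes it not work in some previous versions, Swift 5.0, Swift 4.1, Swift 4.0, etc.?
I ran into an error when decoding strings in CollectionViews. Very interesting, thanks!
Should be higher up.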
NSData dataRes = (nsdata value )
var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
print("The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi "
public extension String {
/// Decodes string with HTML encoding.
var htmlDecoded: String {
guard let encodedData = self.data(using: .utf8) else { return self }
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
return attributedString.string
} catch {
print("Error: \(error)")
return self
}
}
}
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
extension String {
var replacingHTMLEntities: String? {
do {
return try NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
} catch {
return nil
}
}
}
let clean = "Weeknd &#8216;King Of The Fall&#8217;".replacingHTMLEntities ?? "default value"
extension String {
mutating func toHtmlEncodedString() {
guard let encodedData = self.data(using: .utf8) else {
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
}
}
}
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
func decodeHTML(string: String) -> String? {
var decodedString: String?
if let encodedData = string.data(using: .utf8) {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
} catch {
print("\(error.localizedDescription)")
}
}
return decodedString
}
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
extension String {
    // Returns the decoded string, or the original string if decoding fails.
    var htmlDecoded: String {
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
            NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
        ]
        let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions,
                                              documentAttributes: nil).string
        return decoded ?? self
    }
}
+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
if (!htmlEncodedString) {
return nil;
}
NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *attributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
return [attributedString string];
}
import UIKit
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}