Objective-C: NSArray from NSCharacterSet
Currently I am able to make an array of the alphabet like below:
[[NSArray alloc]initWithObjects:@"A",@"B",@"C",@"D",@"E",@"F",@"G",@"H",@"I",@"J",@"K",@"L",@"M",@"N",@"O",@"P",@"Q",@"R",@"S",@"T",@"U",@"V",@"W",@"X",@"Y",@"Z",nil];
I know that these characters are available via
[NSCharacterSet uppercaseLetterCharacterSet]
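How can I generate an array from it?
Since characters have a limited (and not too wide) range, you can simply test which characters are members of a given character set (brute force):
// this doesn't seem to be available
#define UNICHAR_MAX (1ull …
The following code creates an array containing all characters of a given character set. It also works for characters outside the "Basic Multilingual Plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).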
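The helper that the example refers to is not reproduced in this text. A minimal sketch of one way to write it in Swift follows. It assumes the method name allCharacters() taken from the example, and it simply walks all 17 Unicode planes, skipping planes that have no members:

```swift
import Foundation

extension CharacterSet {
    // Sketch only: collects every member of the set in code point order.
    // The name `allCharacters` is assumed from the usage example.
    func allCharacters() -> [Character] {
        var result: [Character] = []
        // Unicode has 17 planes (0...16) of 65,536 code points each.
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            let start = UInt32(plane) << 16
            let end = UInt32(plane + 1) << 16
            for codePoint in start ..< end {
                // Unicode.Scalar(_:) returns nil for surrogate code points.
                if let scalar = Unicode.Scalar(codePoint), self.contains(scalar) {
                    result.append(Character(scalar))
                }
            }
        }
        return result
    }
}
```

Because empty planes are skipped, a set that only contains BMP characters costs at most 65,536 membership tests.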
Example:
let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ... "]
(Note that some characters may not be present in the font used to display the result.)
I created a Swift (v2.1) version of Martin R's algorithm:
let charset = NSCharacterSet.URLPathAllowedCharacterSet();
for var plane : UInt8 in 0...16 {
    if charset.hasMemberInPlane( plane ) {
        var c : UTF32Char;
        for var c : UInt32 = UInt32( plane ) << 16; c < (UInt32(plane)+1) << 16; c++ {
            if charset.longCharacterIsMember(c) {
                var c1 = c.littleEndian // To make it byte-order safe
                let s = NSString(bytes: &c1, length: 4, encoding: NSUTF32LittleEndianStringEncoding);
                NSLog("Char: \(s)");
            }
        }
    }
}
Here it is done in a more Swift-like way:
let characters = NSCharacterSet.uppercaseLetterCharacterSet()
var array = [String]()
for plane: UInt8 in 0...16 where characters.hasMemberInPlane(plane) {
    for character: UTF32Char in UInt32(plane) << 16..<(UInt32(plane) + 1) << 16 where characters.longCharacterIsMember(character) {
        var endian = character.littleEndian
        let string = NSString(bytes: &endian, length: 4, encoding: NSUTF32LittleEndianStringEncoding) as! String
        array.append(string)
    }
}
print(array)
For just A-Z of the Latin alphabet (no Greek letters, diacritics, or anything else the asker did not ask for):
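for plane: UInt8 in 0...16 where characters.hasMemberInPlane(plane) {
    // Half-open range, so the first code point of the next plane is not included.
    for character: UTF32Char in UInt32(plane) << 16..<(UInt32(plane) + 1) << 16 where characters.longCharacterIsMember(character) {
        var endian = character.littleEndian
        let string = NSString(bytes: &endian, length: 4, encoding: NSUTF32LittleEndianStringEncoding) as! String
        array.append(string)
        // The first 26 members of the uppercase set are A through Z.
        if array.count == 26 {
            break
        }
    }
    if array.count == 26 {
        break
    }
}
You shouldn't do this; it isn't what character sets are for. An NSCharacterSet is a potentially infinite set of characters, possibly including code points that have not been invented yet. All you want to know is "Is this character, or this set of characters, in this set?", and for that it is useful.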
Imagine this Swift code:
let asciiCodepoints = Unicode.Scalar(0x00)...Unicode.Scalar(0x7F)
let asciiCharacterSet = CharacterSet(charactersIn: asciiCodepoints)
let nonAsciiCharacterSet = asciiCharacterSet.inverted
This is similar to the following Objective-C code:
// NSMakeRange takes (location, length); a length of 0x80 covers U+0000 through U+007F.
NSRange asciiCodepoints = NSMakeRange(0x00, 0x80);
NSCharacterSet *asciiCharacterSet = [NSCharacterSet characterSetWithRange:asciiCodepoints];
NSCharacterSet *nonAsciiCharacterSet = asciiCharacterSet.invertedSet;
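It is easy to say "loop over all the characters in asciiCharacterSet"; that would just loop over every character from U+0000 to U+007F. But what would it mean to loop over all the characters in nonAsciiCharacterSet? Do you start at U+0080? Who is to say there won't be negative code points in the future? Where do you end? Do you skip non-printable characters? What about extended grapheme clusters? And since it is a set (where order doesn't matter), can your code handle out-of-order code points in this loop?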
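These are questions you don't want to answer here. Functionally, nonAsciiCharacterSet is infinite, and all you want to use it for is to tell whether any given character lies outside the set of ASCII characters.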
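As a small illustration of that kind of query, the sketch below builds the inverted set and asks whether a string contains anything outside ASCII. The helper name containsNonASCII is hypothetical, and the range is built from explicit UInt8 values so that the non-failable Unicode.Scalar initializer is used:

```swift
import Foundation

// U+0000 through U+007F, built from UInt8 values (non-failable initializer).
let asciiCodepoints = Unicode.Scalar(UInt8(0x00))...Unicode.Scalar(UInt8(0x7F))
let asciiCharacterSet = CharacterSet(charactersIn: asciiCodepoints)
let nonAsciiCharacterSet = asciiCharacterSet.inverted

// Hypothetical helper: true if any scalar of `text` lies outside ASCII.
// It never enumerates the inverted set; it only asks membership questions.
func containsNonASCII(_ text: String) -> Bool {
    return text.unicodeScalars.contains { nonAsciiCharacterSet.contains($0) }
}
```

The inverted set is never materialized as a list here, which is exactly why its size does not matter.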
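The question you should really be asking yourself is: "What do I want to do with this array of uppercase letters?" If (and probably only if) you really need to iterate over it in order, putting the letters you care about into an array or a string (perhaps read in from a resource file) is probably the best approach. If you want to check whether a character is part of the set of uppercase letters, then you don't care about the order of the set or even how many characters are in it, and you should use CharacterSet.uppercaseLetters.contains(foo) (in Objective-C: [NSCharacterSet.uppercaseLetterCharacterSet characterIsMember:foo]).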
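Both options from the paragraph above fit in a few lines of Swift. This is only a sketch; the names alphabet and isUppercaseLetter are made up for illustration:

```swift
import Foundation

// Option 1: order matters, so spell out the letters you care about.
// Each element is a one-character String, as in the question's array.
let alphabet: [String] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".map { String($0) }

// Option 2: order does not matter, so ask the set a membership question.
// `isUppercaseLetter` is a hypothetical helper name.
func isUppercaseLetter(_ scalar: Unicode.Scalar) -> Bool {
    return CharacterSet.uppercaseLetters.contains(scalar)
}
```

The first option gives exactly the 26 Latin letters in a fixed order, while the second also accepts non-Latin uppercase letters such as U+00C4.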
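Think about non-Latin characters, too. CharacterSet.uppercaseLetters covers Unicode general categories that contain A through Z as well as characters such as Dž.
I found Martin R's solution too slow for my purposes, so I solved it with CharacterSet's bitmapRepresentation property instead.
According to my benchmarks, this is much faster: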
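// Function wrapper (the name is illustrative) so that `characterSet` and `return` have a context.
// Assumes the populated planes are contiguous starting at plane 0:
// the plane index byte is skipped, not read.
func ranges(of characterSet: CharacterSet) -> [CountableClosedRange<UInt32>] {
    var ranges = [CountableClosedRange<UInt32>]()
    let bitmap: Data = characterSet.bitmapRepresentation
    var first: UInt32?, last: UInt32?
    var plane = 0, nextPlane = 8192
    for (j, byte) in bitmap.enumerated() where byte != 0 {
        if j == nextPlane {
            // Plane index byte: move on to the next plane.
            plane += 1
            nextPlane += 8193
            continue
        }
        for i in 0 ..< 8 where byte & 1 << i != 0 {
            let codePoint = UInt32(j - plane) * 8 + UInt32(i)
            if let _last = last, codePoint == _last + 1 {
                // Extend the current run of consecutive code points.
                last = codePoint
            } else {
                if let first = first, let last = last {
                    ranges.append(first ... last)
                }
                first = codePoint
                last = codePoint
            }
        }
    }
    // Flush the final run.
    if let first = first, let last = last {
        ranges.append(first ... last)
    }
    return ranges
}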
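The approach above relies on the layout of bitmapRepresentation: the first 8192 bytes hold one bit per code point of the Basic Multilingual Plane, so code point n is found at bit (n & 7) of byte (n >> 3). A minimal sketch of that bit test follows. The helper name bitmapContains is hypothetical, and it deliberately handles only the BMP portion of the bitmap:

```swift
import Foundation

// Hypothetical helper: tests a BMP code point directly against the raw bitmap.
// Only the first 8192 bytes (plane 0) are consulted.
func bitmapContains(_ bitmap: Data, bmpCodePoint n: Int) -> Bool {
    precondition((0...0xFFFF).contains(n), "only BMP code points are handled")
    let byteIndex = n >> 3
    guard byteIndex < bitmap.count else { return false }
    let mask = UInt8(1) << UInt8(n & 7)
    return bitmap[bitmap.startIndex + byteIndex] & mask != 0
}

let bitmap = CharacterSet(charactersIn: "AZ").bitmapRepresentation
let hasA = bitmapContains(bitmap, bmpCodePoint: 0x41) // "A"
let hasB = bitmapContains(bitmap, bmpCodePoint: 0x42) // "B"
```

Scanning bytes this way skips eight code points at a time whenever a byte is zero, which is a plausible reason for the speedup the answer reports.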
Inspired by that answer, here is an efficient way to generate an array from a CharacterSet using its bitmap representation:
extension CharacterSet {
    func characters() -> [Character] {
        // A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive.
        return codePoints().compactMap { UnicodeScalar($0) }.map { Character($0) }
    }
    func codePoints() -> [Int] {
        var result: [Int] = []
        var plane = 0
        // following documentation at https://developer.apple.com/documentation/foundation/nscharacterset/1417719-bitmaprepresentation
        for (i, w) in bitmapRepresentation.enumerated() {
            let k = i % 0x2001
            if k == 0x2000 {
                // plane index byte
                plane = Int(w) << 13
                continue
            }
            let base = (plane + k) << 3
            for j in 0 ..< 8 where w & 1 << j != 0 {
                result.append(base + j)
            }
        }
        return result
    }
}
Example with discontiguous planes:
let charset = CharacterSet(charactersIn: …
Of course, you can also use CharacterSet to create character sets and alphabets, like this:
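var smallEmojiCharacterSet = CharacterSet(charactersIn: Unicode.Scalar(" …
Why do you need this? Or is it just for fun? It would be good if you could say why you need it in an array.
Have you tried using ASCII values?
The uppercase letter character set contains more than just A…Z.
Ooopppsssss. To understand this code we need 50K+ reputation. People will freak out at this code.
@H2CO3, I guess I just did not know whether a method exists on NSCharacterSet or NSString that does this task in a single-line statement. It looks like it does not exist. Glad to see this possibility from your response. Thanks.
Remark: this only works for characters within the BMP.
@MartinR Right, at least as long as unichar is two octets long (as it is on iOS and OS X).
@H2CO3: NSCharacterSet can also be used for characters outside the BMP, even though NSString uses unichar internally.
c1 probably cannot be declared with let, because it is passed in-out with &.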
Example for uppercaseLetters, using the characters() extension:
let charset = CharacterSet.uppercaseLetters
let chars = charset.characters()
print(chars.count) // 1733
print(chars) // ["A", "B", "C", ... "]