Objective-C: NSArray from NSCharacterSet


Currently I am able to make an array of the alphabet like this:

[[NSArray alloc]initWithObjects:@"A",@"B",@"C",@"D",@"E",@"F",@"G",@"H",@"I",@"J",@"K",@"L",@"M",@"N",@"O",@"P",@"Q",@"R",@"S",@"T",@"U",@"V",@"W",@"X",@"Y",@"Z",nil];
Knowing that this is available via

[NSCharacterSet uppercaseLetterCharacterSet]

how can I make an array out of it?

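Since the range of characters is limited (and not too wide), you can simply test which characters are members of the given character set (brute force):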

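// this doesn't seem to be available
#define UNICHAR_MAX (1…

The following code creates an array containing all characters of a given character set. It also works for characters outside of the "Basic Multilingual Plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).

For example: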

let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ... "]
(Note that some characters may not be present in the font used to display the result.)
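The implementation of `allCharacters()` is not shown above. As a rough sketch only, and not the answer author's code, a brute-force version could walk the 17 Unicode planes and test every scalar value for membership. The helper name is taken from the usage example; everything else here is an assumption for illustration.

```swift
import Foundation

extension CharacterSet {
    /// Hypothetical helper (an assumption, not the original answer's code):
    /// collects every member of the set by testing each scalar value in turn.
    func allCharacters() -> [Character] {
        var result: [Character] = []
        // Unicode has 17 planes (0...16) of 0x10000 code points each.
        for plane: UInt8 in 0...16 where hasMember(inPlane: plane) {
            let start = UInt32(plane) << 16
            let end = start + 0xFFFF
            for value in start...end {
                // Unicode.Scalar(_:) returns nil for surrogate code points.
                if let scalar = Unicode.Scalar(value), contains(scalar) {
                    result.append(Character(scalar))
                }
            }
        }
        return result
    }
}

// A small set keeps the demonstration output readable.
let latinCapitals: ClosedRange<Unicode.Scalar> = "A"..."Z"
let letters = CharacterSet(charactersIn: latinCapitals).allCharacters()
print(String(letters))
```

Skipping planes with `hasMember(inPlane:)` keeps the loop from testing roughly a million code points when the set only lives in one or two planes.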

I created a Swift (v2.1) version of Martin R's algorithm:

let charset = NSCharacterSet.URLPathAllowedCharacterSet();

for var plane : UInt8 in 0...16 {
    if charset.hasMemberInPlane( plane ) {
        var c : UTF32Char;

        for var c : UInt32 = UInt32( plane ) << 16; c < (UInt32(plane)+1) << 16; c++ {
            if charset.longCharacterIsMember(c) {
                var c1 = c.littleEndian // To make it byte-order safe
                let s = NSString(bytes: &c1, length: 4, encoding: NSUTF32LittleEndianStringEncoding);
                NSLog("Char: \(s)");
            }
        }
    }
}
Here is the same thing written in a more Swift-like style:

let characters = NSCharacterSet.uppercaseLetterCharacterSet()
var array      = [String]()

for plane: UInt8 in 0...16 where characters.hasMemberInPlane(plane) {

  for character: UTF32Char in UInt32(plane) << 16..<(UInt32(plane) + 1) << 16 where characters.longCharacterIsMember(character) {

    var endian = character.littleEndian
    let string = NSString(bytes: &endian, length: 4, encoding: NSUTF32LittleEndianStringEncoding) as! String

    array.append(string)

  }

}

print(array)
For only A-Z of the Latin alphabet (no Greek letters, diacritics, or anything else the asker did not request):

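// `characters` and `array` are declared as in the previous snippet.
for plane: UInt8 in 0...16 where characters.hasMemberInPlane(plane) {
    // Half-open range: a plane holds code points from plane << 16 up to, but not including, (plane + 1) << 16.
    for character: UTF32Char in UInt32(plane) << 16..<(UInt32(plane) + 1) << 16 where characters.longCharacterIsMember(character) {
        var endian = character.littleEndian
        let string = NSString(bytes: &endian, length: 4, encoding: NSUTF32LittleEndianStringEncoding) as! String
        array.append(string)
        // Stop after the first 26 members, which are A through Z.
        if array.count == 26 {
            break
        }
    }
    if array.count == 26 {
        break
    }
}

You should not do this; it is not what character sets are for. An NSCharacterSet may be an infinite set of characters, possibly including code points that have not been invented yet. All you want to know is "is this character, or set of characters, in this set?", and for that it is useful.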

Imagine this Swift code:

let asciiCodepoints = Unicode.Scalar(0x00)...Unicode.Scalar(0x7F)
let asciiCharacterSet = CharacterSet(charactersIn: asciiCodepoints)
let nonAsciiCharacterSet = asciiCharacterSet.inverted
This is similar to the following Objective-C code:

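// NSMakeRange takes (location, length); a length of 0x80 covers U+0000 through U+007F.
NSRange asciiCodepoints = NSMakeRange(0x00, 0x80);
NSCharacterSet *asciiCharacterSet = [NSCharacterSet characterSetWithRange:asciiCodepoints];
NSCharacterSet *nonAsciiCharacterSet = asciiCharacterSet.invertedSet;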
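It is easy to say "loop over all the characters in asciiCharacterSet"; that would loop over every character from U+0000 to U+007F. But what does it mean to loop over all the characters in nonAsciiCharacterSet? Do you start at U+0080? Who is to say there won't be negative code points in the future? Where do you end? Do you skip non-printable characters? What about extended grapheme clusters? And since it is a set (where order does not matter), can your code handle out-of-order code points in this loop?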

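These are questions you do not want to answer here. Functionally, nonAsciiCharacterSet is infinite, and all you want to use it for is to tell whether a given character lies outside the ASCII character set.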
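To make the point concrete, here is a minimal sketch of using the inverted set purely as a membership test rather than something to iterate. The function name `containsNonAscii` is made up for this illustration.

```swift
import Foundation

// The ASCII set and its inverse, built as in the snippet above.
let asciiRange: ClosedRange<Unicode.Scalar> = "\u{00}"..."\u{7F}"
let asciiCharacterSet = CharacterSet(charactersIn: asciiRange)
let nonAsciiCharacterSet = asciiCharacterSet.inverted

/// Hypothetical helper: true if any scalar of `text` lies outside ASCII.
func containsNonAscii(_ text: String) -> Bool {
    return text.unicodeScalars.contains { nonAsciiCharacterSet.contains($0) }
}

print(containsNonAscii("plain text")) // false
print(containsNonAscii("naïve"))      // true
```

The set is never enumerated; each scalar of the input is simply asked whether it belongs.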


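The question you should really be asking yourself is: "What do I want to do with this array of capital letters?" If (and probably only if) you really need to iterate over it in order, putting the ones you care about into an Array or a String (perhaps read in from a resource file) is probably the best approach. If you want to check whether a character is part of the set of uppercase letters, then you do not care about the order of the set, or even how many characters it holds, and should simply use CharacterSet.uppercaseLetters.contains(foo) (in Objective-C: [NSCharacterSet.uppercaseLetterCharacterSet characterIsMember:foo]).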
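Both suggestions can be sketched in a few lines, under the assumption that only the 26 Latin capitals are wanted. The names `alphabet` and `isUppercaseLetter` are illustrative and do not come from the answers.

```swift
import Foundation

// Build the 26 Latin capitals directly from their code point range,
// instead of extracting them from a character set.
let firstLetter: Unicode.Scalar = "A"
let lastLetter: Unicode.Scalar = "Z"
let alphabet: [String] = (firstLetter.value...lastLetter.value).compactMap { value in
    Unicode.Scalar(value).map { String(Character($0)) }
}

/// Hypothetical helper: a membership test, which is what the set is meant for.
func isUppercaseLetter(_ scalar: Unicode.Scalar) -> Bool {
    return CharacterSet.uppercaseLetters.contains(scalar)
}

print(alphabet.count)          // 26
print(isUppercaseLetter("Q"))  // true
print(isUppercaseLetter("q"))  // false
```

The array gives a fixed, ordered alphabet for iteration, while the character set answers membership questions for any scalar, Latin or not.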


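Think about non-Latin characters. CharacterSet.uppercaseLetters covers the Unicode general categories Lu and Lt, which contain A through Z as well as characters such as Dž.

I found Martin R's solution too slow for my purposes, so I solved it using CharacterSet's bitmapRepresentation property.

According to my benchmarks, this is much faster: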

var ranges = [CountableClosedRange<UInt32>]()
let bitmap: Data = characterSet.bitmapRepresentation
var first: UInt32?, last: UInt32?
var plane = 0, nextPlane = 8192
for (j, byte) in bitmap.enumerated() where byte != 0 {
    if j == nextPlane {
        plane += 1
        nextPlane += 8193
        continue
    }
    for i in 0 ..< 8 where byte & 1 << i != 0 {
        let codePoint = UInt32(j - plane) * 8 + UInt32(i)
        if let _last = last, codePoint == _last + 1 {
            last = codePoint
        } else {
            if let first = first, let last = last {
                ranges.append(first ... last)
            }
            first = codePoint
            last = codePoint
        }
    }
}
if let first = first, let last = last {
    ranges.append(first ... last)
}
return ranges
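The range-merging part of that loop can be looked at on its own. The following is a small sketch, not the answer author's code, that collapses an ascending list of code points into closed ranges; the function name `collapseIntoRanges` is an assumption for illustration.

```swift
/// Hypothetical helper: collapses ascending code points into closed ranges,
/// extending the current range while the next code point is adjacent.
func collapseIntoRanges(_ codePoints: [UInt32]) -> [ClosedRange<UInt32>] {
    var ranges: [ClosedRange<UInt32>] = []
    for codePoint in codePoints {
        if let last = ranges.last,
           codePoint > last.upperBound,
           codePoint - last.upperBound == 1 {
            // Adjacent to the previous range: grow it by one.
            ranges[ranges.count - 1] = last.lowerBound...codePoint
        } else {
            // Gap found: start a new single-element range.
            ranges.append(codePoint...codePoint)
        }
    }
    return ranges
}

// A, B, C are contiguous; a, b are contiguous; U+00C0 stands alone.
let ranges = collapseIntoRanges([0x41, 0x42, 0x43, 0x61, 0x62, 0xC0])
print(ranges.count) // 3
```

Storing ranges rather than individual characters keeps the result compact for sets such as the uppercase letters, which consist of many contiguous runs.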
Inspired by the previous answer, here is an efficient way to generate an array from a CharacterSet using bitmapRepresentation:

extension CharacterSet {
    func characters() -> [Character] {
        // A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive.
        return codePoints().compactMap { UnicodeScalar($0) }.map { Character($0) }
    }

    func codePoints() -> [Int] {
        var result: [Int] = []
        var plane = 0
        // following documentation at https://developer.apple.com/documentation/foundation/nscharacterset/1417719-bitmaprepresentation
        for (i, w) in bitmapRepresentation.enumerated() {
            let k = i % 0x2001
            if k == 0x2000 {
                // plane index byte
                plane = Int(w) << 13
                continue
            }
            let base = (plane + k) << 3
            for j in 0 ..< 8 where w & 1 << j != 0 {
                result.append(base + j)
            }
        }
        return result
    }
}
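As a small illustration of the bitmap layout that `codePoints()` relies on, the sketch below tests single bits in the first 8192 bytes, which describe the Basic Multilingual Plane: code point c corresponds to bit (c & 7) of byte (c >> 3). The helper name `bmpBitIsSet` is an assumption, and the sketch deliberately ignores the per-plane blocks that follow for sets with members outside the BMP.

```swift
import Foundation

/// Hypothetical helper: checks one BMP code point in a raw character set bitmap.
func bmpBitIsSet(_ bitmap: Data, _ codePoint: Int) -> Bool {
    let byteIndex = codePoint >> 3
    guard byteIndex < bitmap.count else { return false }
    // Bit (codePoint & 7) of the selected byte marks membership.
    let mask: UInt8 = 1 << UInt8(codePoint & 7)
    return bitmap[byteIndex] & mask != 0
}

let sample = CharacterSet(charactersIn: "AZ")
let bitmap = sample.bitmapRepresentation

print(bmpBitIsSet(bitmap, 0x41)) // "A" is in the set: true
print(bmpBitIsSet(bitmap, 0x42)) // "B" is not: false
```

Reading whole bytes this way lets the extension skip empty regions eight code points at a time, which is where the speed advantage over per-character membership tests comes from.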
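Example with discontinuous planes:

let charset = CharacterSet(charactersIn: …

Of course, you can use CharacterSet to create character sets and alphabets like this:

var smallEmojiCharacterSet = CharacterSet(charactersIn: Unicode.Scalar("…

Comments:

Why do you need this? Or is it just for fun? It would be good if you could say why you need it in an array.

Have you tried using ASCII values?

The uppercase letter character set contains more than just A…Z.

Ooopppsssss. We need 50K+ reputation to understand this code. People will be freaked out by it.

@H2CO3, I guess I was just unaware whether NSCharacterSet or NSString has a method that performs this task in a single statement. It looks like it does not exist. Glad to see the possibility from your response. Thanks.

Remark: this only works for characters in the BMP.

@MartinR Right, at least as long as unichar is two octets long (which it is on iOS and OS X).

@H2CO3: NSCharacterSet also works for characters outside the BMP, even though NSString uses unichar internally.

c1 is unlikely to work as a let, because of the in-out &.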
Example use of the characters() extension:

let charset = CharacterSet.uppercaseLetters
let chars = charset.characters()
print(chars.count) // 1733
print(chars) // ["A", "B", "C", ... "]