Objective-C NSString: convert to plain alphabet only (i.e. remove accents and punctuation)

I'm trying to compare names without any punctuation, spaces, accents, etc. At the moment I am doing the following:

-(NSString*) prepareString:(NSString*)a {
    //remove any accents and punctuation;
    a=[[[NSString alloc] initWithData:[a dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

    a=[a stringByReplacingOccurrencesOfString:@" " withString:@""];
    a=[a stringByReplacingOccurrencesOfString:@"'" withString:@""];
    a=[a stringByReplacingOccurrencesOfString:@"`" withString:@""];
    a=[a stringByReplacingOccurrencesOfString:@"-" withString:@""];
    a=[a stringByReplacingOccurrencesOfString:@"_" withString:@""];
    a=[a lowercaseString];
    return a;
}
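However, I need to do this for hundreds of strings and I need to make it more efficient. Any ideas?

Consider using a regular-expression library; the `stringByReplacingOccurrencesOfRegex:withString:` method below appears to come from RegexKitLite (the link in the original answer is missing here). You could do something like: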

NSString *searchString      = @"This is neat.";
NSString *regexString       = @"\\W";  // the backslash must be escaped inside an Objective-C string literal
NSString *replaceWithString = @"";
NSString *replacedString    = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];

NSLog (@"%@", replacedString);
//... Thisisneat
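
The same replacement can be sketched without a third-party library, assuming a Foundation version that supports the `NSRegularExpressionSearch` option. The helper name `StripNonWordCharacters` is hypothetical and not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every character matched by \W
// (anything that is not a "word" character) in a single call.
static NSString *StripNonWordCharacters(NSString *input) {
    return [input stringByReplacingOccurrencesOfString:@"\\W"
                                            withString:@""
                                               options:NSRegularExpressionSearch
                                                 range:NSMakeRange(0, [input length])];
}
```

Calling `StripNonWordCharacters(@"This is neat.")` gives `Thisisneat`. Note that `\W` leaves digits and underscores in place, so it is not a strict "letters only" filter.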
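Consider using a scanner (the class and method names were links in the original answer and are missing here; the descriptions match `NSScanner`), specifically the method that accepts an NSCharacterSet and the one that accepts a string and returns the scanned string by reference.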

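You may also want to combine this with a string comparison method and a suitable comparison option (the specific names are missing here). That could simplify removing or replacing accents, so you can focus on removing punctuation, whitespace, etc.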
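
A minimal sketch of the scanner idea, assuming the class meant is `NSScanner`. The helper name `KeepCharacters` is hypothetical, and the loop uses `scanCharactersFromSet:intoString:` rather than whichever methods the answer originally linked to.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns only the characters of `input` that belong to `keep`.
static NSString *KeepCharacters(NSString *input, NSCharacterSet *keep) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // By default a scanner skips whitespace on its own; turn that off
    // so every character is handled explicitly by the loop below.
    [scanner setCharactersToBeSkipped:nil];

    NSCharacterSet *drop = [keep invertedSet];
    NSMutableString *result = [NSMutableString stringWithCapacity:[input length]];

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // Collect a run of wanted characters, if there is one here.
        if ([scanner scanCharactersFromSet:keep intoString:&chunk]) {
            [result appendString:chunk];
        }
        // Step over a run of unwanted characters, if there is one here.
        [scanner scanCharactersFromSet:drop intoString:NULL];
    }
    return result;
}
```

For example, `KeepCharacters(@"Jo_ - h !. nn y", [NSCharacterSet letterCharacterSet])` returns `Johnny`. Working on whole runs of characters avoids creating one intermediate string per replaced character.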


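If you must use the approach you proposed in the question, at least use an NSMutableString and
replaceOccurrencesOfString:withString:options:range:
which will be far more efficient than creating lots of nearly identical autoreleased strings. Simply reducing the number of allocations may improve performance "enough" for the time being.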
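
As a rough sketch of that suggestion, the replacement step from the question could be done in place on a single mutable string. The helper name `StripSeparators` is hypothetical, and the accent-removal step from the question is left out to keep the example short.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes the same separator characters as the question's
// code, but mutates one NSMutableString instead of creating a new string per step.
static NSString *StripSeparators(NSString *input) {
    NSMutableString *work = [NSMutableString stringWithString:input];
    NSArray *unwanted = [NSArray arrayWithObjects:@" ", @"'", @"`", @"-", @"_", nil];

    for (NSString *token in unwanted) {
        // The range is recomputed each time because the string shrinks.
        [work replaceOccurrencesOfString:token
                              withString:@""
                                 options:NSLiteralSearch
                                   range:NSMakeRange(0, [work length])];
    }
    return [work lowercaseString];
}
```

With this sketch, `StripSeparators(@"O'Neil-Smith _Jr")` returns `oneilsmithjr`.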

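Before using any of these solutions, don't forget to use
decomposedStringWithCanonicalMapping
to decompose any accented letters. For example, this turns é (U+00E9) into e followed by a combining acute accent (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters remain.

The reason this matters is that you probably don't want, say, “dän” and “dün”* to be treated as the same thing. If you strip out all accented letters, as some of these solutions may do, you end up with “dn” in both cases, so those strings compare as equal.

So, you should decompose them first, so that you can strip the accents and leave the letters.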


*German example. Thanks to Joris Weimar for providing it.

I just ran into this and it may be too late, but here is what worked for me:
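
A minimal sketch of "decompose, then strip the accents", assuming the combining marks are removed with `[NSCharacterSet nonBaseCharacterSet]`. That choice is an assumption of this example rather than something the answer specifies; it is used because `letterCharacterSet` and `alphanumericCharacterSet` themselves include combining marks, so filtering by those sets alone would keep the accents. The helper name `StripDiacritics` is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decomposes accented letters into base letter + combining
// mark, then drops the combining marks so only the base letters remain.
static NSString *StripDiacritics(NSString *input) {
    NSString *decomposed = [input decomposedStringWithCanonicalMapping];
    NSArray *pieces = [decomposed componentsSeparatedByCharactersInSet:
                          [NSCharacterSet nonBaseCharacterSet]];
    return [pieces componentsJoinedByString:@""];
}
```

With this sketch, “dän” becomes `dan` and “dün” becomes `dun`, so the two names stay distinct instead of both collapsing to “dn”.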

// text is the input string, and this just removes accents from the letters

// lossy encoding turns accented letters into normal letters;
// dataUsingEncoding: returns immutable NSData, so wrap it in a mutable copy
NSMutableData *sanitizedData = [NSMutableData dataWithData:
    [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES]];

// increaseLengthBy: appends a 0 byte (it guarantees to fill
// the new space with 0s), effectively turning
// sanitizedData into a C string
[sanitizedData increaseLengthBy:1];

// now we just create a string from the C string in sanitizedData
NSString *final = [NSString stringWithCString:[sanitizedData bytes]
                                     encoding:NSASCIIStringEncoding];

To give a complete example that combines the answers from Luiz and Peter, plus a few extra lines, you get the code below.

The code does the following:

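  • Creates a set of accepted characters
  • Turns accented letters into normal letters
  • Removes characters not in the set

Output of both the Objective-C and the Swift (2.2) example: BuverE_-48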

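There is one important precision to add to BillyTheKid18756's answer (Luiz corrected it, but it is not obvious from the explanation of the code):

Do not use
stringWithCString
as the second step of removing accents. It can add unwanted characters at the end of the string, because NSData is not NULL-terminated (as stringWithCString expects). Alternatively, use it but append an extra null byte to the NSData, as Luiz did in his code.

I think a simpler answer is to replace:

    NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];

with an initWithData:encoding: call, which uses the data's own length instead of looking for a terminator.

Taking BillyTheKid18756's code again, here is the complete corrected code: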

    // The input text
    NSString *text = @"BûvérÈ!@$&%^&(*^(_()-*/48";
    
    // Defining what characters to accept
    NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
    [acceptedCharacters addCharactersInString:@" _-.!"];
    
    // Turn accented letters into normal letters (optional)
    NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    // Corrected back-conversion from NSData to NSString
    NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
    
    // Removing unaccepted characters
    NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
    
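    @interface NSString (Filtering)
    - (NSString *)stringByFilteringCharacters:(NSCharacterSet *)charSet;
    @end

    @implementation NSString (Filtering)
    // Returns a copy of the receiver without the characters contained in charSet.
    - (NSString *)stringByFilteringCharacters:(NSCharacterSet *)charSet {
        NSMutableString *mutString = [NSMutableString stringWithCapacity:[self length]];
        for (NSUInteger i = 0; i < [self length]; i++) {
            // unichar (not char) so that non-ASCII characters are not truncated
            unichar c = [self characterAtIndex:i];
            if (![charSet characterIsMember:c]) [mutString appendFormat:@"%C", c];
        }
        return [NSString stringWithString:mutString];
    }
    @end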
    
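I believe this is the best solution:

Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.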


If you want to compare strings, use one of these methods. Do not try to change the data.

    - (NSComparisonResult)localizedCompare:(NSString *)aString
    - (NSComparisonResult)localizedCaseInsensitiveCompare:(NSString *)aString
    - (NSComparisonResult)compare:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)range locale:(id)locale
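
As a hedged illustration of the last method, two names can be compared while ignoring case and diacritics without modifying either string. The helper name `NamesMatch` and the choice of options are assumptions of this sketch. It does not ignore punctuation or whitespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the two strings are equal when case and
// diacritics are ignored, according to the rules of the given locale.
static BOOL NamesMatch(NSString *a, NSString *b, NSLocale *locale) {
    NSStringCompareOptions options = NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch;
    NSComparisonResult result = [a compare:b
                                   options:options
                                     range:NSMakeRange(0, [a length])
                                    locale:locale];
    return result == NSOrderedSame;
}
```

Passing `[NSLocale currentLocale]` follows the user's settings. Passing a fixed locale such as `[NSLocale localeWithLocaleIdentifier:@"en_US"]` gives the same result on every machine.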
    

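You need to consider the user's locale to do things right with strings, particularly with things like names. In most languages, characters like ä and å are not the same; they merely look similar. They are inherently distinct characters with meanings distinct from the others, but the actual rules and semantics differ for each locale.

The correct way to compare and sort strings is to take the user's locale into account. Anything else is naive, wrong, and stuck in the 1990s. Stop doing it.

If you are trying to pass data to a system that cannot handle non-ASCII text, well, that is simply the wrong thing to do. Pass it as data blobs.

Plus, normalize the strings first (see Peter Hosey's post), either precomposed or decomposed; basically, pick one normalized form.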

    - (NSString *)decomposedStringWithCanonicalMapping
    - (NSString *)decomposedStringWithCompatibilityMapping
    - (NSString *)precomposedStringWithCanonicalMapping
    - (NSString *)precomposedStringWithCompatibilityMapping
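
A small sketch of what "pick one normalized form" can look like in practice, assuming the precomposed canonical form is the one chosen. The helper name `EqualAfterNormalizing` is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: brings both strings to the same normalization form
// (precomposed, canonical mapping) before testing them for equality.
static BOOL EqualAfterNormalizing(NSString *a, NSString *b) {
    NSString *na = [a precomposedStringWithCanonicalMapping];
    NSString *nb = [b precomposedStringWithCanonicalMapping];
    return [na isEqualToString:nb];
}
```

The letter é can be stored as one code unit (U+00E9) or as two (U+0065 U+0301). After normalizing, `EqualAfterNormalizing(@"\u00e9", @"e\u0301")` returns YES, while a plain "e" still differs from "é".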
    
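No, it is not nearly as simple and easy as we tend to think.
Yes, it requires informed and careful decision making. (A bit of experience with non-English languages helps.)

None of these answers worked as expected for me. Specifically,
decomposedStringWithCanonicalMapping
did not strip accents/umlauts as I had expected.

Here is a variation on what I used that answers the brief: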

    // replace accents, umlauts etc with equivalent letter i.e 'é' becomes 'e'.
    // Always use en_GB (or a locale without the characters you wish to strip) as locale, no matter which language we're taking as input
    NSString *processedString = [string stringByFoldingWithOptions: NSDiacriticInsensitiveSearch locale: [NSLocale localeWithLocaleIdentifier: @"en_GB"]];
    // remove non-letters
    processedString = [[processedString componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:@""];
    // trim whitespace
    processedString = [processedString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceCharacterSet]];
    return processedString;
    

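I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a category on NSString to work a little differently. In this example, you specify a string containing only the characters you want to keep, and everything else is filtered out: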

    @interface NSString (PraxCategories)
    + (NSString *)lettersAndNumbers;
    - (NSString*)stringByKeepingOnlyLettersAndNumbers;
    - (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
    @end
    
    
    @implementation NSString (PraxCategories)
    
    + (NSString *)lettersAndNumbers { return @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
    
    - (NSString*)stringByKeepingOnlyLettersAndNumbers {
        return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
    }
    
    - (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
        NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
        NSMutableString * mutableString = @"".mutableCopy;
        for (NSUInteger i = 0; i < [self length]; i++){
            unichar character = [self characterAtIndex:i]; // unichar, not char, so non-ASCII characters are not truncated
            if([characterSet characterIsMember:character]) [mutableString appendFormat:@"%C", character];
        }
        return mutableString.copy;
    }
    
    @end
    
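You can then call `[string stringByKeepingOnlyLettersAndNumbers]` on any string you want to filter. Or, for example, if you want to strip out everything except vowels: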

    string = [string stringByKeepingOnlyCharactersInString:@"aeiouAEIOU"];
    
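If you are still learning Objective-C and are not using categories, I encourage you to try them out. They are the best place to put things like this, because they add functionality to every object of the class you categorize.


Categories simplify and encapsulate the code you are adding, making it easy to reuse across all of your projects. It is a great feature of Objective-C.

Peter's solution in Swift: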

    let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
    
For example:

    let oldString = "Jo_ - h !. nn y"
    // "Jo_ - h !. nn y"
    oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet)
    // ["Jo", "h", "nn", "y"]
    oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
    // "Johnny"
    
