Using Objective-C / Cocoa to unescape Unicode characters, i.e. \u1234
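Some of the sites I'm fetching data from return UTF-8 strings with the Unicode characters escaped, i.e.: \u5404\u500b\u90fd
Is there a built-in Cocoa function that might help with this, or will I have to write my own decoding algorithm?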
There is no built-in function to do this.
You can cheat with NSPropertyListSerialization, because "old text style" plists support escaping via \Uxxxx:
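NSString *input = @"ab\"cA\"BC\\u2345\\u0123";
// Turn JSON-style \u escapes into the \U escapes that old-style plists understand.
// This will cause trouble if the input contains an escaped backslash
// followed by "u", e.g. "abc\\\\uvw".
NSString *esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
// Escape embedded double quotes, then wrap everything in quotes
// so the data parses as a single quoted plist string.
NSString *esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
NSString *quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
NSData *data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
// Note: this method is deprecated in newer SDKs in favor of
// propertyListWithData:options:format:error:
NSString *unesc = [NSPropertyListSerialization propertyListFromData:data
                                                   mutabilityOption:NSPropertyListImmutable
                                                             format:NULL
                                                   errorDescription:NULL];
assert([unesc isKindOfClass:[NSString class]]);
NSLog(@"Output = %@", unesc);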
But keep in mind that this isn't very efficient. You'd be much better off writing your own parser. (By the way, are you decoding JSON strings? If so, you can use an existing JSON parser.)
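A hand-written parser along those lines might look like the following minimal sketch. It is plain C (so it also compiles as Objective-C) and assumes the input is a NUL-terminated UTF-8 C string, for example one obtained with -UTF8String. The names unescape_u, hex4 and put_utf8 are hypothetical. Each \uXXXX escape is decoded to UTF-8, a high/low surrogate pair is combined into one code point, an unpaired surrogate becomes U+FFFD, and every other backslash escape (\\, \", \n and so on) is copied through unchanged, which also sidesteps the "abc\\\\uvw" problem mentioned in the code comment above.

```c
#include <assert.h> /* only needed by the usage checks */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse exactly four hex digits at s; return the value, or -1 if malformed. */
static long hex4(const char *s)
{
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1; /* also stops at the terminating NUL */
        v = v * 16 + d;
    }
    return v;
}

/* Write code point cp to out as UTF-8; return the number of bytes written. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode \uXXXX escapes in `in`, writing UTF-8 to `out`.
   The result is never longer than the input, so strlen(in) + 1 bytes suffice. */
void unescape_u(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && in[1] == 'u') {
            long hi = hex4(in + 2);
            if (hi >= 0) {
                uint32_t cp = (uint32_t)hi;
                in += 6;
                if (cp >= 0xD800 && cp <= 0xDBFF) {
                    /* high surrogate: combine with a following \uDC00-\uDFFF */
                    long lo = (in[0] == '\\' && in[1] == 'u') ? hex4(in + 2) : -1;
                    if (lo >= 0xDC00 && lo <= 0xDFFF) {
                        cp = 0x10000 + ((cp - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
                        in += 6;
                    } else {
                        cp = 0xFFFD; /* unpaired high surrogate */
                    }
                } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                    cp = 0xFFFD; /* unpaired low surrogate */
                }
                out += put_utf8(cp, out);
                continue;
            }
        }
        if (in[0] == '\\' && in[1] != '\0') {
            /* any other escape (or a malformed \u): copy the backslash and
               its partner verbatim so the partner never starts a \u escape */
            *out++ = *in++;
        }
        *out++ = *in++;
    }
    *out = '\0';
}
```

From Objective-C, the result buffer can be turned back into an NSString with [NSString stringWithUTF8String:buf].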
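It's true that Cocoa doesn't offer a solution, but Core Foundation does: CFStringTransform.
CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it's a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic, like transliteration between Greek and Latin (or between any known scripts), but it can also be used for mundane tasks such as unescaping strings from a crappy server: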
NSString *input = @"\\u5404\\u500b\\u90fd";
NSMutableString *convertedString = [input mutableCopy];
CFStringRef transform = CFSTR("Any-Hex/Java");
// reverse = YES runs the Hex transform backwards, turning \uXXXX escapes into characters
CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);
NSLog(@"convertedString: %@", convertedString);
// prints: 各個都, tada!
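As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, such as case mappings, normalizations, or Unicode character name conversion. You can even design your own transforms.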
I have no idea why Apple doesn't make it available from Cocoa.
Edit 2015:
OS X 10.11 and iOS 9 add the following method to NSString in Foundation:
- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;
So the example above becomes…
NSString *input = @"\\u5404\\u500b\\u90fd";
NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java" reverse:YES];
NSLog(@"convertedString: %@", convertedString);
Thanks to @nschmidt for the heads-up.
Here's what I wrote. Hope it helps someone.
+ (NSString *)unescapeUnicodeString:(NSString *)string
{
    // unescape quotes and backslashes
    NSString *unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""];
    unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];

    // tokenize based on the unicode escape marker
    NSMutableString *tokenizedString = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:unescapedString];
    // NSScanner skips whitespace and newlines by default; turn that off
    // so that spaces in the input are preserved
    scanner.charactersToBeSkipped = nil;

    while ([scanner isAtEnd] == NO) {
        // read up to the next unicode marker;
        // if a string has been scanned, it's a token
        // and should be appended to the tokenized string
        NSString *token = @"";
        [scanner scanUpToString:@"\\u" intoString:&token];
        if (token != nil && token.length > 0) {
            [tokenizedString appendString:token];
            continue;
        }

        // the marker plus 4 hex digits needs 6 characters; if fewer remain
        // (malformed input), copy the rest through verbatim and stop
        NSUInteger location = [scanner scanLocation];
        NSUInteger remaining = scanner.string.length - location;
        if (remaining < 6) {
            [tokenizedString appendString:[scanner.string substringFromIndex:location]];
            [scanner setScanLocation:scanner.string.length];
            continue;
        }

        // move the location past the unicode marker,
        // then read in the next 4 characters
        location += 2;
        NSRange range = NSMakeRange(location, 4);
        token = [scanner.string substringWithRange:range];
        unichar codeValue = (unichar)strtol([token UTF8String], NULL, 16);
        [tokenizedString appendFormat:@"%C", codeValue];

        // move the scanner past the 4 characters,
        // then keep scanning
        location += 4;
        [scanner setScanLocation:location];
    }

    // done
    return tokenizedString;
}

+ (NSString *)escapeUnicodeString:(NSString *)string
{
    // first escape backslashes and quotes;
    // the backslash has to be escaped before the quote,
    // otherwise the quote's escape ends up with an extra backslash
    NSString *escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
    escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];

    // convert to encoded unicode by getting the data for the string
    // in UTF-16 little-endian; reading each code unit through a uint16_t
    // pointer below assumes a little-endian host (true of current Apple hardware)
    NSData *data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES];
    size_t bytesRead = 0;
    const char *bytes = data.bytes;
    NSMutableString *encodedString = [NSMutableString string];

    // loop through the byte array two bytes at a time;
    // code units above 0x7E are written as \uXXXX escapes,
    // everything else is plain ASCII and is written as-is with %C
    while (bytesRead < data.length) {
        uint16_t code = *((uint16_t *)&bytes[bytesRead]);
        if (code > 0x007E) {
            [encodedString appendFormat:@"\\u%04X", code];
        } else {
            [encodedString appendFormat:@"%C", code];
        }
        bytesRead += sizeof(uint16_t);
    }

    // done
    return encodedString;
}
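For comparison, the escaping direction could be sketched in plain C as follows. This is an illustration under stated assumptions, not a port of the method above: it reads NUL-terminated UTF-8 (assumed valid; a stray or truncated byte sequence is replaced with U+FFFD, overlong forms are not checked), puts a backslash in front of \ and ", and writes every code point above 0x7E as \uXXXX, splitting code points above U+FFFF into a UTF-16 surrogate pair. For valid input that should match what the UTF-16 loop in escapeUnicodeString emits. The name escape_u is hypothetical, and the caller must supply an output buffer of at least 6 * strlen(in) + 1 bytes.

```c
#include <assert.h> /* only needed by the usage checks */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: escape `in` (UTF-8) into ASCII with \uXXXX escapes. */
void escape_u(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    while (*s) {
        uint32_t cp;
        int extra;
        /* decode the lead byte of one UTF-8 sequence */
        if (s[0] < 0x80)      { cp = s[0];        extra = 0; }
        else if (s[0] < 0xC0) { cp = 0xFFFD;      extra = 0; } /* stray continuation byte */
        else if (s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; }
        else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
        else                  { cp = s[0] & 0x07; extra = 3; }
        s++;

        /* consume continuation bytes, stopping early at anything else (even NUL) */
        int i = 0;
        for (; i < extra && (*s & 0xC0) == 0x80; i++, s++)
            cp = (cp << 6) | (*s & 0x3F);
        if (i < extra)
            cp = 0xFFFD; /* truncated sequence */

        if (cp == '\\' || cp == '"') {
            *out++ = '\\';
            *out++ = (char)cp;
        } else if (cp <= 0x7E) {
            *out++ = (char)cp;
        } else if (cp <= 0xFFFF) {
            out += snprintf(out, 7, "\\u%04X", (unsigned)cp);
        } else {
            /* outside the BMP: write the UTF-16 surrogate pair */
            cp -= 0x10000;
            out += snprintf(out, 13, "\\u%04X\\u%04X",
                            (unsigned)(0xD800 + (cp >> 10)),
                            (unsigned)(0xDC00 + (cp & 0x3FF)));
        }
    }
    *out = '\0';
}
```

Working on UTF-8 code points rather than on raw UTF-16 bytes keeps the sketch independent of the host's byte order.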
Simpler code:
const char *cString = [unicodeStr cStringUsingEncoding:NSUTF8StringEncoding];
// NSNonLossyASCIIStringEncoding interprets \uXXXX sequences when decoding
NSString *resultStr = [NSString stringWithCString:cString encoding:NSNonLossyASCIIStringEncoding];
From: https://stackoverflow.com/a/7861345