Q: String from NSInputStream is not valid UTF-8. How to convert to UTF-8 in a more "lossy" way?

Stack Overflow user

Asked 2015-05-21 11:48:49

Answers: 2 · Views: 595 · Followers: 0 · Votes: 1

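I have an app that reads data from a server. Now and then the data appears not to be valid UTF-8. When I convert the byte array to a UTF-8 string, the resulting string is nil. There must be some invalid, non-UTF-8 bytes in the byte array. Is there a way to convert the byte array to UTF-8 and filter out only the invalid bytes?

Any ideas?

My code looks like this: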

- (void)stream:(NSStream *)theStream handleEvent:(NSStreamEvent)streamEvent {

switch (streamEvent){
    case NSStreamEventHasBytesAvailable:
    {
        uint8_t buffer[1024];
        NSInteger len;   // -read:maxLength: returns NSInteger
        NSMutableData * inputData = [NSMutableData data];
        while ([directoryStream hasBytesAvailable]){
            len = [directoryStream read:buffer maxLength:sizeof(buffer)];
            if (len <= 0) {
                break;   // 0 = end of stream, negative = error
            }
            [inputData appendBytes:(const void *)buffer length:(NSUInteger)len];
        }
        // Convert the accumulated data (inputData); the result is nil if any byte is invalid
        NSString *directoryString = [[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding];
        // Log inside the braces, where directoryString is still in scope
        NSLog(@"directoryString: %@", directoryString);
    }

    ...

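Is there a more "lossy" way to do this conversion?

As you can see, I first append the chunks of data to an NSData value and convert to UTF-8 only once everything has been read. This prevents (multi-byte) UTF-8 characters from being split, which would produce even more invalid (nil) UTF-8 strings.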
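The split-character problem described above can also be handled chunk by chunk. The following is a minimal sketch in plain C (which compiles unchanged inside an Objective-C file), not code from the question: the function name `utf8_complete_prefix` is hypothetical. It reports how many leading bytes of a buffer end on a character boundary, so that an unfinished trailing sequence can be held back until the next read. It only inspects the lead byte of the last sequence and does not validate the rest of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of leading bytes of buf that end on a UTF-8 character
   boundary. Bytes past that point belong to a multi-byte character whose
   remaining bytes have not arrived yet. (Hypothetical helper.) */
size_t utf8_complete_prefix(const uint8_t *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;                 /* no lead byte found; hold nothing back */

    uint8_t lead = buf[i - 1];
    size_t need;
    if      ((lead & 0x80) == 0x00) need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                /* invalid lead byte; leave it to the decoder */

    size_t have = len - (i - 1);    /* bytes present for the last sequence */
    return (have < need) ? (i - 1) : len;
}
```

With a helper like this, a caller could decode `buf[0 .. prefix)` right away and carry the remaining `len - prefix` bytes over to the front of the next chunk, instead of buffering the whole response first.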

encoding

utf-8

nsstring

2 Answers

Stack Overflow user

Accepted answer

Posted on 2015-05-22 09:10:32

It works! By combining the code snippet from Larme with the comments about UTF-8 character sizes, I managed to create a "lossy" NSData to UTF-8 NSString conversion method.

+ (NSString *)data2UTF8String:(NSData *)data {

    // First try the 'standard' UTF-8 conversion
    NSString *bufferStr = [[NSString alloc] initWithData:data
                                                encoding:NSUTF8StringEncoding];

    // If it fails, do the 'lossy' UTF-8 conversion
    if (!bufferStr) {
        const Byte *buffer = [data bytes];
        NSUInteger dataLength = [data length];

        NSMutableString *filteredString = [[NSMutableString alloc] init];

        NSUInteger i = 0;
        while (i < dataLength) {

            // Sequence length implied by the lead byte; a stray
            // continuation byte (10xxxxxx) falls through as length 1
            NSUInteger expectedLength = 1;

            if      ((buffer[i] & 0b10000000) == 0b00000000) expectedLength = 1;
            else if ((buffer[i] & 0b11100000) == 0b11000000) expectedLength = 2;
            else if ((buffer[i] & 0b11110000) == 0b11100000) expectedLength = 3;
            else if ((buffer[i] & 0b11111000) == 0b11110000) expectedLength = 4;
            // 5- and 6-byte forms are no longer valid UTF-8 (RFC 3629);
            // decoding them fails below, so they are skipped as a unit
            else if ((buffer[i] & 0b11111100) == 0b11111000) expectedLength = 5;
            else if ((buffer[i] & 0b11111110) == 0b11111100) expectedLength = 6;

            NSUInteger length = MIN(expectedLength, dataLength - i);
            NSData *character = [NSData dataWithBytes:&buffer[i] length:length];

            // initWithData: honours the explicit length; stringWithUTF8String:
            // expects a NUL-terminated C string and could read past the bytes
            NSString *possibleString = [[NSString alloc] initWithData:character
                                                             encoding:NSUTF8StringEncoding];
            if (possibleString) {
                [filteredString appendString:possibleString];
            }
            i = i + expectedLength;
        }
        bufferStr = filteredString;
    }

    return bufferStr;
}

Let me know if you have any comments. Thanks, Larme!
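For comparison, here is a sketch of the same lossy idea at byte level in plain C, assuming the goal is to drop every byte that is not part of a well-formed sequence. It is not the answer's method: the names `utf8_valid_len` and `utf8_filter` are hypothetical, the byte ranges follow the well-formed UTF-8 table in the Unicode Standard, and on a bad byte it resynchronizes by skipping one byte rather than the whole expected length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0 if
   the bytes at s do not start one. (Hypothetical helper.) */
static size_t utf8_valid_len(const uint8_t *s, size_t avail)
{
    if (avail == 0) return 0;
    uint8_t b0 = s[0];
    if (b0 < 0x80) return 1;

    size_t n;
    uint8_t lo = 0x80, hi = 0xBF;        /* allowed range of the second byte */
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        n = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        n = 3;
        if (b0 == 0xE0) lo = 0xA0;       /* reject overlong encodings */
        if (b0 == 0xED) hi = 0x9F;       /* reject surrogates U+D800..U+DFFF */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        n = 4;
        if (b0 == 0xF0) lo = 0x90;       /* reject overlong encodings */
        if (b0 == 0xF4) hi = 0x8F;       /* reject values above U+10FFFF */
    } else {
        return 0;                        /* continuation byte, 0xC0/0xC1, 0xF5+ */
    }

    if (avail < n) return 0;             /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t k = 2; k < n; k++)
        if ((s[k] & 0xC0) != 0x80) return 0;
    return n;
}

/* Copies only the well-formed sequences from in to out and drops every
   other byte. out must have room for in_len bytes. Returns bytes written. */
size_t utf8_filter(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        size_t n = utf8_valid_len(in + i, in_len - i);
        if (n == 0) {                    /* ill-formed: skip a single byte */
            i++;
            continue;
        }
        memcpy(out + o, in + i, n);
        o += n;
        i += n;
    }
    return o;
}
```

Skipping one byte at a time keeps a valid character that happens to follow a bad lead byte, at the cost of a few more iterations. The filtered buffer could then be wrapped in an NSData and passed to `initWithData:encoding:` once, which avoids creating one NSString per character.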

Votes: 2

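Stack Overflow user

Posted on 2022-05-12 11:27:04

I created an NSString category with a -validUTF8String method: if UTF8String returns NULL, it strips the invalid (unpaired) surrogate characters and then calls UTF8String on the cleaned-up string: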

@interface NSString (ValidUTF8String)

- (const char *)validUTF8String;
- (NSString *)stringByStrippingInvalidUnicode;  // warning: very inefficient! should only be called when we are sure that the string contains invalid Unicode, e.g. when -[UTF8String] is NULL

@end

@implementation NSString (ValidUTF8String)

- (const char *)validUTF8String;
{
    const char *result=[self UTF8String];
    if (!result)
    {
        result=[[self stringByStrippingInvalidUnicode] UTF8String];
        if (!result)
            result="";
    }
    return result;
}

#define isHighSurrogate(k)  ((k>=0xD800) && (k<=0xDBFF))
#define isLowSurrogate(k)   ((k>=0xDC00) && (k<=0xDFFF))

- (NSString *)stringByStrippingInvalidUnicode
{
    NSMutableString *fixed=[[self mutableCopy] autorelease];
    for (NSInteger idx=0; idx<[fixed length]; idx++)
    {
        unichar k=[fixed characterAtIndex:idx];
        if (isHighSurrogate(k))
        {
            BOOL nextIsLowSurrogate=NO;
            if (idx+1<[fixed length])
            {
                unichar nextK=[fixed characterAtIndex:idx+1];
                nextIsLowSurrogate=isLowSurrogate(nextK);
            }
            if (!nextIsLowSurrogate)
            {
                [fixed deleteCharactersInRange:NSMakeRange(idx, 1)];
                idx--;
            }
        }
        else if (isLowSurrogate(k))
        {
            BOOL previousWasHighSurrogate=NO;
            if (idx>0)
            {
                unichar previousK=[fixed characterAtIndex:idx-1];
                previousWasHighSurrogate=isHighSurrogate(previousK);
            }
            if (!previousWasHighSurrogate)
            {
                [fixed deleteCharactersInRange:NSMakeRange(idx, 1)];
                idx--;
            }
        }
    }
    return fixed;
}

@end
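The category above deletes characters in place, which is why its author flags it as inefficient. As a rough alternative, the same pairing rule can be applied in a single pass over a buffer of UTF-16 code units. This is a sketch in plain C under stated assumptions: the function name `strip_lone_surrogates` is hypothetical, and the caller is assumed to have copied the string's code units into an array first (for example with `-getCharacters:range:`).

```c
#include <stddef.h>
#include <stdint.h>

/* Copies UTF-16 code units from in to out, keeping surrogates only when a
   high surrogate (D800..DBFF) is directly followed by a low surrogate
   (DC00..DFFF). out must have room for len units. Returns units written.
   (Hypothetical helper.) */
size_t strip_lone_surrogates(const uint16_t *in, size_t len, uint16_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: keep it only together with its low partner. */
            if (i + 1 < len && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                out[o++] = u;
                out[o++] = in[++i];
            }
            /* Otherwise it is unpaired and is dropped. */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate that was not consumed by a pair above: dropped. */
        } else {
            out[o++] = u;            /* ordinary BMP code unit */
        }
    }
    return o;
}
```

Because each code unit is read once and written at most once, the cost stays linear in the string length, whereas repeated `deleteCharactersInRange:` calls may shift the remainder of the string on every deletion.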

Votes: 0

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/30372870
