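How to strip HTML tags from an NSString while keeping any <Text in angle brackets>
For example: <p>123 <Hello> abc</p> -> 123 <Hello> abc
I have tried all kinds of regexp, scanner, and XML parser solutions, but they remove <Text in angle brackets> along with the tags.
The only solution that works for me is NSAttributedString with options: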
NSAttributedString *str = [[NSAttributedString alloc] initWithData:utf8Data
                                                           options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                                                     NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                                                documentAttributes:nil
                                                             error:nil];
NSString *result = [str string];

However, this approach uses WebKit and consumes too much memory for my task.
So, how can I strip the tags from an NSString while keeping <Text in angle brackets>, without using anything based on WebKit/UIWebView?
Posted on 2015-05-24 21:07:06
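I asked a similar question a while ago, and some of the answers there may help you. If you don't really need a full HTML parser and only want to strip the HTML tags, an NSString category like this one might be useful (it is a modified version of mwaterfal's category):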
- (NSString *)stringByStrippingTags {
    // Find the first "<" and short-circuit if there is none
    NSUInteger tagStart = [self rangeOfString:@"<" options:NSLiteralSearch].location;
    if (tagStart == NSNotFound) {
        return [NSString stringWithString:self]; // return a copy, as no tags were found
    }

    // Scan and collect every distinct "<...>" run
    NSScanner *scanner = [NSScanner scannerWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    NSMutableSet *tags = [[NSMutableSet alloc] init];
    NSString *tag;
    do {
        // Skip to the next "<", then read up to (not including) the next ">"
        tag = nil;
        [scanner scanUpToString:@"<" intoString:NULL];
        [scanner scanUpToString:@">" intoString:&tag];
        if (tag) {
            NSString *t = [[NSString alloc] initWithFormat:@"%@>", tag];
            [tags addObject:t];
        }
    } while (![scanner isAtEnd]);

    // Replace each tag: a few inline tags with nothing, all others with a space
    NSMutableString *result = [[NSMutableString alloc] initWithString:self];
    NSString *replacement;
    for (NSString *t in tags) {
        replacement = @" ";
        if ([t isEqualToString:@"<a>"] ||
            [t isEqualToString:@"</a>"] ||
            [t isEqualToString:@"<span>"] ||
            [t isEqualToString:@"</span>"] ||
            [t isEqualToString:@"<strong>"] ||
            [t isEqualToString:@"</strong>"] ||
            [t isEqualToString:@"<em>"] ||
            [t isEqualToString:@"</em>"]) {
            replacement = @"";
        }
        [result replaceOccurrencesOfString:t
                                withString:replacement
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, result.length)];
    }

    // Remove multi-spaces and line breaks. stringByRemovingNewLinesAndWhitespace
    // is not defined here and is assumed to come from the same category.
    return [result stringByRemovingNewLinesAndWhitespace];
}

https://stackoverflow.com/questions/30379279
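
Note that the category above treats every "<...>" run as a tag, so it would also replace <Hello> in the question's example. One possible way around that, shown here only as a rough sketch and not as part of the original answer, is to remove a "<...>" run only when its name is in a list of known HTML tag names. The function name StripKnownHTMLTags and the knownTags parameter are hypothetical names made up for this illustration; the sketch uses only NSRegularExpression from Foundation, so no WebKit is involved. It assumes tag names are passed in lowercase and that attribute values contain no ">" character, and it does not handle comments, script/style contents, or entities.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes only the tags whose name is in `knownTags`
// (lowercase names), leaving other angle-bracketed text such as <Hello> untouched.
static NSString *StripKnownHTMLTags(NSString *input, NSSet<NSString *> *knownTags) {
    NSError *error = nil;
    // Matches <name ...>, </name> and <name/>; capture group 1 is the tag name.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"</?([A-Za-z][A-Za-z0-9]*)\\b[^<>]*>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return input; // pattern failed to compile; return the input unchanged
    }

    NSMutableString *result = [input mutableCopy];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];

    // Walk the matches backwards so earlier ranges stay valid after each deletion.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *name = [[input substringWithRange:[match rangeAtIndex:1]] lowercaseString];
        if ([knownTags containsObject:name]) {
            [result deleteCharactersInRange:match.range];
        }
    }
    return result;
}
```

Called with a set containing @"p" on the question's input @"<p>123 <Hello> abc</p>", this returns @"123 <Hello> abc". The trade-off is that any bracketed text that happens to share a name with a listed tag is removed as well, so the list should contain only the tags that can actually appear in the input.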