I found a good solution here, but the regex splits the string into an empty "" string plus the two other pieces I need.
String Result = "<ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";
String [] Arr = Result.split("<[^>]*>");
for (String elem : Arr) {
System.out.println(elem);
}
The result is:
Arr[0]= ""
Arr[1]= Securities regulation in the United States
Arr[2]= Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.
Arr[1] and Arr[2] are split correctly; I just can't get rid of Arr[0].
Posted on 2015-08-14 17:16:12
You can take the opposite approach: instead of splitting on the tags, use a regex that captures the text you want, such as:
(?s)(?:^|>)(.*?)(?:<|$)
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String line = "<ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";
Pattern pattern = Pattern.compile("(?s)(?:^|>)(.*?)(?:<|$)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    // The match at index 0 (^ followed directly by '<') captures "", so skip empty groups
    if (!matcher.group(1).isEmpty()) {
        System.out.println("group 1: " + matcher.group(1));
    }
}
Posted on 2015-08-14 17:29:49
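If you use only split, you can't avoid that empty string: your regex matches a non-empty (positive-width) substring at the very start of the input, and in that case split() always puts an empty leading element in the result array.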
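A minimal sketch of that rule, assuming Java 8 or later (earlier versions also kept the leading "" for zero-width matches). The class and method names and the `<b>bold</b> tail` input are hypothetical. The lookahead split is only there to contrast zero-width with positive-width matches; it keeps the tags, so it is not a fix for the question.

```java
import java.util.Arrays;

class SplitLeadingEmptyDemo {
    // Positive-width pattern: the match "<b>" at index 0 produces a leading "".
    static String[] splitOnTags(String s) {
        return s.split("<[^>]*>");
    }

    // Zero-width lookahead before each '<': Java 8+ drops the leading "".
    static String[] splitBeforeTags(String s) {
        return s.split("(?=<)");
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> tail";
        System.out.println(Arrays.toString(splitOnTags(html)));     // [, bold,  tail]
        System.out.println(Arrays.toString(splitBeforeTags(html))); // [<b>bold, </b> tail]
    }
}
```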
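You can remove the match at the start of the input first, and then split on the remaining matches, like this:
String[] Arr = Result.replaceFirst("^<[^>]+>", "").split("<[^>]+>");
In general, though, you should avoid using regex on HTML/XML. Try a parser such as jsoup instead.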
https://stackoverflow.com/questions/32015361