
Q: Extracting data with a regular expression

Stack Overflow user

Asked on 2015-08-14 17:11:22

Answers: 2 · Views: 52 · Followers: 0 · Votes: 1

I found a good solution here, but the regex splits the string into an empty "" string plus the two other pieces I need.

String Result = "<ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";

String[] Arr = Result.split("<[^>]*>");
for (String elem : Arr) {
    System.out.println(elem);
}

The result is:

Arr[0] = ""
Arr[1] = "Securities regulation in the United States"
Arr[2] = " - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities."

Splitting out Arr[1] and Arr[2] works fine; I just can't get rid of Arr[0].

Tags: java, regex

2 answers

Stack Overflow user

Accepted answer

Posted on 2015-08-14 17:16:12

Instead of splitting on what you don't want, you can do the opposite and use a regex that captures what you do want, like this one:

(?s)(?:^|>)(.*?)(?:<|$)

Working demo

Working code on IDEOne

Code:

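import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "<ahref=https://blabla.com/Securities_regulation_in_the_United_States>Securities regulation in the United States</a> - Securities regulation in the United States is the field of U.S. law that covers transactions and other dealings with securities.";

// Capture the text that follows '>' (or the start of input) up to the next '<' (or the end of input)
Pattern pattern = Pattern.compile("(?s)(?:^|>)(.*?)(?:<|$)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    // Skip the empty capture produced because the input starts with '<'
    if (!matcher.group(1).isEmpty()) {
        System.out.println("group 1: " + matcher.group(1));
    }
}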
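To check the capturing approach in isolation, here is a minimal self-contained sketch that wraps the answer's pattern in a method and collects the captures into a list. The class name CaptureBetweenTags, the helper extractText, and the short sample input are hypothetical, chosen only for illustration. The sketch drops empty captures, which this pattern produces when the input starts with '<', ends with '>', or contains two adjacent tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureBetweenTags {
    // The answer's pattern: text after '>' (or the start) up to the next '<' (or the end)
    private static final Pattern TEXT = Pattern.compile("(?s)(?:^|>)(.*?)(?:<|$)");

    // Hypothetical helper: collects every non-empty captured segment, in order
    static List<String> extractText(String html) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = TEXT.matcher(html);
        while (matcher.find()) {
            String part = matcher.group(1);
            // Empty captures appear when the input starts with '<', ends with '>',
            // or two tags are adjacent, so they are filtered out here
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like the question's input
        String html = "<a href=x>Title</a> - Description text.";
        for (String part : extractText(html)) {
            System.out.println(part);
        }
    }
}
```

For the sample input this yields two elements, "Title" and " - Description text."; note that the second piece keeps the leading " - ", so a trim() or a further replace may still be needed.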

Votes: 2

Stack Overflow user

Posted on 2015-08-14 17:29:49

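If you use split alone, you can't avoid that empty string, because your regex matches a non-empty (positive-width) delimiter at the very start of the input. String.split includes an empty leading substring whenever there is a positive-width match at the beginning; only a zero-width match there is exempt (Java 8 and later).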
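The rule about leading matches can be seen directly with a small sketch, assuming Java 8 or later. The class and method names and the sample string are hypothetical. The lookahead split is shown only to illustrate the zero-width case: it keeps the tags in the output, so it is not a fix for the question.

```java
import java.util.Arrays;

public class SplitLeadingEmpty {
    // Positive-width delimiter: a match at index 0 yields an empty first element
    static String[] splitOnTags(String s) {
        return s.split("<[^>]+>");
    }

    // Zero-width delimiter (lookahead before '<'): since Java 8 a match at index 0
    // yields no empty first element
    static String[] splitBeforeTags(String s) {
        return s.split("(?=<)");
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> plain";
        System.out.println(Arrays.toString(splitOnTags(html)));     // [, bold,  plain]
        System.out.println(Arrays.toString(splitBeforeTags(html))); // [<b>bold, </b> plain]
    }
}
```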

You can try removing the first match, located at the start of the input, and then splitting on the remaining ones, like this:

String[] Arr = Result.replaceFirst("^<[^>]+>", "").split("<[^>]+>");
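A minimal runnable sketch of the replaceFirst approach, next to a different technique that is not from the answer: splitting first and then filtering out every empty element with a stream. The class name, method names, and sample strings are hypothetical. The second input shows where the two differ: when several tags sit at the start, replaceFirst removes only the first one, so one empty element remains.

```java
import java.util.Arrays;

public class StripLeadingTag {
    // The answer's approach: remove a tag at the very start, then split on the remaining tags
    static String[] splitWithoutLeadingEmpty(String html) {
        return html.replaceFirst("^<[^>]+>", "").split("<[^>]+>");
    }

    // A different technique (not from the answer): split, then drop every empty element
    static String[] splitAndDropEmpty(String html) {
        return Arrays.stream(html.split("<[^>]+>"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String html = "<a href=x>Title</a> - Description text.";
        System.out.println(Arrays.toString(splitWithoutLeadingEmpty(html)));

        // Two adjacent leading tags: replaceFirst strips only the first one
        String nested = "<b><i>x</i></b>";
        System.out.println(Arrays.toString(splitWithoutLeadingEmpty(nested))); // [, x]
        System.out.println(Arrays.toString(splitAndDropEmpty(nested)));        // [x]
    }
}
```

For the first sample, splitWithoutLeadingEmpty returns "Title" and " - Description text.". The filtering variant costs an extra pass over the array but does not depend on how many tags precede the text.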

In general, though, you should avoid using regex on HTML/XML. Try a parser such as jsoup instead.

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/32015361
