我有一个输入String,如:
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";我想将此文本转换为:
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&myParam=pqr (URL Label) and some text after it所以在这里:
1)我想用普通链接替换链接标签。如果标签包含标签,那么它应该在URL之后进入大括号。
2)如果URL是相对的,我想在基本URL (http://www.google.com)前加上前缀。
3)我想在URL中附加一个参数。(&myParam=pqr)
我有问题,检索标签的URL和标签,并更换它。
我写了这样的东西:
public static void main(String[] args) {
String text = "String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";";
text = text.replaceAll("<", "<");
text = text.replaceAll(">", ">");
text = text.replaceAll("&", "&");
// this is not working
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
if (m.find()) {
url = m.group(1);
}
}
// helper method to append new query params once I have the url
public static URI appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
URI oldUri = new URI(uriToUpdate);
String newQueryParams = oldUri.getQuery();
if (newQueryParams == null) {
newQueryParams = queryParamsToAppend;
} else {
newQueryParams += "&" + queryParamsToAppend;
}
URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
oldUri.getPath(), newQueryParams, oldUri.getFragment());
return newUri;
}Edit1:
Pattern p = Pattern.compile("HREF=\"(.*?)\"");这个很管用。但我希望它是资本化不可知论者。Href、HRef、href、hrEF等都应该工作。
另外,如果我的文本有几个URL,我该如何处理。
Edit2:
一些进展。
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1);
System.out.println(url);
}这将处理多个URL的情况。
最后一个悬而未决的问题是,我如何获得标签,并将原始文本中的href标记替换为URL和label。
Edit3:
在多个URL情况下,我的意思是在给定的文本中存在多个url。
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1); // this variable should contain the link URL
url = appendBaseURI(url);
url = appendQueryParams(url, "license=ABCXYZ");
System.out.println(url);
}发布于 2018-11-22 05:48:00
public static void main(String args[]) {
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it and another link <A HREF=\"/relative-path/vegetables.cgi?param1=abc&param2=xyz\">URL2 Label</A> and some more text";
text = StringEscapeUtils.unescapeHtml4(text);
Pattern p = Pattern.compile("<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) {
text = text.replace(m.group(0), cleanUrlPart(m.group(1), m.group(2)));
}
System.out.println(text);
}
private static String cleanUrlPart(String url, String label) {
if (!url.startsWith("http") && !url.startsWith("www")) {
if (url.startsWith("/")) {
url = "http://www.google.com" + url;
} else {
url = "http://www.google.com/" + url;
}
}
url = appendQueryParams(url, "myParam=pqr").toString();
if (label != null && !label.isEmpty()) url += " (" + label + ")";
return url;
}输出
Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&myParam=pqr (URL Label) and some text after it and another link http://www.google.com/relative-path/vegetables.cgi?param1=abc¶m2=xyz&myParam=pqr (URL2 Label) and some more text发布于 2018-11-22 04:04:23
您可以使用apache共用文本 StringEscapeUtils对html实体进行解码,然后使用replaceAll,即:
import org.apache.commons.text.StringEscapeUtils;
String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";
String output = StringEscapeUtils.unescapeHtml4(text).replaceAll("([^<]+).+\"(.*?)\">(.*?)<[^>]+>(.*)", "$1https://google.com$2&your_param ($3)$4");
System.out.print(output);
// Some content which contains link as https://google.com/relative-path/fruit.cgi?param1=abc¶m2=xyz&your_param (URL Label) and some text after it演示:
发布于 2018-11-22 02:45:19
//这不管用
因为你的判断力是区分大小写的。
试着:-
Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);Edit1
要获得标签,请使用Pattern.compile("(?<=>).*?(?=</a>)", Pattern.CASE_INSENSITIVE)和m.group(0)。
Edit2
若要用最终字符串替换标记(包括标签),请使用:-
text.replaceAll("(?i)<a href=\"(.*?)</a>", "new substring here")https://stackoverflow.com/questions/53423132
复制相似问题