
Q: LucidWorks: Java regular expressions & GNU regular expressions

Stack Overflow user

Asked on 2011-11-17 21:31:29

Answers: 1, Views: 154, Followers: 0, Votes: 1

I'm trying to create a regular expression so that I can use LucidWorks to crawl and index certain URLs on my site.

Example URL: http://www.example.com/reviews/assassins-creed-revelations/24475/reviews/
Example URL: http://www.example.com/reviews/super-mario-3d-land/64303/reviews/

Basically, I want LucidWorks to crawl my whole site but only index the URLs that end with /reviews/.

Can anyone help me construct an expression? :)
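A minimal sketch of such an expression in plain java.util.regex, assuming the crawler tests each candidate URL against a Java-style regex and that only the www.example.com host matters. The class and method names are hypothetical, and LucidWorks' own include-path syntax may differ from a raw Java regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a URL should be indexed.
class ReviewUrlFilter {
    // ^ and $ anchor the whole URL; ".+" is any non-empty path,
    // and the URL must finish with the literal "/reviews/".
    private static final Pattern ENDS_WITH_REVIEWS =
            Pattern.compile("^https?://www\\.example\\.com/.+/reviews/$");

    static boolean shouldIndex(String url) {
        return ENDS_WITH_REVIEWS.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/reviews/assassins-creed-revelations/24475/reviews/",
            "http://www.example.com/reviews/super-mario-3d-land/64303/reviews/",
            "http://www.example.com/?page=2"
        };
        for (String url : urls) {
            System.out.println(shouldIndex(url) + "  " + url);
        }
    }
}
```

Run as-is, the two example review URLs print true and http://www.example.com/?page=2 prints false, because nothing in that URL ends with /reviews/.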

Update:

URL: http://www.example.com/

Include paths: //*/reviews/*

This sort of works, but it only crawls the first page; it doesn't move on to the following pages with more reviews (1, 2, 3, etc.).

If I also add: ///reviews/.*

I get a lot of pages I don't want, such as http://www.example.com/?page=2
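The update suggests the pattern has to let the follow-up review pages through while still rejecting URLs like http://www.example.com/?page=2. The question does not show what those follow-up pages look like, so the sketch below simply assumes (hypothetically) that they append a query string such as ?page=2 to the /reviews/ URL; the optional group would need adjusting if the real format differs. Whether the crawler still follows links on pages it does not index is a separate crawler setting that this sketch does not model.

```java
import java.util.regex.Pattern;

// Hypothetical filter for review pages, including assumed paginated variants.
class PagedReviewFilter {
    // Assumption: paginated review pages look like ".../reviews/?page=N".
    // The "(\\?page=\\d+)?" group makes that query string optional,
    // so the first page (no query string) still matches.
    private static final Pattern REVIEW_PAGE =
            Pattern.compile("^https?://www\\.example\\.com/.+/reviews/(\\?page=\\d+)?$");

    static boolean shouldIndex(String url) {
        return REVIEW_PAGE.matcher(url).matches();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/reviews/super-mario-3d-land/64303/reviews/";
        System.out.println(shouldIndex(base));                              // true: first page
        System.out.println(shouldIndex(base + "?page=2"));                  // true: assumed page 2
        System.out.println(shouldIndex("http://www.example.com/?page=2"));  // false: unwanted page
    }
}
```

The key difference from ".*" after /reviews/ is that only an exact, optional ?page=N suffix is allowed, so a query string on an unrelated page cannot satisfy the pattern.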

Tags: java, regex

1 Answer

Stack Overflow user

Answered on 2013-02-14 10:25:09

Try checking with this function:
import java.util.regex.Pattern;

public boolean canAcceptURL(String url, String endsWith) {
    // Fall back to the default suffix when endsWith is null or empty.
    if (endsWith == null || endsWith.isEmpty()) {
        endsWith = "/reviews/";
    }
    // Any run of printable ASCII characters (0x20-0x7E) followed by the suffix,
    // taken literally via Pattern.quote() so it can never form an invalid regex.
    String regex = "[\\x20-\\x7E]*" + Pattern.quote(endsWith) + "$";
    boolean canAccept = url != null && url.matches(regex);
    System.out.println("String matches : " + canAccept);
    return canAccept;
}

Sample output:
Calling the function: canAcceptURL("http://www.example.com/reviews/super-mario-3d-land/64303/reviews/", "/reviews/");
String matches : true
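String.matches() compiles its regex on every call. If the check runs for every URL a crawl discovers, a variant that compiles the pattern once may be preferable. This is a sketch of such a variant of canAcceptURL; the class name SuffixUrlMatcher and its accepts() method are hypothetical, and it keeps the answer's printable-ASCII character class and default suffix.

```java
import java.util.regex.Pattern;

// Hypothetical reusable variant of canAcceptURL: the pattern is compiled
// once in the constructor instead of on every call.
class SuffixUrlMatcher {
    private final Pattern pattern;

    SuffixUrlMatcher(String endsWith) {
        // Same default as the answer when no suffix is given.
        String suffix = (endsWith == null || endsWith.isEmpty()) ? "/reviews/" : endsWith;
        // Printable ASCII (0x20-0x7E), then the suffix taken literally.
        this.pattern = Pattern.compile("[\\x20-\\x7E]*" + Pattern.quote(suffix));
    }

    boolean accepts(String url) {
        // matches() requires the whole string to match, so no "$" is needed.
        return url != null && pattern.matcher(url).matches();
    }

    public static void main(String[] args) {
        SuffixUrlMatcher matcher = new SuffixUrlMatcher("/reviews/");
        System.out.println(matcher.accepts(
                "http://www.example.com/reviews/super-mario-3d-land/64303/reviews/")); // true
        System.out.println(matcher.accepts("http://www.example.com/?page=2"));          // false
    }
}
```

One matcher instance can then be shared across the whole crawl instead of rebuilding the regex per URL.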

If you instead want to accept any URL that contains '/reviews/' anywhere, just change the regex string to:

String regex = "[\\x20-\\x7E]*/reviews/[\\x20-\\x7E]*"; // accepts any printable-ASCII string (spaces and punctuation included) that contains /reviews/
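The practical difference between the answer's two regexes shows up when both are run against the same URLs. A minimal sketch; the second URL is a hypothetical review page that contains /reviews/ but does not end with it, and the class and method names are illustrative only.

```java
// Compares the answer's two patterns on the same inputs.
class ContainsVsEndsWith {
    // Suffix-only form versus "anywhere in the URL" form, as in the answer.
    static final String ENDS_WITH = "[\\x20-\\x7E]*/reviews/$";
    static final String CONTAINS = "[\\x20-\\x7E]*/reviews/[\\x20-\\x7E]*";

    static boolean endsWithReviews(String url) {
        return url.matches(ENDS_WITH);
    }

    static boolean containsReviews(String url) {
        return url.matches(CONTAINS);
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/reviews/super-mario-3d-land/64303/reviews/",
            // Hypothetical URL: contains /reviews/ but does not end with it.
            "http://www.example.com/reviews/super-mario-3d-land/64303/",
            "http://www.example.com/?page=2"
        };
        for (String url : urls) {
            System.out.printf("endsWith=%b contains=%b  %s%n",
                    endsWithReviews(url), containsReviews(url), url);
        }
    }
}
```

Only the first URL passes both checks; the second passes only the "contains" form, which is why that form tends to admit more pages than the suffix form.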

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/8174619
