How to split a lookahead regex into 2 plain regexes?

Stack Overflow user

Asked 2021-03-30 09:22:57

1 answer · 104 views · 0 followers · 1 vote

I have a regex with a lookahead: `[^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*])`. In my tests it extracts 4 substrings from `@@||imasdk.googleapis.com/js/core/bridge*.html`:

|imasdk
.googleapis
.com
/core
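To make the lookahead semantics concrete, here is a dependency-free Rust sketch that hand-codes the same pattern instead of calling a regex engine. The names `is_word`, `is_boundary` and `lookahead_matches` are illustrative and not from the question, and the input is assumed to be lowercased already (the question's code calls `to_lowercase()` first). What it encodes: the lookahead inspects only the single character right after the `{3,}` run, does not consume it, and fails at end of input.

```rust
/// Chars in the class `[a-z0-9%]`.
fn is_word(c: char) -> bool {
    matches!(c, 'a'..='z' | '0'..='9' | '%')
}

/// Chars in the class `[^a-z0-9%*]`.
fn is_boundary(c: char) -> bool {
    !is_word(c) && c != '*'
}

/// Hand-written equivalent of `[^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*])`
/// (a sketch, not the regex crate).
fn lookahead_matches(s: &str) -> Vec<&str> {
    let chars: Vec<(usize, char)> = s.char_indices().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if is_boundary(chars[i].1) {
            // Greedily take the [a-z0-9%]{3,} run after the boundary char.
            let mut j = i + 1;
            while j < chars.len() && is_word(chars[j].1) {
                j += 1;
            }
            // Lookahead: only the char at index j is checked, and it is NOT
            // consumed. Running off the end of the input fails the lookahead.
            let ahead_ok = j < chars.len() && is_boundary(chars[j].1);
            if j - (i + 1) >= 3 && ahead_ok {
                out.push(&s[chars[i].0..chars[j].0]);
                // Resume AT the lookahead char so it can start the next match.
                i = j;
                continue;
            }
        }
        i += 1;
    }
    out
}

fn main() {
    let input = "@@||imasdk.googleapis.com/js/core/bridge*.html";
    println!("{:?}", lookahead_matches(input));
}
```

On the question's input this yields the same four substrings listed above: `/bridge` is rejected because the character right after it is `*`, and `.html` because nothing follows it.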

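I need to rewrite it as 2 good old regexes because I can't use lookarounds (the regex engine doesn't support them). I split it into `[^a-z0-9%*][a-z0-9%]{3,}` and `[^a-z0-9%*]`, and for each match of the first regex I run the second one on the substring that follows the match.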

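For some reason it also extracts `/bridge`, because `.` is not among the characters excluded by `[^a-z0-9%*]` and it is found after `/bridge`. So how does lookahead work: does it have to be a full match, a substring (a find result), or something else? Does it mean that in this case every trailing character must be outside the set `a-z0-9%*`?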

In Rust the code looks like this:

    lazy_static! {
        // WARNING: the original regex is `"[^a-z0-9%*][a-z0-9%]{3,}(?=[^a-z0-9%*])"` but Rust's regex
        // does not support look-around, so we have to check it programmatically for the last match
        static ref REGEX: Regex = Regex::new(r###"[^a-z0-9%*][a-z0-9%]{3,}"###).unwrap();
        static ref LOOKAHEAD_REGEX: Regex = Regex::new(r###"[^a-z0-9%*]"###).unwrap();
    }

    let pattern_lowercase = pattern.to_lowercase();
    
    let results = REGEX.find_iter(&pattern_lowercase);
    for (is_last, each_candidate) in results.identify_last() {
        let mut candidate = each_candidate.as_str();
        if !is_last {
            // have to simulate positive-ahead check programmatically
            let ending = &pattern_lowercase[each_candidate.end()..]; // substr after the match
            println!("searching in {:?}", ending);
            let lookahead_match = LOOKAHEAD_REGEX.find(ending);
            if lookahead_match.is_none() {
                // did not find anything => look-ahead is NOT positive
                println!("NO look-ahead match!");
                break;
            } else {
                println!("found look-ahead match: {:?}", lookahead_match.unwrap().as_str());
            }
        }
         ...

Test output:

    "|imasdk":
    searching in ".googleapis.com/js/core/bridge*.html"
    found look-ahead match: "."
    ".googleapis":
    searching in ".com/js/core/bridge*.html"
    found look-ahead match: "."
    ".com":
    searching in "/js/core/bridge*.html"
    found look-ahead match: "/"
    "/core":
    searching in "/bridge*.html"
    found look-ahead match: "/"
    "/bridge":
    searching in "*.html"
    found look-ahead match: "."

^ Here you can see that `/bridge` is found because of the `.` that comes after it, which is incorrect.

Tags: positive-lookahead, regex-look-ahead, regex, rust, regex-lookarounds

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-03-30 14:28:12

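Your `LOOKAHEAD_REGEX` looks for a character outside that range anywhere after the match, but the original regex with the lookahead only looks at the single character immediately after the match. That's why your code finds `/bridge` and regex101 does not: your code sees a `.` somewhere after the match, whereas regex101 only looks at the `*`.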

You can fix your code by anchoring `LOOKAHEAD_REGEX` so that it only looks at the first character: `^[^a-z0-9%*]`.
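The anchored fix can be sketched without any crate. Hand-written stand-ins replace `REGEX.find_iter` (for `[^a-z0-9%*][a-z0-9%]{3,}`) and the second regex: `lookahead_unanchored` mimics what an unanchored `[^a-z0-9%*]` search over the remainder effectively tests (a boundary character anywhere), while `lookahead_anchored` mimics `^[^a-z0-9%*]` by looking only at the first character. All function names here are hypothetical.

```rust
/// Chars in `[a-z0-9%]`.
fn is_word(c: char) -> bool {
    matches!(c, 'a'..='z' | '0'..='9' | '%')
}

/// Chars in `[^a-z0-9%*]`.
fn is_boundary(c: char) -> bool {
    !is_word(c) && c != '*'
}

/// Hypothetical stand-in for `REGEX.find_iter` with `[^a-z0-9%*][a-z0-9%]{3,}`:
/// non-overlapping (start, end) byte ranges, scanned left to right.
fn candidates(s: &str) -> Vec<(usize, usize)> {
    let chars: Vec<(usize, char)> = s.char_indices().collect();
    // Byte offset of char index k, or the string length when k is past the end.
    let byte_at = |k: usize| chars.get(k).map_or(s.len(), |&(b, _)| b);
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if is_boundary(chars[i].1) {
            let mut j = i + 1;
            while j < chars.len() && is_word(chars[j].1) {
                j += 1;
            }
            if j - (i + 1) >= 3 {
                out.push((byte_at(i), byte_at(j)));
                i = j;
                continue;
            }
        }
        i += 1;
    }
    out
}

/// Unanchored check: is there a boundary char ANYWHERE in the remainder?
fn lookahead_unanchored(rest: &str) -> bool {
    rest.chars().any(is_boundary)
}

/// Anchored check, like `^[^a-z0-9%*]`: is the FIRST char a boundary?
/// An empty remainder fails, just as `(?=[^a-z0-9%*])` fails at end of input.
fn lookahead_anchored(rest: &str) -> bool {
    rest.chars().next().map_or(false, is_boundary)
}

/// Keeps every candidate whose remainder passes `check` (filter, not break).
fn extract<'a>(s: &'a str, check: fn(&str) -> bool) -> Vec<&'a str> {
    candidates(s)
        .into_iter()
        .filter(|&(_, end)| check(&s[end..]))
        .map(|(start, end)| &s[start..end])
        .collect()
}

fn main() {
    let input = "@@||imasdk.googleapis.com/js/core/bridge*.html";
    println!("unanchored: {:?}", extract(input, lookahead_unanchored));
    println!("anchored:   {:?}", extract(input, lookahead_anchored));
}
```

The unanchored variant reproduces the extra `/bridge` from the question's output; the anchored one returns only the four expected substrings. The sketch filters each candidate instead of `break`ing out of the loop, since a failed lookahead on one candidate says nothing about later ones, and it also checks the last candidate: `.html` has an empty remainder, so it is correctly rejected.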

Alternatively, as @Sven suggested, you can use a single regex that matches the full expression, `[^a-z0-9%*][a-z0-9%]{3,}[^a-z0-9%*]`, and strip the last character of each match.
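One caveat worth checking with the single-regex variant: because the trailing `[^a-z0-9%*]` is now consumed, it can no longer serve as the leading character of the next match. Iterating non-overlapping matches (as `find_iter` does) then skips `.googleapis`, whose leading `.` was already eaten by `|imasdk.`. One workaround is to start each new search at the consumed character instead of after it; with the `regex` crate, `Regex::find_at` accepts a start offset for this. The sketch below shows both behaviours with a hand-written, hypothetical `find_from` standing in for the regex so that it runs without dependencies.

```rust
/// Chars in `[a-z0-9%]`.
fn is_word(c: char) -> bool {
    matches!(c, 'a'..='z' | '0'..='9' | '%')
}

/// Chars in `[^a-z0-9%*]`.
fn is_boundary(c: char) -> bool {
    !is_word(c) && c != '*'
}

/// Hypothetical stand-in for searching `[^a-z0-9%*][a-z0-9%]{3,}[^a-z0-9%*]`
/// from char index `from`. Returns (start, end) char indices, where `end` is
/// one past the consumed trailing boundary char.
fn find_from(chars: &[char], from: usize) -> Option<(usize, usize)> {
    for i in from..chars.len() {
        if !is_boundary(chars[i]) {
            continue;
        }
        // Greedy [a-z0-9%] run; a shorter run is always followed by a word
        // char, which is never a boundary, so no backtracking is needed.
        let mut j = i + 1;
        while j < chars.len() && is_word(chars[j]) {
            j += 1;
        }
        if j - (i + 1) >= 3 && j < chars.len() && is_boundary(chars[j]) {
            return Some((i, j + 1));
        }
    }
    None
}

/// Collects matches with their last char stripped. With `reuse_trailing`,
/// the next search starts AT the stripped char instead of after it.
fn extract(s: &str, reuse_trailing: bool) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    let mut out: Vec<String> = Vec::new();
    let mut pos = 0;
    while let Some((start, end)) = find_from(&chars, pos) {
        out.push(chars[start..end - 1].iter().collect());
        pos = if reuse_trailing { end - 1 } else { end };
    }
    out
}

fn main() {
    let input = "@@||imasdk.googleapis.com/js/core/bridge*.html";
    println!("non-overlapping: {:?}", extract(input, false));
    println!("reuse trailing:  {:?}", extract(input, true));
}
```

The first call yields only `|imasdk`, `.com` and `/core`; the second yields the same four substrings as the lookahead version.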

Votes: 1

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/66868125
