Q: No match

Stack Overflow user

Asked 2015-10-28 21:43:34

Answers: 3, Views: 202, Followers: 0, Votes: 1

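I am writing a program that returns all text between \begin{theorem} and \end{theorem}, and between \begin{proof} and \end{proof}.

Using a regex seems natural, but the delimiters contain many potential metacharacters, so they need to be escaped.

Here is the code I have written: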

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatexTheoremProofExtractor {

    // This is the LaTeX source that will be processed
    private String source = null;

    // These are the list of theorems and proofs that are extracted, respectively 
    private ArrayList<String> theorems = null;
    private ArrayList<String> proofs = null;

    // These are the patterns to match theorems and proofs, respectively 
    private static final Pattern THEOREM_REGEX = Pattern.compile("\\begin\\{theorem\\}(.+?)\\end\\{theorem\\}");
    private static final Pattern PROOF_REGEX = Pattern.compile("\\begin\\{proof\\}(.+?)\\end\\{proof\\}");

    LatexTheoremProofExtractor(String source) {
        this.source = source;
    }

    public void parse() {
        extractEntity("theorem");
        extractEntity("proof");
    }

    private void extractTheorems() {
        if(theorems != null) {
            return;
        }

        theorems = new ArrayList<String>();

        final Matcher matcher = THEOREM_REGEX.matcher(source);
        while (matcher.find()) {
            theorems.add(new String(matcher.group(1)));
        }   
    }

    private void extractProofs() {
        if(proofs != null) {
            return;
        }

        proofs = new ArrayList<String>();

        final Matcher matcher = PROOF_REGEX.matcher(source);
        while (matcher.find()) {
            proofs.add(new String(matcher.group(1)));
        }       
    }

    private void extractEntity(final String entity) {   
        if(entity.equals("theorem")) {
            extractTheorems();
        } else if(entity.equals("proof")) {
            extractProofs();
        } else {
            // TODO: Throw an exception or something
        }       
    }

    public ArrayList<String> getTheorems() {
        return theorems;
    }

}

Here is my test, which fails:

@Test 
public void testTheoremExtractor() {
    String source = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";
    LatexTheoremProofExtractor extractor = new LatexTheoremProofExtractor(source);
    extractor.parse();
    ArrayList<String> theorems = extractor.getTheorems();
    assertEquals(theorems.get(0).trim(), "Hello, World!");
}

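As the test shows, I expect exactly one match, and it should be "Hello, World!" (after trimming).

Currently, theorems is an empty, non-null list, so my Matcher is not matching the pattern. Can someone help me understand why?

Thank you, 雷普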

java

regex

3 Answers

Stack Overflow user

Accepted answer

Posted 2015-10-28 22:23:12

Here is the update your code needs. The two regexes in the extractor class should be changed to:

private static final Pattern THEOREM_REGEX = Pattern.compile(Pattern.quote("\\begin\\{theorem\\}") + "(.+?)" + Pattern.quote("\\end\\{theorem\\}"));
private static final Pattern PROOF_REGEX = Pattern.compile(Pattern.quote("\\begin\\{proof\\}") + "(.+?)" + Pattern.quote("\\end\\{proof\\}"));

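The result will be "Hello, World!". See the IDEONE demo.

The string you have is actually \begin\{theorem\} Hello, World! \end\{theorem\}. A literal backslash is doubled in a Java string literal, and to match a literal backslash with a regex written in Java you need \\\\. To avoid backslash hell, Pattern.quote can help: it tells the regex engine to treat the whole quoted string as literal text.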

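More details on Pattern.quote can be found in the documentation:

Returns a literal pattern String for the specified String. This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern. Metacharacters or escape sequences in the input sequence will be given no special meaning.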
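As a rough illustration of the Pattern.quote approach, here is a minimal sketch. The class name QuoteDemo and the helper extract are hypothetical and not part of the original post; the source string is the one from the question's test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {

    // Hypothetical helper (not from the original post): returns every
    // trimmed capture found between the two literal delimiters.
    static List<String> extract(String begin, String end, String source) {
        // Pattern.quote makes both delimiters match literally;
        // only the (.+?) group in the middle is interpreted as regex.
        Pattern p = Pattern.compile(Pattern.quote(begin) + "(.+?)" + Pattern.quote(end));
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(source);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Same literal as in the question's test. At runtime the text is:
        // \begin\{theorem\} Hello, World! \end\{theorem\}
        String source = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";
        System.out.println(extract("\\begin\\{theorem\\}", "\\end\\{theorem\\}", source));
        // prints [Hello, World!]
    }
}
```

Because the delimiters are quoted, they can be written exactly as they appear in the Java source string, with no extra regex escaping.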

Votes: 1

Stack Overflow user

Posted 2015-10-28 21:52:45

Your first regex needs to be:

Pattern THEOREM_REGEX = Pattern.compile("\\\\begin\\\\\\{theorem\\\\\\}(.+?)\\\\end\\\\\\{theorem\\\\\\}");

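This is because matching a literal backslash requires \\ in the regex, which in turn is written as \\\\ in a Java string literal.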
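To make the backslash counting concrete, the sketch below compares the question's pattern with the one from this answer against the question's test string. The class name BackslashDemo and the helper matches are hypothetical. One detail worth noting: the question's pattern still compiles, because the regex engine reads \b as a word boundary and \e as the escape character (U+001B), so no exception points at the problem.

```java
import java.util.regex.Pattern;

public class BackslashDemo {

    // Test string from the question. Runtime text:
    // \begin\{theorem\} Hello, World! \end\{theorem\}
    static final String SOURCE = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";

    // Pattern from the question. The regex engine receives
    // \begin\{theorem\}(.+?)\end\{theorem\}
    // and reads \b as a word boundary and \e as the escape character.
    static final String ORIGINAL = "\\begin\\{theorem\\}(.+?)\\end\\{theorem\\}";

    // Pattern from this answer: four backslashes in Java source become
    // two in the regex, which match one literal backslash in the input.
    static final String ESCAPED =
            "\\\\begin\\\\\\{theorem\\\\\\}(.+?)\\\\end\\\\\\{theorem\\\\\\}";

    // Hypothetical helper: true if the regex finds a match anywhere in the input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(ORIGINAL, SOURCE)); // prints false
        System.out.println(matches(ESCAPED, SOURCE));  // prints true
    }
}
```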

Votes: 0

Stack Overflow user

Posted 2015-10-29 01:18:07

There seems to be an error in your test code that the other answers do not address. You create the test string like this:

String source = "\\begin\\{theorem\\} Hello, World! \\end\\{theorem\\}";

...but in the text, you say the source string should be:

\begin{theorem} Hello, World! \end{theorem}

If so, the string literal should be:

"\\begin{theorem} Hello, World! \\end{theorem}"

To create the regex, you can use:

Pattern.quote("\\begin{theorem}") + "(.*?)" + Pattern.quote("\\end{theorem}")

...or escape it manually:

"\\\\begin\\{theorem\\}(.*?)\\\\end\\{theorem\\}"

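A small sketch of both options against plain LaTeX input, assuming the source really is \begin{theorem} ... \end{theorem} with unescaped braces. The class name PlainLatexDemo and the helper firstBody are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainLatexDemo {

    // Runtime text: \begin{theorem} Hello, World! \end{theorem}
    static final String SOURCE = "\\begin{theorem} Hello, World! \\end{theorem}";

    // Option 1: let Pattern.quote escape the delimiters.
    static final Pattern QUOTED = Pattern.compile(
            Pattern.quote("\\begin{theorem}") + "(.*?)" + Pattern.quote("\\end{theorem}"));

    // Option 2: escape by hand. Each literal backslash needs four
    // backslashes in Java source; each brace needs two before it.
    static final Pattern MANUAL = Pattern.compile(
            "\\\\begin\\{theorem\\}(.*?)\\\\end\\{theorem\\}");

    // Hypothetical helper: first captured body, trimmed, or null if nothing matches.
    static String firstBody(Pattern pattern, String source) {
        Matcher m = pattern.matcher(source);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstBody(QUOTED, SOURCE)); // prints Hello, World!
        System.out.println(firstBody(MANUAL, SOURCE)); // prints Hello, World!
    }
}
```

Note that `.` does not match line terminators by default, so a theorem body spanning several lines would likely need the Pattern.DOTALL flag passed to Pattern.compile.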
Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/33402127
