
Finding matching strings in a paragraph

Stack Overflow user
Asked 2020-12-21 01:34:33
2 answers · 109 views · 0 followers · 2 votes

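I have a TXT file containing LaTeX math equations, where a single $ delimiter is used before and after each inline equation.

I want to find every equation in a paragraph and replace the delimiters with XML opening and closing tags.

For example, the following paragraph: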

This is the beginning of a paragraph $first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$

should become:

This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>

I have tried sed and perl commands such as the following:

perl -p -e 's/(\$)(.*[^\$])(\$)/<equation>$2<\/equation>/'

However, these commands replace only the very first and the very last $ on the line, so none of the equations in between are converted:

This is the beginning of a paragraph <equation>first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>

I would also like a robust solution that accounts for single $ signs that are not LaTeX delimiters. For example,

This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$

should not become:

This is the beginning of a paragraph <equation>first equation$ ...and here is some text that includes a single dollar sign: He paid <equation>2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>

Note: I am writing this in Bash.

2 answers

Stack Overflow user

Accepted answer

Posted 2020-12-21 02:38:47

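Note: the first part of this answer only addresses replacing pairs of $'s; for the OP's request to NOT replace standalone $'s, see the second half of the answer.

Replacing pairs of $'s

Sample data: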

$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$

One idea using sed:

sed -E 's|\$([^$]*)\$|<equation>\1</equation>|g' latex.txt

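where:

  • -E - enable extended regex support
  • \$ - match a literal $
  • ([^$]*) - capture group #1 - match every character that is not a literal $ (in this case, everything between a pair of $'s)
  • \$ - match a literal $
  • <equation>\1</equation> - replace the matched string with <equation> + contents of capture group + </equation>
  • /g - repeat the search/replace as many times as necessary (once for every pair on the line)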

This produces:

... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
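
As a quick check against the question's own input, here is a minimal sketch that wraps the same sed command in a function (latex_to_xml is a hypothetical name, not part of the answer) and feeds it the first example paragraph.

```shell
#!/usr/bin/env bash
# Sketch: the sed command above wrapped in a reusable function.
# latex_to_xml is a hypothetical name chosen for this example.
latex_to_xml() {
  # \$        a literal opening "$"
  # ([^$]*)   group 1: every character up to (not including) the next "$"
  # \$        the literal closing "$"
  # /g        repeat for every pair on the line
  sed -E 's|\$([^$]*)\$|<equation>\1</equation>|g'
}

# Input taken from the question.
printf '%s\n' 'This is the beginning of a paragraph $first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$' | latex_to_xml
```

On the question's $2.50 example this function does not help: the unescaped $ shifts every later pair by one, which is what the next part of the answer deals with.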

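Handling standalone $'s

If a standalone $ can be escaped (e.g., \$), one idea is to have sed replace it with a meaningless literal string, perform the <equation> / </equation> substitution, and then change that string back to \$.

Sample data: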

$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
... $first equation$ ... \$3.50 cup of coffee ... $third equation$

The original sed solution with the new substitutions added:

sed -E 's|\\\$|LITDOL|g;s|\$([^$]*)\$|<equation>\1</equation>|g;s|LITDOL|\\\$|g' latex.txt

This replaces \$ with LITDOL (LITeral DOLlar), performs the original substitution, then switches LITDOL back to \$.

It produces:

... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
... <equation>first equation</equation> ... \$3.50 cup of coffee ... <equation>third equation</equation>
Votes: 5

Stack Overflow user

Posted 2020-12-21 03:01:54

Try this Perl, which uses a negative lookahead:
$ cat joseph.txt
This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
$ perl -p -e 's/(\$)(?![\d.]+)(.+?)(\$)/<equation>$2<\/equation>/g' joseph.txt
This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>
$
Votes: 3
Page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/65386540
