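I have a TXT file containing LaTeX math equations, where each inline equation is delimited by a single $ before and after it.
I want to find every equation in a paragraph and replace the delimiters with XML opening and closing tags.
For example, the following paragraph: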
This is the beginning of a paragraph $first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
should become:
This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>
I have tried sed and perl commands such as the following:
perl -p -e 's/(\$)(.*[^\$])(\$)/<equation>$2<\/equation>/'
However, these commands replace only the first and the last $ on the line, and none of the equations in between are converted:
This is the beginning of a paragraph <equation>first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>
I would also like a robust solution that accounts for single $ signs that are not LaTeX delimiters. For example,
This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
should not become:
This is the beginning of a paragraph <equation>first equation$ ...and here is some text that includes a single dollar sign: He paid <equation>2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>
Note: I am writing this in Bash.
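Posted on 2020-12-21 02:38:47
Note: the first part of this answer only addresses replacing pairs of $'s; for the OP's request not to replace standalone $'s, see the second half of the answer.
Replacing pairs of $'s
Sample data: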
$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
One idea using sed:
sed -E 's|\$([^$]*)\$|<equation>\1</equation>|g' latex.txt
where:
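-E - enable extended regex support
\$ - match a literal $
([^$]*) - capture group #1: match everything that is not a literal $ (in this case, everything between a pair of $'s)
\$ - match a literal $
<equation>\1</equation> - replace the matched string with <equation> + contents of capture group #1 + </equation>
g - repeat the search/replace for every match on the line
This generates: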
... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
Handling standalone $'s
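If the standalone $'s can be escaped (e.g., \$), one idea is to have sed replace them with a meaningless literal string, perform the <equation> / </equation> replacement, and then change the meaningless literal back to \$.
Sample data: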
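A minimal sketch of that escape-and-restore idea as a reusable Bash function, assuming literal dollar signs are always written as \$ in the input and reusing the LITDOL placeholder from this answer. The function name convert_equations and the guard that refuses to run when the placeholder already occurs in the file are hypothetical additions for illustration, not part of the answer.

```shell
#!/usr/bin/env bash
# Escape-and-restore sketch. Assumptions: literal dollar signs are written
# as \$, and the placeholder LITDOL never occurs naturally in the text.
# convert_equations is a hypothetical name.

convert_equations() {
  local file=$1
  # Guard: if LITDOL is already present, step 3 would wrongly turn it
  # into \$, so stop instead of silently corrupting the output.
  if grep -q 'LITDOL' "$file"; then
    echo "convert_equations: placeholder LITDOL already in $file" >&2
    return 1
  fi
  # 1) hide each escaped \$ behind the placeholder
  # 2) wrap every remaining $...$ pair in <equation> tags
  # 3) turn the placeholder back into \$
  sed -E 's|\\\$|LITDOL|g; s|\$([^$]*)\$|<equation>\1</equation>|g; s|LITDOL|\\$|g' "$file"
}

# Usage: convert_equations latex.txt > latex.xml
```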
$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
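... $first equation$ ... \$3.50 cup of coffee ... $third equation$
The original sed solution with the new substitutions added:
sed -E 's|\\\$|LITDOL|g;s|\$([^$]*)\$|<equation>\1</equation>|g;s|LITDOL|\\\$|g' latex.txt
This replaces \$ with LITDOL (LITeral DOLlar), performs the original substitution, and then switches LITDOL back to \$.
It produces: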
... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
... <equation>first equation</equation> ... \$3.50 cup of coffee ... <equation>third equation</equation>
Posted on 2020-12-21 03:01:54
Try this Perl, which uses a negative lookahead.
$ cat joseph.txt
This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$
$ perl -p -e 's/(\$)(?![\d.]+)(.+?)(\$)/<equation>$2<\/equation>/g' joseph.txt
This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>
https://stackoverflow.com/questions/65386540